Agents
Any entity capable of perceiving its environment and taking action can be considered an agent. For example, a Roomba can perceive its environment (the floor of a home) and take action (move in any direction and vacuum). It can also (mostly) work in any home without much supervision, and it can rely on its past experience (after mapping the whole house) to decide which directions to take and which to avoid.
An agent usually exhibits these 4 characteristics:
- Autonomy
- Perception (any way to take in data, e.g. vision, text, audio)
- Decision making
- Action taking
Ecosystem
There are usually five entities involved in the agent ecosystem:
- Agent
- Environment
- State
- Action
- Reward
An agent acts in the environment. The environment may change its state in response to this action and, based on the new state, reward the agent positively (if the action was good) or negatively. The agent observes this reward, reflects on its action, and uses that feedback in subsequent decisions.
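To make this loop concrete, here is a minimal sketch in Python. The 1-D grid environment, the goal position, and the agent's learning rule are all invented for illustration; a real agent and environment would be far richer.

```python
import random

class GridEnvironment:
    """A toy 1-D environment: the agent starts at 0 and the goal is position 5."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # The environment changes its state based on the agent's action.
        self.state += 1 if action == "right" else -1
        # Moving toward the goal is rewarded positively, moving away negatively.
        reward = 1 if action == "right" else -1
        done = self.state == 5
        return self.state, reward, done


class SimpleAgent:
    """A trivial agent that keeps repeating whichever action earned a positive reward."""
    def __init__(self):
        self.preferred_action = None

    def act(self, state):
        return self.preferred_action or random.choice(["left", "right"])

    def observe(self, action, reward):
        # Reflect on the outcome: remember actions that were rewarded.
        if reward > 0:
            self.preferred_action = action


env, agent = GridEnvironment(), SimpleAgent()
state, done = env.state, False
while not done:
    action = agent.act(state)                # the agent acts in the environment
    state, reward, done = env.step(action)   # the environment updates its state and rewards
    agent.observe(action, reward)            # the agent observes the reward and adapts
```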
Of all of these, the environment is of particular importance. The agent is usually bound to its environment and can only send actions in a form the environment understands. This is crucial: an agent that is built (i.e. trained) to work in a particular environment might not work in a different one.
Intelligent agents
But a Roomba is not a very intelligent agent. What makes an agent intelligent? Learning, observing its environment, and generating feedback on its own past decisions make an agent intelligent. A Roomba's intelligence is very limited: it only learns the floor plan. What if it could learn where not to go, when to start, and which areas need more cleaning, all on its own?
How to improve agentic systems?
- Add an intelligent brain that can decide what to do and break tasks down in a well-defined manner
- Add some sort of auditor that evaluates the agent's actions and gives it feedback. This can also be self-reflection, where the agent asks itself to correct its own output
- Once an agent has learned how to use a tool, it should not have to figure that tool out again. We would call this a skill. (A sketch of how these pieces fit together follows this list.)
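Here is a rough sketch of how these three ideas (a planning brain, an auditor, and a skill library) might be wired together. Every function here (`plan`, `execute`, `evaluate`) is a hypothetical placeholder, not a real API.

```python
# Hypothetical helpers: in a real system plan() would be an LLM planner,
# execute() a tool-execution layer, and evaluate() an auditor model.
def plan(goal):
    """The 'brain': break the goal down into smaller, well-defined tasks."""
    return [f"subtask {i} of '{goal}'" for i in range(1, 4)]

def execute(task, skills):
    """Run a task, reusing a stored skill instead of figuring the tool out again."""
    if task in skills:
        return skills[task]
    result = f"result of {task}"   # placeholder for actually doing the work
    skills[task] = result          # store the newly learned skill
    return result

def evaluate(task, result):
    """The 'auditor': score the result and return feedback for self-correction."""
    return {"ok": len(result) > 0, "feedback": "looks complete"}

skills = {}  # skill library shared across runs
for task in plan("clean the house"):
    result = execute(task, skills)
    review = evaluate(task, result)
    if not review["ok"]:
        # Self-reflection: retry the task with the auditor's feedback attached.
        result = execute(f"{task} (feedback: {review['feedback']})", skills)
```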
The industry is moving toward a new trend of using LLMs to power agents:
- Agents can use the LLM's world model to understand the environment. There is no need to teach the agent a new environment, as long as that environment is generic enough to be covered by the LLM's world knowledge
- A lot of work has gone into guiding LLMs to the correct answer using prompt engineering, chain-of-thought (CoT) prompting, etc. This has improved LLMs' ability to reason and come up with an acceptable output
- We can also attach a memory module (e.g. a vector DB) to help with recall, reduce rework, improve guidance, and store skills
- There are multiple frameworks for making an LLM self-reflect and ask for its own feedback, so that it hallucinates less and performs better. (A minimal sketch combining memory and self-reflection follows this list.)
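Below is a minimal sketch of how these pieces (world-model LLM, memory, self-reflection) might fit together. `call_llm` and `embed` are hypothetical placeholders, and the in-memory list stands in for a real vector DB.

```python
import math

def call_llm(prompt):
    """Hypothetical stand-in for an LLM API call; swap in your provider's client."""
    return f"LLM output for: {prompt[:40]}..."

def embed(text):
    """Hypothetical embedding function; a real system would call an embedding model."""
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

memory = []  # stands in for a vector DB: a list of (embedding, text) pairs

def remember(text):
    memory.append((embed(text), text))

def recall(query, k=2):
    """Retrieve the k most similar past notes to reduce rework and guide the LLM."""
    q = embed(query)
    return [text for _, text in sorted(memory, key=lambda m: -cosine(m[0], q))[:k]]

def agent_step(task):
    context = recall(task)
    draft = call_llm(f"Task: {task}\nRelevant memory: {context}\nThink step by step.")
    # Self-reflection: ask the model to critique its own draft, then revise it.
    critique = call_llm(f"Critique this answer for mistakes or hallucinations: {draft}")
    final = call_llm(f"Revise the answer.\nCritique: {critique}\nDraft: {draft}")
    remember(f"{task} -> {final}")  # store the outcome for future recall
    return final
```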
Challenges
- Real-time or near-real-time (NRT) performance
- Objective evaluation framework
- Continuous learning of the base LLM (currently limited by the training cutoff)
References
- Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects
- Lilian Weng's blog
- Voyager
  - Earlier approaches trained the model on streams of people playing Minecraft, where the player usually described what they were doing in the game (video + transcribed text)
  - Later they moved to LLM-based agents and removed the whole training loop
- Open Interpreter: an agentic system that can interact with your laptop
- ChatDev: a simulated software company of LLM agents that can build simple software
- Talk by Andrew Ng