The introduction of ChatGPT has brought large language models (LLMs) into widespread use across both tech and non-tech industries. This popularity is primarily due to two factors:
- LLMs as a knowledge storehouse: LLMs are trained on a vast amount of internet data and are updated at regular intervals (for example, GPT-3, GPT-3.5, GPT-4, GPT-4o, and others).
- Emergent abilities: As LLMs grow, they display abilities not found in smaller models.
Does this mean we have already reached human-level intelligence, which we call artificial general intelligence (AGI)? Gartner defines AGI as a form of AI that possesses the ability to understand, learn and apply knowledge across a wide range of tasks and domains. The road to AGI is long, and one key hurdle is the auto-regressive nature of LLMs, which predict each word based on the sequence that came before it. Yann LeCun, one of the pioneers of AI research, points out that this auto-regressive nature can cause LLMs to drift away from accurate responses. Consequently, LLMs have several limitations:
- Limited knowledge: While trained on vast data, LLMs lack up-to-date world knowledge.
- Limited reasoning: LLMs have limited reasoning capability. As Subbarao Kambhampati points out, LLMs are good knowledge retrievers but not good reasoners.
- No dynamicity: LLMs are static and unable to access real-time information.
To overcome these limitations, a more advanced approach is required. This is where agents become crucial.
Agents to the rescue
The concept of an intelligent agent in AI has evolved over two decades, with implementations changing over time. Today, agents are discussed in the context of LLMs. Simply put, an agent is like a Swiss Army knife for LLM challenges: It can help with reasoning, provide a means of getting up-to-date information from the internet (addressing the dynamicity issue of LLMs) and carry out a task autonomously. With an LLM as its backbone, an agent formally comprises tools, memory, reasoning (or planning) and action components.
Components of AI agents
- Tools enable agents to access external information — whether from the internet, databases, or APIs — allowing them to gather necessary data.
- Memory can be short-term or long-term. Agents use scratchpad memory to temporarily hold results from various sources, while chat history is an example of long-term memory.
- The Reasoner allows agents to think methodically, breaking complex tasks into manageable subtasks for effective processing.
- Actions: Agents perform actions based on their environment and reasoning, adapting and solving tasks iteratively through feedback. ReAct is one of the common methods for iteratively performing reasoning and action.
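To make the four components concrete, here is a minimal sketch of a ReAct-style loop in plain Python. It is an illustration under stated assumptions, not the implementation of any particular framework: the tools (`lookup_price`, `add`), their made-up data and the `scripted_reasoner` function that stands in for the LLM are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical tools. A real agent would call a search API, database or service here.
def lookup_price(item: str) -> str:
    prices = {"widget": "3", "gadget": "4"}  # made-up values for illustration
    return prices.get(item.lower(), "0")


def add(expression: str) -> str:
    # Expects a string of the form "a+b" with integer operands.
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS: dict[str, Callable[[str], str]] = {"lookup_price": lookup_price, "add": add}


@dataclass
class Step:
    thought: str       # the reasoning behind the step
    action: str        # a tool name, or "finish"
    action_input: str  # the tool argument, or the final answer


def scripted_reasoner(task: str, scratchpad: list[str]) -> Step:
    """Stand-in for the LLM: chooses the next step from the observations so far."""
    if len(scratchpad) == 0:
        return Step("I need the widget price.", "lookup_price", "widget")
    if len(scratchpad) == 1:
        return Step("I need the gadget price.", "lookup_price", "gadget")
    if len(scratchpad) == 2:
        return Step("Add both prices.", "add", f"{scratchpad[0]}+{scratchpad[1]}")
    return Step("I have the total.", "finish", scratchpad[-1])


def run_agent(task: str,
              reasoner: Callable[[str, list[str]], Step],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    scratchpad: list[str] = []  # short-term memory holding tool observations
    for _ in range(max_steps):
        step = reasoner(task, scratchpad)                   # reason
        if step.action == "finish":
            return step.action_input
        observation = tools[step.action](step.action_input)  # act
        scratchpad.append(observation)                       # observe
    raise RuntimeError("agent did not finish within max_steps")


total = run_agent("What do a widget and a gadget cost together?", scripted_reasoner, TOOLS)
```

In a real agent, the reasoner would be an LLM call whose prompt includes the task and the scratchpad, and the loop would parse the model's reply into a thought, an action and an action input.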
What are agents good at?
Agents excel at complex tasks, especially when in a role-playing mode, leveraging the enhanced performance of LLMs. For instance, when writing a blog, one agent may focus on research while another handles writing — each tackling a specific sub-goal. This multi-agent approach applies to numerous real-life problems.
Role-playing helps agents stay focused on specific tasks to achieve larger objectives, reducing hallucinations by clearly defining the parts of a prompt, such as role, instruction and context. Since LLM performance depends on well-structured prompts, various frameworks formalize this process. One such framework, CrewAI, provides a structured approach to defining role-playing and serves as the reference framework later in this article.
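The idea of splitting a prompt into role, instruction and context can be sketched in a few lines of plain Python. This is only an illustration: the `RolePrompt` class and its field names are assumptions made for this example, not CrewAI's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolePrompt:
    """Hypothetical container for the three prompt parts named in the text."""
    role: str
    instruction: str
    context: str

    def render(self) -> str:
        # Keeping each part on its own labeled line keeps the prompt structure explicit.
        return (
            f"Role: {self.role}\n"
            f"Instruction: {self.instruction}\n"
            f"Context: {self.context}"
        )


# Two agents for the blog-writing example, each with its own sub-goal.
researcher = RolePrompt(
    role="Researcher",
    instruction="List the key points a reader should know about the topic.",
    context="Topic: multi-agent AI systems",
)
writer = RolePrompt(
    role="Writer",
    instruction="Turn the research notes into a short blog post.",
    context="Research notes: (output of the researcher agent)",
)

researcher_prompt = researcher.render()
writer_prompt = writer.render()
```

Each rendered prompt would be sent to the LLM on behalf of one agent, so the researcher never sees the writer's instruction and vice versa.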
Multi-agent vs. single-agent
Take the example of retrieval augmented generation (RAG) using a single agent. It’s an effective way to empower LLMs to handle domain-specific queries by leveraging information from indexed documents. However, single-agent RAG comes with its own limitations, such as retrieval performance or document ranking. Multi-agent RAG overcomes these limitations by employing specialized agents for document understanding, retrieval and ranking.
In a multi-agent scenario, agents collaborate in different ways, similar to distributed computing patterns: sequential, centralized, decentralized or shared message pools. Frameworks like CrewAI, AutoGen and LangGraph with LangChain enable complex problem-solving with multi-agent approaches. In this article, I use CrewAI as the reference framework to explore autonomous workflow management.
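As a rough sketch of the sequential pattern applied to multi-agent RAG, the example below chains three specialized steps: understanding, retrieval and ranking. Everything here is a simplifying assumption: the toy documents are invented, each "agent" is an ordinary function rather than an LLM-backed agent, the understanding step only processes the query, and keyword overlap stands in for real retrieval and ranking models.

```python
# Toy corpus, invented for illustration.
DOCUMENTS = {
    "doc1": "loan applications require identity verification",
    "doc2": "marketing campaigns need audience segmentation",
    "doc3": "identity verification uses a driving license",
}


def understanding_agent(query: str) -> set[str]:
    """Reduce the query to a set of lowercase keywords."""
    return set(query.lower().split())


def retrieval_agent(keywords: set[str], documents: dict[str, str]) -> list[str]:
    """Return the ids of documents that share at least one keyword with the query."""
    return [doc_id for doc_id, text in documents.items()
            if keywords & set(text.split())]


def ranking_agent(keywords: set[str], doc_ids: list[str],
                  documents: dict[str, str]) -> list[str]:
    """Order the retrieved documents by keyword overlap, highest first."""
    def overlap(doc_id: str) -> int:
        return len(keywords & set(documents[doc_id].split()))
    return sorted(doc_ids, key=overlap, reverse=True)


def run_pipeline(query: str) -> list[str]:
    # Sequential collaboration: each agent consumes the previous agent's output.
    keywords = understanding_agent(query)
    candidates = retrieval_agent(keywords, DOCUMENTS)
    return ranking_agent(keywords, candidates, DOCUMENTS)


ranked = run_pipeline("identity verification license")
```

Because each step has a single responsibility, one of them (say, ranking) can be replaced by a stronger specialized agent without touching the others.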
Workflow management: A use case for multi-agent systems
Most industrial processes are about managing workflows, be it loan processing, marketing campaign management or even DevOps. Steps, either sequential or cyclic, are required to achieve a particular goal. In a traditional approach, each step (say, loan application verification) requires a human to perform the tedious and mundane task of manually processing and verifying each application before moving to the next step.
Each step requires input from an expert in that area. In a multi-agent setup using CrewAI, each step is handled by a crew consisting of multiple agents. For instance, in loan application verification, one agent may verify the user’s identity through background checks on documents like a driving license, while another agent verifies the user’s financial details.
This raises the question: Can a single crew (with multiple agents in sequence or hierarchy) handle all loan processing steps? While possible, doing so complicates the crew, requires extensive temporary memory and increases the risk of goal deviation and hallucination. A more effective approach is to treat each loan processing step as a separate crew and view the entire workflow as a graph of crew nodes (using tools like LangGraph) operating sequentially or cyclically.
Since LLMs are still in their early stages of intelligence, full workflow management cannot be entirely autonomous. Human-in-the-loop is needed at key stages for end-user verification. For instance, after the crew completes the loan application verification step, human oversight is necessary to validate the results. Over time, as confidence in AI grows, some steps may become fully autonomous. Currently, AI-based workflow management functions in an assistive role, streamlining tedious tasks and reducing overall processing time.
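The sketch below shows one way to picture such a workflow: a small graph whose nodes are crews, with a human review node between verification and decision. It is written in plain Python under stated assumptions and does not use the CrewAI or LangGraph APIs; the crew functions, the state keys and the `human_approves` callback are all hypothetical stand-ins.

```python
from typing import Callable, Optional

State = dict[str, object]


def verification_crew(state: State) -> State:
    """Stand-in for a crew with an identity agent and a financial agent."""
    return {
        **state,
        "identity_ok": bool(state.get("license_number")),
        "finances_ok": bool(state.get("income_documented")),
    }


def decision_crew(state: State) -> State:
    """Stand-in for the crew that handles the next loan processing step."""
    approved = bool(state["identity_ok"] and state["finances_ok"])
    return {**state, "decision": "approve" if approved else "reject"}


def run_workflow(state: State,
                 human_approves: Callable[[State], bool],
                 max_steps: int = 10) -> State:
    def human_review(s: State) -> State:
        # Human-in-the-loop checkpoint: a person validates the crew's output.
        return {**s, "human_approved": human_approves(s)}

    nodes: dict[str, Callable[[State], State]] = {
        "verify": verification_crew,
        "human_review": human_review,
        "decide": decision_crew,
    }
    # Each edge maps the current state to the next node name, or None to stop.
    edges: dict[str, Callable[[State], Optional[str]]] = {
        "verify": lambda s: "human_review",
        "human_review": lambda s: "decide" if s["human_approved"] else None,
        "decide": lambda s: None,
    }

    node: Optional[str] = "verify"
    for _ in range(max_steps):  # the cap guards against endless cycles
        state = nodes[node](state)
        node = edges[node](state)
        if node is None:
            return state
    raise RuntimeError("workflow exceeded max_steps")


application = {"license_number": "X123", "income_documented": True}
result = run_workflow(application, human_approves=lambda s: True)
```

Because edges are functions of the state, an edge could also route back to an earlier node, which is how a cyclic workflow would be expressed in this sketch.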
Production challenges
Bringing multi-agent solutions into production can present several challenges.
- Scale: As the number of agents grows, collaboration and management become challenging. Various frameworks offer scalable solutions. For example, LlamaIndex uses event-driven workflows to manage multiple agents at scale.
- Latency: Agents often incur latency because tasks are executed iteratively and require multiple LLM calls. Managed LLMs (like GPT-4o) can be slow because of implicit guardrails and network delays. Self-hosted LLMs (with GPU control) can help reduce latency.
- Performance and hallucination issues: Due to the probabilistic nature of LLMs, agent performance can vary with each execution. Techniques like output templating (for instance, JSON format) and providing ample examples in prompts can help reduce response variability. The problem of hallucination can be further reduced by training agents.
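Output templating can be enforced in code by validating each response against the expected JSON shape and retrying when it does not match. The following is a minimal sketch using only Python's standard library. The field names in `REQUIRED_FIELDS` and the canned responses that stand in for LLM calls are assumptions made for this example.

```python
import json
from typing import Callable

# Hypothetical output template: field name -> expected Python type.
REQUIRED_FIELDS = {"applicant_verified": bool, "reason": str}


def parse_structured_output(raw: str) -> dict:
    """Parse a response and check that it matches the template."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} is missing or not {expected_type.__name__}")
    return data


def call_with_retries(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call the generator until it returns output that matches the template."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_structured_output(generate())
        except ValueError as error:
            last_error = error  # remember the failure and try again
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error


# Canned responses standing in for two LLM calls: the first ignores the template.
responses = iter([
    "Sure! The applicant looks fine.",
    '{"applicant_verified": true, "reason": "documents match"}',
])
result = call_with_retries(lambda: next(responses))
```

Retries add LLM calls, so this trades some latency for more predictable output.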
Final thoughts
As Andrew Ng points out, agents are the future of AI and will continue to evolve alongside LLMs. Multi-agent systems will advance in processing multi-modal data (text, images, video, audio) and tackling increasingly complex tasks. While AGI and fully autonomous systems are still on the horizon, multi-agent systems will bridge the current gap between LLMs and AGI.
Abhishek Gupta is a principal data scientist at Talentica Software.
The post Why multi-agent AI tackles complexities LLMs can’t appeared first on Venture Beat.