Enhancing large language models (LLMs) with knowledge beyond their training data is an important area of interest, especially for enterprise applications.
The best-known way to incorporate domain- and customer-specific knowledge into LLMs is to use retrieval-augmented generation (RAG). However, simple RAG techniques are not sufficient in many cases.
Building effective data-augmented LLM applications requires careful consideration of several factors. In a new paper, researchers at Microsoft propose a framework for categorizing different types of RAG tasks based on the type of external data they require and the complexity of the reasoning they involve.
“Data augmented LLM applications is not a one-size-fits-all solution,” the researchers write. “The real-world demands, particularly in expert domains, are highly complex and can vary significantly in their relationship with given data and the reasoning difficulties they require.”
To address this complexity, the researchers propose a four-level categorization of user queries based on the type of external data required and the cognitive processing involved in generating accurate and relevant responses:
– Explicit facts: Queries that require retrieving explicitly stated facts from the data.
– Implicit facts: Queries that require inferring information not explicitly stated in the data, often involving basic reasoning or common sense.
– Interpretable rationales: Queries that require understanding and applying domain-specific rationales or rules that are explicitly provided in external resources.
– Hidden rationales: Queries that require uncovering and leveraging implicit domain-specific reasoning methods or strategies that are not explicitly described in the data.
Each level presents its own challenges and calls for different solutions.
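One way to picture the categorization is as a routing table from query level to candidate techniques. The sketch below is illustrative only: the names `QueryLevel`, `TECHNIQUES`, and `suggest_techniques` are hypothetical, and the technique lists simply collect the methods mentioned in the sections that follow.

```python
from enum import Enum


class QueryLevel(Enum):
    """The four query levels described above (hypothetical naming)."""
    EXPLICIT_FACTS = 1
    IMPLICIT_FACTS = 2
    INTERPRETABLE_RATIONALES = 3
    HIDDEN_RATIONALES = 4


# Techniques the article associates with each level; not an exhaustive list.
TECHNIQUES = {
    QueryLevel.EXPLICIT_FACTS: ["basic RAG"],
    QueryLevel.IMPLICIT_FACTS: ["IRCoT", "RAT", "Graph RAG"],
    QueryLevel.INTERPRETABLE_RATIONALES: ["prompt tuning", "OPRO", "Automate-CoT"],
    QueryLevel.HIDDEN_RATIONALES: ["in-context learning", "domain-specific fine-tuning"],
}


def suggest_techniques(level):
    """Return the candidate techniques for a query level."""
    return TECHNIQUES[level]


print(suggest_techniques(QueryLevel.IMPLICIT_FACTS))
```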
Explicit fact queries
Explicit fact queries are the simplest type, focusing on retrieving factual information directly stated in the provided data. “The defining characteristic of this level is the clear and direct dependency on specific pieces of external data,” the researchers write.
The most common approach for addressing these queries is using basic RAG, where the LLM retrieves relevant information from a knowledge base and uses it to generate a response.
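The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration, not a production pipeline: word-count vectors stand in for a real embedding model, the sample chunks are invented, and the final call to an LLM is left out, so the code only shows how the retrieved context ends up in the prompt.

```python
import math
import re
from collections import Counter

# Invented sample knowledge base, for illustration only.
CHUNKS = [
    "The Model Z laptop ships with a 14-inch display.",
    "Refunds are processed within 10 business days.",
    "The Model Z laptop battery lasts up to 12 hours.",
]


def vectorize(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context):
    """Place the retrieved chunks ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


query = "How long does the Model Z battery last?"
print(build_prompt(query, retrieve(query, CHUNKS)))
```

In a real system the prompt would then be sent to the LLM, and `vectorize` would be replaced by a learned embedding model backed by a vector index.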
However, even with explicit fact queries, RAG pipelines face challenges at every stage. At the indexing stage, where the RAG system creates a store of data chunks that can later be retrieved as context, it might have to deal with large and unstructured datasets, potentially containing multi-modal elements such as images and tables. This can be addressed with multi-modal document parsing and multi-modal embedding models that map the semantic context of both textual and non-textual elements into a shared embedding space.
At the information retrieval stage, the system must make sure that the retrieved data is relevant to the user’s query. Here, developers can use techniques that improve the alignment of queries with document stores. For example, an LLM can generate synthetic answers for the user’s query. The answers per se might not be accurate, but their embeddings can be used to retrieve documents that contain relevant information.
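The synthetic-answer idea can be illustrated with a small example. Everything here is an assumption made for the sketch: `fake_llm_answer` is a hard-coded stand-in for a real LLM call, word overlap stands in for embedding similarity, and the documents are invented. The point is that the draft answer shares vocabulary with the relevant document even when the user's question does not.

```python
import re

# Invented sample documents, for illustration only.
DOCS = [
    "Acetaminophen reduces fever and relieves mild pain.",
    "The pharmacy opens at 9 am on weekdays.",
]


def tokens(text):
    """Lowercased word set; a toy substitute for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a, b):
    """Number of shared words; a toy substitute for embedding similarity."""
    return len(tokens(a) & tokens(b))


def fake_llm_answer(query):
    """Hypothetical stand-in for an LLM drafting a (possibly wrong) answer."""
    return "You can take acetaminophen or ibuprofen to relieve headache pain."


def retrieve_with_synthetic_answer(query, docs):
    """Search with the draft answer instead of the raw query."""
    draft = fake_llm_answer(query)
    return max(docs, key=lambda d: overlap(draft, d))


query = "What can I take for a headache?"
print([overlap(query, d) for d in DOCS])  # raw query matches neither document
print(retrieve_with_synthetic_answer(query, DOCS))
```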
During the answer generation stage, the model must determine whether the retrieved information is sufficient to answer the question and find the right balance between the given context and its own internal knowledge. Specialized fine-tuning techniques can help the LLM learn to ignore irrelevant information retrieved from the knowledge base. Joint training of the retriever and response generator can also lead to more consistent performance.
Implicit fact queries
Implicit fact queries require the LLM to go beyond simply retrieving explicitly stated information and perform some level of reasoning or deduction to answer the question. “Queries at this level require gathering and processing information from multiple documents within the collection,” the researchers write.
For example, a user might ask “How many products did company X sell in the last quarter?” or “What are the main differences between the strategies of company X and company Y?” Answering these queries requires combining information from multiple sources within the knowledge base. This is sometimes referred to as “multi-hop question answering.”
Implicit fact queries introduce additional challenges, including the need for coordinating multiple context retrievals and effectively integrating reasoning and retrieval capabilities.
These queries require advanced RAG techniques. For example, techniques like Interleaving Retrieval with Chain-of-Thought (IRCoT) and Retrieval Augmented Thoughts (RAT) use chain-of-thought prompting to guide the retrieval process based on previously recalled information.
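The general pattern of interleaving reasoning and retrieval can be sketched as a loop in which each reasoning step becomes the next retrieval query. This is a simplified illustration rather than the published IRCoT or RAT algorithms: `fake_reasoner` is a hard-coded stand-in for an LLM, the documents are invented, and word overlap replaces a real retriever.

```python
import re

# Invented sample documents, for illustration only.
DOCS = [
    "Company X acquired Startup Y in 2021.",
    "Startup Y was founded by Dana Lee.",
    "Dana Lee studied physics in Toronto.",
]


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, seen):
    """Return the not-yet-retrieved document with the most word overlap."""
    candidates = [d for d in DOCS if d not in seen]
    q = tokens(query)
    return max(candidates, key=lambda d: len(q & tokens(d)), default=None)


def fake_reasoner(question, context):
    """Hypothetical stand-in for one chain-of-thought step from an LLM."""
    joined = " ".join(context)
    if "founded by Dana Lee" in joined:
        return "Answer: Dana Lee"
    if "acquired Startup Y" in joined:
        return "Company X acquired Startup Y, so I need the founder of Startup Y."
    return "I need to find which startup Company X acquired."


def interleaved_retrieval(question, max_steps=4):
    """Alternate retrieval and reasoning until an answer appears."""
    context, query = [], question
    for _ in range(max_steps):
        doc = retrieve(query, context)
        if doc is None:
            break
        context.append(doc)
        thought = fake_reasoner(question, context)
        if thought.startswith("Answer:"):
            return thought, context
        query = thought  # the latest thought drives the next retrieval
    return "Answer: unknown", context


print(interleaved_retrieval("Who founded the startup that Company X acquired?"))
```

The second retrieval succeeds only because the intermediate thought names Startup Y, which the original question never mentions.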
Another promising approach involves combining knowledge graphs with LLMs. Knowledge graphs represent information in a structured format, making it easier to perform complex reasoning and link different concepts. Graph RAG systems can turn the user’s query into a chain that contains information from different nodes from a graph database.
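As a rough illustration of how a graph can link facts into a chain, the sketch below stores a few invented triples and uses breadth-first search to connect two entities. It is a minimal sketch under stated assumptions: real Graph RAG systems use a graph database and an LLM to map the query onto entities, and both of those steps are skipped here. The names `TRIPLES`, `build_graph`, and `find_chain` are hypothetical.

```python
from collections import deque

# Invented (subject, relation, object) triples, for illustration only.
TRIPLES = [
    ("Company X", "acquired", "Startup Y"),
    ("Startup Y", "founded_by", "Dana Lee"),
    ("Dana Lee", "studied_in", "Toronto"),
    ("Company X", "headquartered_in", "Berlin"),
]


def build_graph(triples):
    """Index outgoing edges by subject."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
    return graph


def find_chain(graph, start, goal):
    """Breadth-first search for the shortest chain of triples from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None  # no connection between the two entities


graph = build_graph(TRIPLES)
chain = find_chain(graph, "Company X", "Toronto")
# Render the chain as plain-text context that could be handed to an LLM.
print("\n".join(f"{s} --{r}--> {o}" for s, r, o in chain))
```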
Interpretable rationale queries
Interpretable rationale queries require LLMs to not only understand factual content but also apply domain-specific rules. These rationales might not be present in the LLM’s pre-training data but they are also not hard to find in the knowledge corpus.
“Interpretable rationale queries represent a relatively straightforward category within applications that rely on external data to provide rationales,” the researchers write. “The auxiliary data for these types of queries often include clear explanations of the thought processes used to solve problems.”
For example, a customer service chatbot might need to integrate documented guidelines on handling returns or refunds with the context provided by a customer’s complaint.
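In its simplest form, this integration amounts to placing the documented rules in the prompt alongside the customer's message. The sketch below assumes an invented return policy and a hypothetical `build_prompt` helper; the call to the LLM itself is omitted.

```python
# Invented return policy, for illustration only.
RETURN_POLICY = [
    "Items can be returned within 30 days of delivery.",
    "Opened software is not eligible for a refund.",
    "Refunds go to the original payment method.",
]


def build_prompt(rules, complaint):
    """Combine numbered guidelines with the customer's message."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return (
        "Follow these documented guidelines when answering.\n"
        f"{numbered}\n\n"
        f"Customer message: {complaint}\n"
        "State which guideline applies before giving your answer."
    )


print(build_prompt(RETURN_POLICY, "I opened the software box. Can I get a refund?"))
```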
One of the key challenges in handling these queries is effectively integrating the provided rationales into the LLM and ensuring that it can accurately follow them. Prompt tuning techniques, such as those that use reinforcement learning and reward models, can enhance the LLM’s ability to adhere to specific rationales.
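LLMs can also be used to optimize prompts. For example, DeepMind’s OPRO technique uses an LLM as an optimizer that proposes candidate prompts, scores each candidate with a scorer model, and feeds the scored candidates back into the next round to generate better ones.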
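A prompt-optimization loop of this kind can be sketched as follows. This is not DeepMind's implementation: `fake_optimizer` and `fake_scorer` are hard-coded stand-ins for LLM calls, and the scoring rule is invented purely so the loop has something to improve. What the sketch shows is the feedback structure, in which scored past prompts inform the next proposal.

```python
def fake_scorer(prompt):
    """Hypothetical stand-in for scoring a prompt on a labeled evaluation set."""
    wanted = ("step by step", "cite the policy", "be concise")
    return sum(phrase in prompt.lower() for phrase in wanted)


def fake_optimizer(history):
    """Hypothetical stand-in for an optimizer LLM.

    Reads the scored history and proposes a variation of the best prompt so far.
    """
    best_prompt, _ = max(history, key=lambda pair: pair[1])
    additions = ["Think step by step.", "Cite the policy you applied.", "Be concise."]
    for extra in additions:
        if extra.lower() not in best_prompt.lower():
            return best_prompt + " " + extra
    return best_prompt  # nothing left to add


def optimize_prompt(seed, rounds=5):
    """Propose, score, and record candidates; return the best (prompt, score)."""
    history = [(seed, fake_scorer(seed))]
    for _ in range(rounds):
        candidate = fake_optimizer(history)
        if any(candidate == prompt for prompt, _ in history):
            break  # the optimizer has stopped producing new candidates
        history.append((candidate, fake_scorer(candidate)))
    return max(history, key=lambda pair: pair[1])


print(optimize_prompt("Answer the customer's question."))
```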
Developers can also use the chain-of-thought reasoning capabilities of LLMs to handle complex rationales. However, manually designing chain-of-thought prompts for interpretable rationales can be time-consuming. Techniques such as Automate-CoT can help automate this process by using the LLM itself to create chain-of-thought examples from a small labeled dataset.
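The core idea of building chain-of-thought examples from a small labeled dataset can be illustrated with a generate-and-filter step: sample several rationales per question and keep one whose final answer matches the label. This is a simplified sketch, not the published Automate-CoT method, which may include further selection steps omitted here. `fake_llm_samples` is a hard-coded stand-in for sampling from an LLM, and the questions are invented.

```python
# Invented labeled examples, for illustration only.
Q_SHIRT = "A shirt costs $20 and is 25% off. What is the sale price?"
Q_CANS = "A pack has 6 cans. How many cans are in 4 packs?"
LABELED = {Q_SHIRT: "15", Q_CANS: "24"}


def fake_llm_samples(question):
    """Hypothetical stand-in for sampling (rationale, answer) pairs from an LLM."""
    canned = {
        Q_SHIRT: [
            ("25% of 20 is 4, and 20 - 4 = 16.", "16"),
            ("25% of 20 is 5, and 20 - 5 = 15.", "15"),
        ],
        Q_CANS: [
            ("6 + 4 = 10.", "10"),
            ("6 * 4 = 24.", "24"),
        ],
    }
    return canned[question]


def build_cot_examples(labeled):
    """Keep one rationale per question whose answer matches the gold label."""
    examples = []
    for question, gold in labeled.items():
        for rationale, answer in fake_llm_samples(question):
            if answer == gold:  # discard rationales that reach a wrong answer
                examples.append(f"Q: {question}\nA: {rationale} The answer is {answer}.")
                break
    return examples


# The surviving examples can be prepended to new questions as few-shot demonstrations.
print("\n\n".join(build_cot_examples(LABELED)))
```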
Hidden rationale queries
Hidden rationale queries present the most significant challenge. These queries involve domain-specific reasoning methods that are not explicitly stated in the data. The LLM must uncover these hidden rationales and apply them to answer the question.
For instance, the model might have access to historical data that implicitly contains the knowledge required to solve a problem. The model needs to analyze this data, extract relevant patterns, and apply them to the current situation. This could involve adapting existing solutions to a new coding problem or using documents on previous legal cases to make inferences about a new one.
“Navigating hidden rationale queries… demands sophisticated analytical techniques to decode and leverage the latent wisdom embedded within disparate data sources,” the researchers write.
The challenges of hidden rationale queries include retrieving information that is logically or thematically related to the query, even when it is not semantically similar. Also, the knowledge required to answer the query often needs to be consolidated from multiple sources.
Some methods use the in-context learning capabilities of LLMs to teach them how to select and extract relevant information from multiple sources and form logical rationales. Other approaches focus on generating logical rationale examples for few-shot and many-shot prompts.
However, addressing hidden rationale queries effectively often requires some form of fine-tuning, particularly in complex domains. This fine-tuning is usually domain-specific and involves training the LLM on examples that enable it to reason over the query and determine what kind of external information it needs.
Implications for building LLM applications
The survey and framework compiled by the Microsoft Research team show how far LLMs have come in using external data for practical applications. However, it is also a reminder that many challenges have yet to be addressed. Enterprises can use this framework to make more informed decisions about the best techniques for integrating external knowledge into their LLMs.
RAG techniques can go a long way toward overcoming many of the shortcomings of vanilla LLMs. However, developers must also be aware of the limitations of the techniques they use and know when to upgrade to more complex systems or avoid using LLMs altogether.
The post Microsoft researchers propose framework for building data-augmented LLM applications appeared first on VentureBeat.