LLMs hallucinate, meaning they generate incorrect, misleading, or nonsensical information. Some, like OpenAI CEO Sam Altman, consider AI hallucinations a form of creativity, and others believe hallucinations might even help make new scientific discoveries. In most cases where a correct response matters, however, they are a bug, not a feature.
So, what’s the way to reduce LLM hallucinations? Long-context? RAG? Fine-tuning?
Well, long-context LLMs are not foolproof, plain vector-search RAG often retrieves irrelevant or incomplete context, and fine-tuning comes with its own challenges and limitations.
Here are some advanced techniques that you can use to reduce LLM hallucinations.
Using Advanced Prompts
There’s a lot of debate on whether using better or more advanced prompts can solve LLM hallucinations.
While some believe that writing more detailed prompts doesn’t help, others, such as Google Brain co-founder Andrew Ng, see potential in it.
Ng believes that the reasoning capability of GPT-4 and other advanced models makes them quite good at interpreting complex prompts with detailed instructions.
“With many-shot learning, developers can give dozens, even hundreds of examples in the prompt, and this works better than few-shot learning,” he wrote.
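For illustration, here is a minimal sketch of what many-shot prompting can look like in practice. The `build_many_shot_prompt` helper, the tiny example list, and the commented-out `call_llm` placeholder are hypothetical, not any specific library’s API.

```python
# A minimal sketch of many-shot prompting: pack dozens or hundreds of
# labelled examples into the prompt before the actual query.

def build_many_shot_prompt(instruction, examples, query):
    """Assemble a prompt with a long list of input/output examples."""
    parts = [instruction, ""]
    for i, (inp, out) in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

examples = [
    ("The movie was a waste of time.", "negative"),
    ("Loved every minute of it!", "positive"),
    # ...dozens or hundreds more examples, as context length allows
]

prompt = build_many_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "The plot dragged, but the acting was superb.",
)
# response = call_llm(prompt)  # call_llm is whatever client you use
```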
New tooling is also emerging to make prompts better, such as Anthropic’s ‘Prompt Generator’, which turns simple descriptions into advanced prompts optimised for LLMs.
Recently, Marc Andreessen also said that with the right prompting, we can unlock the latent super genius in AI models. “Prompting crafts in many different domains such that you’re kind of unlocking the latent super genius,” he added.
CoVe by Meta AI
Chain-of-Verification (CoVe) by Meta AI is another technique. It reduces hallucination in LLMs by breaking fact-checking into manageable steps, improving response accuracy in a way that mirrors human-driven fact-checking.
CoVe involves generating an initial response, planning verification questions, answering these questions independently, and producing a final verified response. This method significantly improves the accuracy of the model by systematically verifying and correcting its own outputs.
It enhances performance across various tasks, such as list-based questions, closed-book QA, and long-form text generation, by reducing hallucinations and increasing factual correctness.
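A rough sketch of the CoVe loop might look like the following. The `llm` function is a stand-in for any chat-completion call, and the prompts are simplified illustrations rather than Meta AI’s exact templates.

```python
# A minimal sketch of the Chain-of-Verification (CoVe) loop described above.

def chain_of_verification(llm, question):
    # 1. Draft an initial (possibly hallucinated) answer.
    baseline = llm(f"Answer the question:\n{question}")

    # 2. Plan verification questions that probe the draft's factual claims.
    plan = llm(
        "List short fact-checking questions for the claims in this answer:\n"
        f"Question: {question}\nAnswer: {baseline}"
    )
    verification_questions = [q for q in plan.splitlines() if q.strip()]

    # 3. Answer each verification question independently of the draft,
    #    so the model cannot simply repeat its own mistakes.
    verifications = [
        (q, llm(f"Answer concisely and factually:\n{q}"))
        for q in verification_questions
    ]

    # 4. Produce the final, revised answer conditioned on the checks.
    checks = "\n".join(f"Q: {q}\nA: {a}" for q, a in verifications)
    return llm(
        f"Original question: {question}\n"
        f"Draft answer: {baseline}\n"
        f"Verification results:\n{checks}\n"
        "Write a final answer that keeps only claims supported by the checks."
    )
```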
Knowledge Graphs
RAG is no longer limited to vector-database matching; many advanced RAG techniques are being introduced that significantly improve retrieval.
One example is the integration of Knowledge Graphs (KGs) into RAG. By leveraging the structured and interlinked data in KGs, the reasoning capabilities of current RAG systems can be greatly enhanced.
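As a toy illustration of the idea, the sketch below retrieves facts as subject-predicate-object triples from a tiny in-memory graph and passes them to the model as structured context. The graph, the `retrieve_triples` helper, and the `llm` call are all hypothetical placeholders, not a particular KG-RAG framework.

```python
# A toy sketch of knowledge-graph-augmented RAG: fetch facts as triples and
# add them to the prompt as structured, grounded context.

KNOWLEDGE_GRAPH = [
    ("Aspirin", "treats", "headache"),
    ("Aspirin", "interacts_with", "warfarin"),
    ("Warfarin", "is_a", "anticoagulant"),
]

def retrieve_triples(entity):
    """Return every triple mentioning the entity (stand-in for a real graph query)."""
    return [t for t in KNOWLEDGE_GRAPH if entity in (t[0], t[2])]

def kg_rag_answer(llm, question, entity):
    triples = retrieve_triples(entity)
    facts = "\n".join(f"{s} {p.replace('_', ' ')} {o}" for s, p, o in triples)
    prompt = (
        "Answer using only the facts below; say 'unknown' if they are insufficient.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}"
    )
    return llm(prompt)
```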
RAPTOR
Another technique is RAPTOR, a method that addresses questions spanning multiple documents by creating a higher level of abstraction. It is particularly useful for queries that involve concepts drawn from several documents.
Methods like RAPTOR pair well with long-context LLMs because you can embed full documents without any chunking.
The method reduces hallucinations by integrating external retrieval with the model. When a query is received, RAPTOR first retrieves relevant, verified information from external knowledge bases.
This retrieved data is then embedded into the model’s context alongside the original query. By grounding the model’s responses in factual and pertinent information, RAPTOR helps ensure that the generated content is accurate and contextually appropriate.
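The sketch below captures a simplified version of the hierarchical-summary idea, skipping RAPTOR’s clustering and recursive tree construction. `llm`, `embed`, and `cosine` are generic placeholders rather than RAPTOR’s actual code.

```python
# A simplified sketch of the RAPTOR idea: build higher-level summaries over
# groups of documents, then retrieve from both raw documents and summaries so
# multi-document questions can match the more abstract layer.

def build_summary_tree(llm, documents, group_size=4):
    """Return leaf documents plus one layer of group summaries."""
    summaries = []
    for i in range(0, len(documents), group_size):
        group = documents[i:i + group_size]
        summaries.append(llm("Summarise the key facts in:\n" + "\n---\n".join(group)))
    return documents + summaries  # the retrieval index covers both levels

def retrieve(embed, cosine, index, query, k=3):
    """Rank every node (document or summary) by embedding similarity to the query."""
    q = embed(query)
    scored = sorted(index, key=lambda node: cosine(embed(node), q), reverse=True)
    return scored[:k]
```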
Mitigating LLM Hallucinations via Conformal Abstention
The paper ‘Mitigating LLM Hallucinations via Conformal Abstention’ introduces a method to reduce hallucinations in LLMs by employing conformal prediction techniques to determine when the model should abstain from providing a response.
By using self-consistency to evaluate response similarity and leveraging conformal prediction for rigorous guarantees, the method ensures that the model only responds when confident in its accuracy.
This approach effectively bounds the hallucination rate while maintaining a balanced abstention rate, particularly benefiting tasks requiring long-form answers. It significantly improves the reliability of model outputs by avoiding incorrect or nonsensical responses.
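The snippet below sketches the abstention idea using plain self-consistency: sample several answers, measure how much they agree, and refuse to answer when agreement is low. In the paper the threshold comes from conformal calibration on held-out data; here it is just a hand-picked placeholder, and `llm` is a generic sampling call.

```python
# A rough sketch of abstention via self-consistency. Exact-match agreement is
# a simplification that only works for short answers; the paper scores
# response similarity with the model itself.

from collections import Counter

def answer_or_abstain(llm, question, n_samples=5, threshold=0.6):
    samples = [llm(question, temperature=0.8) for _ in range(n_samples)]
    # Agreement score: fraction of samples matching the most common answer.
    answer, count = Counter(samples).most_common(1)[0]
    agreement = count / n_samples
    if agreement < threshold:
        return "I don't know."  # abstain rather than risk a hallucination
    return answer
```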
Reducing Hallucination in Structured Outputs via RAG
Recently, ServiceNow reduced hallucinations in structured outputs through RAG, enhancing LLM performance and enabling out-of-domain generalisation while minimising resource usage.
The technique involves a RAG system, which retrieves relevant JSON objects from external knowledge bases before generating text. This ensures the generation process is grounded in accurate and relevant data.
By incorporating this pre-retrieval step, the model is less likely to produce incorrect or fabricated information, thereby reducing hallucinations. Additionally, this approach allows the use of smaller models without compromising performance, making it efficient and effective.
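A minimal sketch of this pattern, assuming a generic `retriever` and `llm` rather than ServiceNow’s actual system, might look like this:

```python
# A minimal sketch of RAG for structured outputs: retrieve similar JSON
# objects first, then ask the model to produce output in the same schema,
# grounded in the retrieved examples.

import json

def generate_structured(llm, retriever, user_request):
    examples = retriever.search(user_request, k=3)   # assumed to return a list of dicts
    context = "\n".join(json.dumps(e, indent=2) for e in examples)
    prompt = (
        "Using the JSON examples below as the schema and the source of valid\n"
        "field values, produce a JSON object for the request. Do not invent fields.\n"
        f"Examples:\n{context}\n\nRequest: {user_request}\nJSON:"
    )
    raw = llm(prompt)
    return json.loads(raw)  # fails loudly if the output is not valid JSON
```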
All these methods and more can help prevent hallucinations and create more robust LLM systems.