How to Mitigate Extrinsic Hallucinations in Large Language Models

Last updated: 2026-05-14

Introduction

Large language models (LLMs) are powerful tools, but they sometimes generate fabricated or nonsensical content, a phenomenon known as hallucination. An in-context hallucination occurs when the output contradicts the context provided in the prompt; an extrinsic hallucination occurs when the model invents information not supported by its pre-training data or external world knowledge. This guide focuses on combating extrinsic hallucinations: ensuring that LLM outputs are factual and that the model knows when to say "I don't know." Follow these steps to build or fine-tune LLMs that are both accurate and honest.

What You Need

  • Access to an LLM (e.g., GPT-4, Llama 2, or your own model)
  • Pre-training dataset or proxy knowledge base
  • Ground truth verification tools (e.g., Wikipedia API, factual databases)
  • Evaluation framework (e.g., human evaluation or automatic metrics like FActScore)
  • Basic understanding of machine learning and NLP concepts

Step-by-Step Guide

Step 1: Define Extrinsic Hallucination and Its Impact

Before mitigation, clearly understand what extrinsic hallucination means. It occurs when the model generates statements that are not grounded in its pre-training data or widely accepted world knowledge. Unlike in-context errors, these fabrications cannot be fixed by simply providing better context. Define metrics to measure it—for example, the proportion of facts that cannot be verified by external sources. This step ensures your team has a shared understanding and can prioritize efforts.
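
As a minimal sketch of such a metric, the snippet below computes the share of generated claims that no external source could verify. The `Claim` structure and its `verified` flag are illustrative placeholders for whatever fact-checking tool populates them:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    verified: bool  # True if an external source supports this claim

def hallucination_rate(claims: list[Claim]) -> float:
    """Proportion of generated claims no external source could verify."""
    if not claims:
        return 0.0
    unsupported = sum(1 for c in claims if not c.verified)
    return unsupported / len(claims)

# Three atomic claims extracted from one model response; the last is a
# fabrication and would be flagged by a fact checker.
claims = [
    Claim("The Eiffel Tower is in Paris.", verified=True),
    Claim("It was completed in 1889.", verified=True),
    Claim("It was designed by Leonardo da Vinci.", verified=False),
]
print(f"Extrinsic hallucination rate: {hallucination_rate(claims):.2f}")  # 0.33
```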

Step 2: Equip the Model with Grounded Context

Even for extrinsic hallucination, providing relevant, high-quality context in the prompt can help. Use retrieval-augmented generation (RAG) techniques: fetch authoritative documents from a knowledge base and prepend them to the user query. This grounds the model's response in verified facts, reducing the chance it will invent information. For instance, if the model must answer a question about a historical event, supply a passage from a trusted encyclopedia. The model is then less likely to hallucinate because it has a factual anchor.
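
The sketch below illustrates the pattern. The word-overlap `retrieve` function is a toy placeholder; a real system would use a vector store, BM25 index, or search API:

```python
def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query.
    Replace with a real backend (vector store, BM25, search API)."""
    q_words = set(query.lower().split())
    ranked = sorted(passages,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Prepend retrieved passages and instruct the model to stay inside
    them, giving it a factual anchor."""
    context = "\n\n".join(f"[{i + 1}] {p}"
                          for i, p in enumerate(retrieve(question, passages)))
    return (
        "Answer using ONLY the passages below. If they do not contain "
        "the answer, say \"I don't know.\"\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

kb = [
    "The Battle of Hastings took place in 1066 in East Sussex.",
    "Photosynthesis converts light energy into chemical energy.",
]
print(build_grounded_prompt("When was the Battle of Hastings?", kb))
```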

Step 3: Implement Post-Generation Verification

After the model generates an output, run a verification step. Break the output into atomic claims and check each against a knowledge base (e.g., Wikipedia, Google Knowledge Graph). Use tools like FActScore or build a custom classifier to flag unsupported claims. If a claim cannot be verified, either suppress it or replace it with a truthful statement. This step acts as a safety net, catching hallucinations that slip through the generation process. You can also use a second, smaller model to critique the output, a self-verification pattern in the spirit of chain-of-verification prompting.
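
Here is a deliberately simplified sketch of that verification pass. Sentence-level splitting and word-overlap matching stand in for the LLM-based claim decomposition and retrieval/NLI checks that tools like FActScore actually use:

```python
import re

def split_into_claims(text: str) -> list[str]:
    """Naive atomic-claim splitter: one claim per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_supported(claim: str, knowledge: list[str]) -> bool:
    """Toy support check: a claim passes if at least half of its content
    words appear in some knowledge-base passage."""
    words = {w.strip(".,") for w in claim.lower().split() if len(w) > 3}
    if not words:
        return False
    return any(len(words & set(p.lower().split())) >= len(words) / 2
               for p in knowledge)

def filter_output(text: str, knowledge: list[str]) -> str:
    """Safety net: keep only claims the knowledge base supports."""
    kept = [c for c in split_into_claims(text) if is_supported(c, knowledge)]
    return " ".join(kept)

kb = ["Marie Curie won Nobel Prizes in physics and chemistry."]
answer = "Marie Curie won two Nobel Prizes. She also invented the telephone."
print(filter_output(answer, kb))  # the fabricated second claim is dropped
```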

Step 4: Train the Model to Express Uncertainty

A crucial technique is teaching the LLM to recognize when it lacks knowledge and to refuse to answer. Fine-tune the model on examples where the correct response is something like "I don't know" or "This information might not be up to date." Use reinforcement learning from human feedback (RLHF) to reward honest uncertainty over confident misinformation. For instance, if a question asks about a very recent event not in the training data, the model should output a disclaimer. This directly addresses extrinsic hallucination because the model stops inventing facts when it has no evidence.
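
One concrete way to start is to build supervision data that pairs answerable questions with answers and unanswerable ones with an explicit refusal, plus a reward function that scores honesty over confident misinformation. Everything below (the questions, file name, and reward values) is illustrative, not a prescribed recipe:

```python
import json

REFUSAL = "I don't know. That may be outside my training data."

# Hypothetical supervision pairs: `answerable` marks whether the fact
# is actually covered by the model's training data.
examples = [
    {"question": "Who wrote 'Pride and Prejudice'?",
     "answer": "Jane Austen", "answerable": True},
    {"question": "Who won the 2031 World Cup?",
     "answer": None, "answerable": False},
]

# Write fine-tuning pairs: unanswerable questions map to the refusal.
with open("uncertainty_finetune.jsonl", "w") as f:
    for ex in examples:
        target = ex["answer"] if ex["answerable"] else REFUSAL
        f.write(json.dumps({"prompt": ex["question"],
                            "completion": target}) + "\n")

def honesty_reward(answerable: bool, refused: bool, correct: bool) -> float:
    """Toy RLHF reward shaping: honest uncertainty beats confident
    misinformation, while refusing an answerable question also costs."""
    if not answerable:
        return 1.0 if refused else -2.0  # invented facts punished hardest
    if refused:
        return -0.5                      # mild penalty for over-refusal
    return 1.0 if correct else -2.0
```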

Step 5: Evaluate and Iterate

Set up a consistent evaluation pipeline. Use a held-out test set of questions that are prone to hallucination (e.g., niche facts, recent events). Measure both factual accuracy and the model's ability to say "I don't know." Compare results before and after each mitigation step. Iterate on the training data, prompt design, and verification thresholds. Document false positives (correct facts flagged as hallucinated) to avoid over-censoring. Continuous improvement is key, as extrinsic hallucination patterns evolve with model updates.
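
A minimal evaluation harness might look like the following; the refusal-detection markers and field names are assumptions for illustration, not a standard API. Tracking accuracy and honest refusal separately keeps over-refusal visible rather than hidden inside one aggregate score:

```python
from dataclasses import dataclass

REFUSAL_MARKERS = ("i don't know", "i'm not sure", "cannot verify")

@dataclass
class EvalCase:
    question: str
    gold_answer: str | None   # None = unanswerable from verified knowledge
    model_answer: str

def evaluate(cases: list[EvalCase]) -> dict[str, float]:
    """Report accuracy on answerable questions and honest-refusal rate
    on unanswerable ones."""
    correct = honest = answerable = unanswerable = 0
    for c in cases:
        refused = any(m in c.model_answer.lower() for m in REFUSAL_MARKERS)
        if c.gold_answer is None:
            unanswerable += 1
            honest += refused
        else:
            answerable += 1
            correct += (not refused
                        and c.gold_answer.lower() in c.model_answer.lower())
    return {
        "accuracy": correct / max(answerable, 1),
        "honest_refusal_rate": honest / max(unanswerable, 1),
    }

cases = [
    EvalCase("Capital of France?", "Paris", "Paris is the capital of France."),
    EvalCase("Who won the 2031 World Cup?", None, "I don't know."),
]
print(evaluate(cases))  # {'accuracy': 1.0, 'honest_refusal_rate': 1.0}
```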

Tips for Success

  • Start with a clear definition: Make sure everyone involved agrees on what constitutes extrinsic hallucination—separate from mere factual errors that could be fixed with better context.
  • Combine multiple approaches: No single method (RAG, verification, uncertainty training) is perfect. Use them together for best results.
  • Use diverse knowledge sources: A single knowledge base might be biased or incomplete. Incorporate multiple verifiable databases.
  • Monitor for over-refusal: If the model becomes too cautious, it may refuse to answer even when it knows the answer. Balance honesty with helpfulness.
  • Stay updated: Research in hallucination mitigation is rapidly advancing. Follow new papers on factuality in LLMs and uncertainty estimation.

By following these steps, you can significantly reduce extrinsic hallucinations, making your LLM more reliable and trustworthy. Remember that the goal is not perfection but consistent improvement—every small gain in factual accuracy reduces the risk of spreading misinformation.