# Hallucination

> Learn how to use the hallucination test

## Definition

The hallucination test measures the extent to which a generated answer contains
information that is unsupported by, or contradicts, the given context. It is
essentially the complement of faithfulness: it identifies when your LLM generates
unsupported or fabricated information.
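
As a rough intuition only, the sketch below treats the score as the fraction of claims in an answer that the context does not support. The claim verdicts, the `hallucination_score` helper, and the fraction-based formula are all assumptions made for illustration. On Openlayer the score is produced by an LLM evaluator, and its exact computation may differ.

```python
def hallucination_score(claim_supported: list[bool]) -> float:
    """Toy score: fraction of claims NOT supported by the context.

    Hypothetical helper for intuition only; not Openlayer's implementation.
    """
    if not claim_supported:
        # Assumption: an answer with no claims has nothing to hallucinate.
        return 0.0
    unsupported = sum(1 for supported in claim_supported if not supported)
    return unsupported / len(claim_supported)


# Hypothetical verdicts for an answer with four claims, one of them unsupported.
verdicts = [True, True, False, True]
score = hallucination_score(verdicts)

# The complement, under the same toy assumption, behaves like a faithfulness score.
faithfulness_like = 1.0 - score
print(score, faithfulness_like)
```

Under this toy definition, an answer whose claims are all supported scores 0, and an answer with no supported claims scores 1.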

## Taxonomy

* **Task types**: LLM.
* **Availability**: <Tooltip tip="Continuously evaluate your models and datasets as you iterate on their versions.">development</Tooltip>
  and <Tooltip tip="Monitor a model in production, measure its health, check for drifts and set up alerts.">monitoring</Tooltip>.

## Why it matters

* Hallucination detection is critical for maintaining trust and accuracy in AI-generated responses, especially in high-stakes applications.
* This metric helps identify when your model is making up facts, providing unsupported claims, or contradicting the given context.
* It's essential for RAG (Retrieval-Augmented Generation) systems where responses should be strictly grounded in the provided information.
* Lower scores are better: they indicate that responses stay closer to the facts in the provided context.

## Required columns

To compute this metric, your dataset must contain the following columns:

* **Outputs**: The generated answer or response from your LLM.
* **Context**: The provided context or background information.

<Note>
  This metric relies on an LLM evaluator judging your submission. On Openlayer,
  you can configure the underlying LLM used to compute it. Check out the
  [OpenAI](/integrations/openai#openai-llm-evaluator) or
  [Anthropic](/integrations/anthropic#anthropic-llm-evaluator) integration
  guides for details.
</Note>
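
A minimal sketch of what such a dataset might look like, using only the Python standard library. The column names `output` and `context`, the sample rows, and the `missing_columns` helper are hypothetical and chosen for illustration; use whatever names your own dataset and column mapping define.

```python
import csv
import io

# Hypothetical column names; map them to whatever your dataset actually uses.
OUTPUT_COLUMN = "output"
CONTEXT_COLUMN = "context"

rows = [
    {
        CONTEXT_COLUMN: "The Eiffel Tower is in Paris and was completed in 1889.",
        OUTPUT_COLUMN: "The Eiffel Tower was completed in 1889.",
    },
    {
        CONTEXT_COLUMN: "The Eiffel Tower is in Paris and was completed in 1889.",
        # This answer contradicts the context, so it should count as hallucinated.
        OUTPUT_COLUMN: "The Eiffel Tower was completed in 1925.",
    },
]


def missing_columns(rows: list[dict], required: tuple[str, ...]) -> list[str]:
    """Return the required columns that are absent or empty in any row."""
    return [col for col in required if any(not row.get(col) for row in rows)]


# Serialize the rows to CSV text, as one possible way to store the dataset.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=[CONTEXT_COLUMN, OUTPUT_COLUMN])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(missing_columns(rows, (OUTPUT_COLUMN, CONTEXT_COLUMN)))
```

Checking for missing or empty values before uploading helps ensure that every row gives the evaluator both an answer and the context to judge it against.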

## Test configuration examples

If you are writing a `tests.json`, here are a few valid configurations for the hallucination test:

<CodeGroup>
  ```json Development
  [
    {
      "name": "Hallucination below 0.1",
      "description": "Ensure that generated responses have minimal hallucination with a score below 0.1",
      "type": "performance",
      "subtype": "metricThreshold",
      "thresholds": [
        {
          "insightName": "metrics",
          "insightParameters": null,
          "measurement": "hallucination",
          "operator": "<",
          "value": 0.1
        }
      ],
      "subpopulationFilters": null,
      "mode": "development",
      "usesValidationDataset": true,
      "usesTrainingDataset": false,
      "usesMlModel": false,
      "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
    }
  ]
  ```

  ```json Monitoring
  [
    {
      "name": "Hallucination below 0.1",
      "description": "Ensure that generated responses have minimal hallucination with a score below 0.1",
      "type": "performance",
      "subtype": "metricThreshold",
      "thresholds": [
        {
          "insightName": "metrics",
          "insightParameters": null,
          "measurement": "hallucination",
          "operator": "<",
          "value": 0.1
        }
      ],
      "subpopulationFilters": null,
      "mode": "monitoring",
      "usesProductionData": true,
      "evaluationWindow": 3600,
      "delayWindow": 0,
      "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
    }
  ]
  ```
</CodeGroup>
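
To make the threshold semantics concrete, the sketch below evaluates a trimmed copy of the threshold from the configurations above against a measured score. The `passes_thresholds` helper and the sample scores are hypothetical, and the operator mapping covers only `"<"` because that is the only operator shown here. This illustrates how the comparison reads; it is not how Openlayer evaluates tests internally.

```python
import json
import operator

# Only "<" appears in the configurations above; any other entries would be assumptions.
OPERATORS = {"<": operator.lt}

# A trimmed copy of the test definition, keeping only the fields used here.
CONFIG = json.loads("""
[
  {
    "name": "Hallucination below 0.1",
    "thresholds": [
      {"measurement": "hallucination", "operator": "<", "value": 0.1}
    ]
  }
]
""")


def passes_thresholds(test: dict, measured: dict[str, float]) -> bool:
    """Return True when every threshold in the test holds for the measured values."""
    return all(
        OPERATORS[t["operator"]](measured[t["measurement"]], t["value"])
        for t in test["thresholds"]
    )


# Hypothetical measured scores.
print(passes_thresholds(CONFIG[0], {"hallucination": 0.04}))  # 0.04 < 0.1 holds
print(passes_thresholds(CONFIG[0], {"hallucination": 0.25}))  # 0.25 < 0.1 does not
```

Because the operator is a strict `<`, a measured score of exactly 0.1 would not satisfy this threshold.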

## Related

* [Faithfulness test](/tests/catalog/faithfulness) - Measure factual consistency with context (complement of hallucination).
* [Groundedness test](/tests/catalog/groundedness) - Ensure responses are grounded in provided context.
* [Context utilization test](/tests/catalog/context-utilization) - Evaluate how well context is used.
* [Answer correctness test](/tests/catalog/answer-correctness) - Measure factual accuracy against ground truth.
* [Aggregate metrics](/tests/performance/aggregate-metrics) - Overview of all available metrics.
