Definition

The context recall test measures the ability of the retriever to retrieve all of the context necessary to answer the question. It is based on the Ragas context recall metric: the score is the fraction of statements in the ground-truth answer that can be attributed to the retrieved context, ranging from 0 (nothing recalled) to 1 (everything recalled).
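
For intuition, the same score can be computed directly with the open-source ragas package. The snippet below is a minimal sketch, not Openlayer's implementation; exact column names (for example, ground_truth vs. ground_truths) vary between ragas versions.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_recall

# Each row needs the question, the retrieved contexts, and a
# ground-truth (reference) answer. Column names vary by ragas version.
dataset = Dataset.from_dict({
    "question": ["When was the Eiffel Tower built?"],
    "contexts": [["The Eiffel Tower was constructed between 1887 and 1889."]],
    "ground_truth": ["It was built between 1887 and 1889."],
})

# ragas uses an LLM judge; by default it calls OpenAI, so set
# OPENAI_API_KEY in your environment before running.
result = evaluate(dataset, metrics=[context_recall])
print(result)  # e.g. {'context_recall': 1.0}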

Taxonomy

  • Task types: LLM.
  • Availability: development and monitoring.

Why it matters

  • Context recall ensures that your retrieval system captures all the relevant information needed to answer a question properly.
  • This metric helps identify when your retrieval mechanism is missing important context that should be available to the LLM.
  • It’s crucial for RAG (Retrieval-Augmented Generation) systems where the quality of retrieved context directly impacts answer quality.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Ground truth: The reference/correct answer
  • Context: The retrieved context or background information

This metric relies on an LLM evaluator judging your submission. On Openlayer, you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.
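
For illustration, a toy validation dataset with these columns could be assembled as below. The column names (question, context, ground_truth) are placeholders; map them to the schema your Openlayer project expects.

import pandas as pd

# Toy dataset containing the two columns this metric requires,
# plus the question each row refers to. Column names are illustrative.
df = pd.DataFrame({
    "question": [
        "When was the Eiffel Tower built?",
        "Who wrote 'Pride and Prejudice'?",
    ],
    "context": [
        "The Eiffel Tower was constructed between 1887 and 1889.",
        "Jane Austen published 'Pride and Prejudice' in 1813.",
    ],
    "ground_truth": [
        "It was built between 1887 and 1889.",
        "Jane Austen wrote 'Pride and Prejudice'.",
    ],
})
df.to_csv("validation_dataset.csv", index=False)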

Test configuration examples

If you are writing a tests.json, here is a valid configuration for the context recall test:
[
  {
    "name": "Context recall above 0.8",
    "description": "Ensure that the retrieval system captures all necessary context with a score above 0.8",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "contextRecall",
        "operator": ">",
        "value": 0.8
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]
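
As configured, this test passes only when the aggregate contextRecall score on the validation dataset exceeds 0.8. If you prefer to generate the file programmatically, the sketch below writes an equivalent tests.json; the uuid-generated syncId is an assumption, standing in for whatever unique identifier your project uses.

import json
import uuid

# Build the same context recall test configuration in Python and
# write it to tests.json. Field values mirror the example above.
test = {
    "name": "Context recall above 0.8",
    "description": (
        "Ensure that the retrieval system captures all necessary "
        "context with a score above 0.8"
    ),
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
        {
            "insightName": "metrics",
            "insightParameters": None,
            "measurement": "contextRecall",
            "operator": ">",
            "value": 0.8,
        }
    ],
    "subpopulationFilters": None,
    "mode": "development",
    "usesValidationDataset": True,
    "usesTrainingDataset": False,
    "usesMlModel": False,
    "syncId": str(uuid.uuid4()),  # assumed: any unique identifier works
}

with open("tests.json", "w") as f:
    json.dump([test], f, indent=2)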