Definition

The answer relevancy test measures how relevant the answer (output) is to the question (input). This metric is based on the Ragas response relevancy metric.
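
To give intuition for how the score is computed, here is a minimal sketch of a Ragas-style response relevancy calculation: an LLM generates a few questions that the answer could plausibly be answering, and the score is the mean cosine similarity between those generated questions and the original question. The generate_questions and embed helpers below are placeholders for an LLM call and an embedding model; they are not part of the Openlayer API.

from typing import Callable, List

import numpy as np


def response_relevancy(
    question: str,
    answer: str,
    generate_questions: Callable[[str, int], List[str]],
    embed: Callable[[str], np.ndarray],
    n: int = 3,
) -> float:
    # Embed the original question once.
    original = embed(question)
    # Ask an LLM to reverse-engineer n questions from the answer, then
    # compare each one to the original question in embedding space.
    generated = [embed(q) for q in generate_questions(answer, n)]
    sims = [
        float(np.dot(original, g) / (np.linalg.norm(original) * np.linalg.norm(g)))
        for g in generated
    ]
    # Scores close to 1 mean the answer stays on topic; off-topic or evasive
    # answers produce dissimilar generated questions and a low score.
    return float(np.mean(sims))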

Taxonomy

  • Task types: LLM.
  • Availability:

Why it matters

  • Testing answer relevancy helps ensure that your LLM generates responses that are directly related to the input question or prompt.
  • This metric helps identify when your model is providing off-topic or tangential responses that don’t address the user’s actual query.
  • It’s particularly important for chatbots, Q&A systems, and any application where staying on-topic is crucial for user experience.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Input: The question or prompt given to the LLM
  • Output: The generated answer/response from your LLM

This metric relies on an LLM evaluator judging your submission. On Openlayer, you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.
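
For example, a dataset with these columns could look like the following (the column names and rows are illustrative; use whatever column names your dataset mapping on Openlayer expects):

[
  {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris."
  },
  {
    "input": "How do I reset my password?",
    "output": "Click \"Forgot password\" on the login page and follow the emailed instructions."
  }
]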

Test configuration examples

If you are writing a tests.json, here is a valid configuration for the answer relevancy test:
[
  {
    "name": "Answer relevancy above 0.8",
    "description": "Ensure that generated responses are relevant to the input questions with a score above 0.8",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "answerRelevancy",
        "operator": ">",
        "value": 0.8
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]
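
In this example, the test computes the answerRelevancy metric on the validation dataset in development mode and passes only when the aggregate score is above 0.8. You can tighten or relax the check by adjusting the operator and value fields in the threshold.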