Definition

The coherence test evaluates the logical consistency and flow of the generated answer. This metric is based on the Ragas aspect critique for coherence.
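Mechanically, this works as an LLM-as-judge evaluation: a judge model is prompted to rate how coherent the answer is and returns a score between 0 and 1. The snippet below is a minimal sketch of that pattern, assuming the OpenAI Python client, an illustrative model name, and a judge prompt of our own; it is not Openlayer's or Ragas' actual implementation.

# Minimal LLM-as-judge sketch for coherence scoring.
# Illustrative only -- not Openlayer's or Ragas' implementation.
# Assumes the official OpenAI Python client ("pip install openai")
# and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical judge prompt, written for this example.
JUDGE_PROMPT = (
    "Rate the coherence of the following answer to the question on a "
    "scale from 0 to 1, where 1 means the answer is logically "
    "structured and easy to follow. Reply with the number only.\n\n"
    "Question: {question}\n\nAnswer: {answer}"
)

def coherence_score(question: str, answer: str) -> float:
    """Ask a judge model to rate the answer's coherence in [0, 1]."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }
        ],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip())

print(
    coherence_score(
        "What is photosynthesis?",
        "Photosynthesis is how plants convert light into chemical energy.",
    )
)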

Taxonomy

  • Task types: LLM.
  • Availability: development and monitoring.

Why it matters

  • Testing for coherence helps ensure that your LLM generates responses that are logically structured and easy to follow.
  • This metric helps identify when your model produces disjointed, contradictory, or confusing responses.
  • It’s essential for applications where clear communication is important, such as educational content, customer support, or documentation generation.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Input: the question or prompt given to the LLM.
  • Output: the generated answer or response from your LLM.
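For example, a minimal dataset with these columns might look like the sketch below. The "input" and "output" column names here are illustrative; map them to whatever your dataset actually uses when you configure it on Openlayer.

# Example dataset with the two required columns.
# The column names ("input", "output") are illustrative; map them to
# your own column names when configuring the dataset on Openlayer.
import pandas as pd

dataset = pd.DataFrame(
    {
        "input": [
            "What is photosynthesis?",
            "Summarize the plot of Hamlet.",
        ],
        "output": [
            "Photosynthesis is the process by which plants convert "
            "light into chemical energy.",
            "Hamlet, prince of Denmark, seeks revenge for his father's "
            "murder by his uncle Claudius.",
        ],
    }
)

dataset.to_csv("validation_dataset.csv", index=False)
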
This metric relies on an LLM evaluator judging your submission. On Openlayer, you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.

Test configuration examples

If you are writing a tests.json, here is a valid configuration for the coherence test:
[
  {
    "name": "Coherence above 0.7",
    "description": "Ensure that generated responses are logically coherent with a score above 0.7",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "coherence",
        "operator": ">",
        "value": 0.7
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]
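
Before uploading, you may want to sanity-check that the file parses and that each threshold carries the fields shown above. The following is a hypothetical local check written for this guide, not part of the Openlayer SDK:

# Hypothetical local sanity check for tests.json -- not part of the
# Openlayer SDK. Verifies the file parses as JSON and that every
# threshold carries the fields used in the example above.
import json

REQUIRED_THRESHOLD_FIELDS = {"insightName", "measurement", "operator", "value"}

with open("tests.json") as f:
    tests = json.load(f)

for test in tests:
    for threshold in test["thresholds"]:
        missing = REQUIRED_THRESHOLD_FIELDS - threshold.keys()
        assert not missing, f"{test['name']}: missing fields {missing}"

print(f"{len(tests)} test(s) look well-formed.")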