Definition

The answer correctness test evaluates the factual accuracy of the generated response against the reference ground truth. This metric is based on the Ragas factual correctness metric.
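
As a reference for how the score is produced, here is a minimal sketch that scores a single response against a reference using the Ragas factual correctness metric directly. It assumes ragas >= 0.2, langchain-openai, and an OpenAI API key in the environment; the model name and sample texts are placeholders, not values prescribed by Openlayer.

import asyncio

from langchain_openai import ChatOpenAI
from ragas.dataset_schema import SingleTurnSample
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import FactualCorrectness

# Evaluator LLM that judges the factual claims in the response (placeholder model).
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

sample = SingleTurnSample(
    response="The Eiffel Tower is located in Paris.",             # generated output
    reference="The Eiffel Tower is in Paris and is 330 m tall.",  # ground truth
)

scorer = FactualCorrectness(llm=evaluator_llm)
score = asyncio.run(scorer.single_turn_ascore(sample))  # float between 0 and 1
print(score)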

Taxonomy

  • Task types: LLM.
  • Availability: development and monitoring.

Why it matters

  • Answer correctness verifies that your LLM's responses are factually accurate when compared to known ground truth answers.
  • This metric is crucial for applications where factual accuracy is paramount, such as question-answering systems, educational tools, or information retrieval systems.
  • It helps identify when your model is generating plausible-sounding but incorrect information.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Outputs: The generated answer/response from your LLM
  • Ground truths: The reference/correct answer to compare against

This metric relies on an LLM evaluator to judge your submission. On Openlayer, you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.
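
For reference, a dataset with the required columns might look like the following sketch. The column names used here ("output" and "ground_truth") are illustrative placeholders; map them to the column names configured for your Openlayer dataset.

import pandas as pd

# Two illustrative rows: the second output contradicts its ground truth,
# so it should receive a low answer correctness score.
dataset = pd.DataFrame(
    {
        "output": [
            "The Eiffel Tower is 330 meters tall.",
            "Water boils at 90 degrees Celsius at sea level.",
        ],
        "ground_truth": [
            "The Eiffel Tower stands about 330 meters tall.",
            "Water boils at 100 degrees Celsius at sea level.",
        ],
    }
)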

Test configuration examples

If you are writing a tests.json file, here is an example of a valid configuration for the answer correctness test:
[
  {
    "name": "Answer correctness above 0.8",
    "description": "Ensure that the factual accuracy of generated responses is above 0.8",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "answerCorrectness",
        "operator": ">",
        "value": 0.8
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]