Definition

The correctness test evaluates whether the generated answer is accurate overall. It is based on the Ragas aspect critique for correctness.

Taxonomy

  • Task types: LLM.
  • Availability: development and monitoring.

Why it matters

  • Correctness measures whether your LLM generates responses that are accurate and free from errors.
  • This metric helps identify when your model produces incorrect information, logical fallacies, or misleading content.
  • It’s fundamental for applications where accuracy is critical, such as educational tools, fact-checking systems, or professional assistance applications.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Input: The question or prompt given to the LLM.
  • Outputs: The generated answer/response from your LLM.

This metric relies on an LLM evaluator judging your submission. On Openlayer, you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.
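To make the LLM-as-judge idea concrete, here is a minimal sketch of how such an evaluator could be wired up. The prompt wording, function names, and score parsing are illustrative assumptions, not Openlayer's actual implementation:

```python
# Hypothetical sketch of an LLM-as-judge correctness critique.
# The prompt template and parsing below are illustrative only.

def build_judge_prompt(input_text: str, output_text: str) -> str:
    """Compose the prompt sent to the evaluator LLM (assumed wording)."""
    return (
        "Evaluate whether the following answer is factually and logically correct.\n"
        f"Question: {input_text}\n"
        f"Answer: {output_text}\n"
        "Reply with a single number between 0 and 1."
    )


def parse_score(raw_reply: str) -> float:
    """Extract a 0-1 score from the evaluator's reply, clamping out-of-range values."""
    score = float(raw_reply.strip())
    return max(0.0, min(1.0, score))


# Example: scoring one (input, output) pair with a stubbed evaluator reply.
prompt = build_judge_prompt("What is the capital of France?", "Paris")
score = parse_score("0.95")
print(score)  # 0.95
```

In practice, the reply would come from the evaluator LLM you configured via the OpenAI or Anthropic integration, and the resulting score is what the thresholds in your tests.json compare against.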

Test configuration examples

If you are writing a tests.json, here is a valid configuration for the correctness test:
[
  {
    "name": "Correctness above 0.8",
    "description": "Ensure that generated responses are correct with a score above 0.8",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "correctness",
        "operator": ">",
        "value": 0.8
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]
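Before committing a tests.json, it can help to sanity-check the threshold entries programmatically. The sketch below loads the configuration above and verifies each threshold has a measurement, a comparison operator, and a numeric value; the set of allowed operators is an assumption for illustration:

```python
import json

# Illustrative validator for tests.json threshold entries.
# Field names mirror the example configuration above; the
# validation logic and operator set are our own assumptions.

def threshold_is_valid(test: dict) -> bool:
    """Return True if every threshold has a measurement, operator, and numeric value."""
    allowed_operators = {">", ">=", "<", "<=", "=="}  # assumed set
    for threshold in test.get("thresholds", []):
        if not threshold.get("measurement"):
            return False
        if threshold.get("operator") not in allowed_operators:
            return False
        if not isinstance(threshold.get("value"), (int, float)):
            return False
    return True


config = json.loads("""
[
  {
    "name": "Correctness above 0.8",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "measurement": "correctness",
        "operator": ">",
        "value": 0.8
      }
    ]
  }
]
""")

print(all(threshold_is_valid(test) for test in config))  # True
```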