Definition

The faithfulness test measures the factual consistency of the generated answer against the given context. The score ranges from 0 to 1, with higher values indicating a more faithful answer. This metric is based on the Ragas faithfulness metric.
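Ragas computes this score by decomposing the answer into individual claims and measuring the fraction of those claims that can be inferred from the context. As a rough sketch, you can reproduce the underlying metric locally with the ragas library. This assumes a ragas 0.1-style API and an OPENAI_API_KEY in your environment; imports and signatures may differ across ragas versions.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness

# A single example: every claim in the answer is supported by the
# context, so the faithfulness score should be close to 1.0.
data = {
    "question": ["Where is the Eiffel Tower located?"],
    "answer": ["The Eiffel Tower is located in Paris, France."],
    "contexts": [["The Eiffel Tower is a landmark in Paris, France."]],
}

result = evaluate(Dataset.from_dict(data), metrics=[faithfulness])
print(result)  # e.g. {'faithfulness': 1.0}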

Taxonomy

  • Task types: LLM.

Why it matters

  • Faithfulness ensures that your LLM generates responses that are consistent with the provided context rather than hallucinated.
  • This metric helps identify when your model is fabricating facts or contradicting the given context.
  • It’s essential for RAG (Retrieval-Augmented Generation) systems, where the model should stay grounded in the retrieved information.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Outputs: The generated answer/response from your LLM
  • Context: The provided context or background information

This metric relies on an LLM evaluator judging your submission. On Openlayer, you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.
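For illustration, a minimal dataset with these two columns could look like the sketch below. The column names and file name here are hypothetical; map them to whatever names your Openlayer dataset configuration uses.

import pandas as pd

# Hypothetical column names; align them with your Openlayer dataset config.
df = pd.DataFrame(
    {
        "output": [
            "The Eiffel Tower is located in Paris, France.",
        ],
        "context": [
            "The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
        ],
    }
)
df.to_csv("validation_dataset.csv", index=False)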

Test configuration examples

If you are writing a tests.json, here is a valid configuration for the faithfulness test:
[
  {
    "name": "Faithfulness above 0.9",
    "description": "Ensure that generated responses are faithful to the provided context with a score above 0.9",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "faithfulness",
        "operator": ">",
        "value": 0.9
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]
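If you prefer to generate this file programmatically, a small script like the following produces the same structure. This is a sketch: the syncId is just a unique identifier, generated here with uuid4 rather than hard-coded.

import json
import uuid

test = {
    "name": "Faithfulness above 0.9",
    "description": "Ensure that generated responses are faithful to the provided context with a score above 0.9",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
        {
            "insightName": "metrics",
            "insightParameters": None,
            "measurement": "faithfulness",
            "operator": ">",
            "value": 0.9,
        }
    ],
    "subpopulationFilters": None,
    "mode": "development",
    "usesValidationDataset": True,
    "usesTrainingDataset": False,
    "usesMlModel": False,
    "syncId": str(uuid.uuid4()),
}

# Write a tests.json containing a single faithfulness test.
with open("tests.json", "w") as f:
    json.dump([test], f, indent=2)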