The groundedness test evaluates whether every factual statement in the AI assistant's response is grounded in the provided context. This LLM-as-a-judge evaluation checks that the model does not hallucinate information and makes only claims that are supported by the given context.
To compute this metric, your dataset must contain the following columns (a minimal example follows the list):
- Outputs: the generated response from your LLM
- Context: the provided context or retrieved information that should ground the response
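As a quick sketch, the dataset could look like the example below. The column names used here ("output" and "context") are assumptions for illustration; map them to whatever your evaluation setup expects for Outputs and Context.

```python
import pandas as pd

# Minimal, illustrative dataset for the groundedness test.
# "output" holds the LLM response; "context" holds the retrieved or
# provided information that should support every claim in the response.
dataset = pd.DataFrame(
    {
        "output": [
            "The Eiffel Tower is 330 meters tall and was completed in 1889.",
        ],
        "context": [
            "The Eiffel Tower, completed in 1889, stands 330 meters tall.",
        ],
    }
)

print(dataset.head())
```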
To use this test, you must select the underlying LLM used as the evaluator and provide the required API credentials. You can check the OpenAI and Anthropic integration guides for details.
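As a rough sketch, credentials for the evaluator are typically supplied through environment variables. The environment variable names below are the standard ones read by the OpenAI and Anthropic SDKs; the configuration dictionary is purely hypothetical and does not reflect the tool's actual schema.

```python
import os

# Supply credentials for whichever provider backs your evaluator LLM.
# These are the standard environment variable names for each SDK.
os.environ["OPENAI_API_KEY"] = "sk-..."         # if the evaluator is an OpenAI model
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # if the evaluator is an Anthropic model

# Hypothetical configuration for the groundedness test; the keys shown
# here are assumptions for illustration, not the tool's real settings.
groundedness_config = {
    "evaluator_model": "gpt-4o",   # the underlying LLM used as the judge
    "output_column": "output",     # column holding the generated responses
    "context_column": "context",   # column holding the grounding context
}
```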