The groundedness test evaluates whether every factual statement in the AI assistant's response is grounded in the provided context. This LLM-as-a-judge evaluation checks that the model does not hallucinate information and makes only claims that are supported by the given context.
To compute this metric, your dataset must contain the following columns (a minimal example follows the list):
- Outputs: the generated response from your LLM
- Context: the provided context or retrieved information that should ground the response
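As a quick sketch, the dataset could look like the example below. The column names used here ("output" and "context") are assumptions for illustration; map them to whatever your evaluation setup expects for Outputs and Context.

```python
import pandas as pd

# Minimal, illustrative dataset for the groundedness test.
# "output" holds the LLM response; "context" holds the retrieved or
# provided information that should support every claim in the response.
dataset = pd.DataFrame(
    {
        "output": [
            "The Eiffel Tower is 330 meters tall and was completed in 1889.",
        ],
        "context": [
            "The Eiffel Tower, completed in 1889, stands 330 meters tall.",
        ],
    }
)

print(dataset.head())
```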
To use this test, you must select the underlying LLM used as the evaluator and provide the required API credentials. You can check the OpenAI and Anthropic integration guides for details.
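As a rough sketch, credentials for the evaluator are typically supplied through environment variables. The environment variable names below are the standard ones read by the OpenAI and Anthropic SDKs; the configuration dictionary is purely hypothetical and does not reflect the tool's actual schema.

```python
import os

# Supply credentials for whichever provider backs your evaluator LLM.
# These are the standard environment variable names for each SDK.
os.environ["OPENAI_API_KEY"] = "sk-..."         # if the evaluator is an OpenAI model
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # if the evaluator is an Anthropic model

# Hypothetical configuration for the groundedness test; the keys shown
# here are assumptions for illustration, not the tool's real settings.
groundedness_config = {
    "evaluator_model": "gpt-4o",   # the underlying LLM used as the judge
    "output_column": "output",     # column holding the generated responses
    "context_column": "context",   # column holding the grounding context
}
```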