Definition
The session coherence test evaluates the logical flow and consistency of a multi-turn conversation. An LLM-as-a-judge reads the full conversation and scores it against four criteria:
- Responses logically follow from the user’s messages
- The overall trajectory of the conversation is easy to follow
- Individual responses are well-structured and internally consistent
- Transitions between topics feel smooth
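To make the judging step concrete, here is a minimal sketch of how the four criteria might be turned into a judge prompt and how a numeric reply could be read back as a score between 0 and 1. The function names `build_judge_prompt` and `parse_score`, the prompt wording, and the reply format are all assumptions for illustration; the actual prompt and parsing used by the test are not described here.

```python
import re

# The four criteria listed above, restated for the judge prompt.
CRITERIA = [
    "Responses logically follow from the user's messages",
    "The overall trajectory of the conversation is easy to follow",
    "Individual responses are well-structured and internally consistent",
    "Transitions between topics feel smooth",
]


def build_judge_prompt(turns):
    """Build a judge prompt from ordered (user_message, assistant_response) pairs.

    Hypothetical helper: the wording is illustrative, not the real prompt.
    """
    lines = ["Rate the coherence of the conversation below from 0 to 1.", "Criteria:"]
    lines += [f"- {criterion}" for criterion in CRITERIA]
    lines.append("Conversation:")
    for i, (user, assistant) in enumerate(turns, start=1):
        lines.append(f"[Turn {i}] User: {user}")
        lines.append(f"[Turn {i}] Assistant: {assistant}")
    lines.append("Answer with a single number between 0 and 1.")
    return "\n".join(lines)


def parse_score(reply):
    """Return the first number in the judge's reply, clamped to [0, 1].

    Returns None when the reply contains no number at all.
    """
    match = re.search(r"\d*\.?\d+", reply)
    if match is None:
        return None
    return min(1.0, max(0.0, float(match.group())))
```

The prompt string would be sent to whichever judge model is configured; clamping in `parse_score` is a defensive choice so that an out-of-range reply cannot produce a score outside the documented scale.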
Taxonomy
- Task types: LLM.
- Availability: and .
- Evaluation level: session.
- Polarity: higher score = better (0 = completely incoherent, 1 = perfectly coherent).
Why it matters
- Coherence is a distinct quality from correctness: an assistant can give factually correct answers that still feel disjointed or contradictory across turns.
- Low coherence is a strong leading indicator of user dissatisfaction even when task outcomes look fine.
Required columns
- Input: The user’s message in each turn.
- Output: The assistant’s response in each turn.
- Session ID: Groups turns belonging to the same conversation.
- Timestamp: Used to reconstruct turn order within a session.
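The sketch below shows how these four columns could be combined to rebuild each conversation before it is judged: rows are grouped by session ID and sorted by timestamp within each group. The key names (`session_id`, `timestamp`, `input`, `output`) and the function `reconstruct_sessions` are hypothetical, and timestamps are assumed to be mutually comparable values such as ISO 8601 strings or epoch numbers.

```python
from collections import defaultdict


def reconstruct_sessions(rows):
    """Group rows by session and order each session's turns by timestamp.

    rows: iterable of dicts with the (assumed) keys
          "session_id", "timestamp", "input", and "output".
    Returns a dict mapping session ID to an ordered list of
    (input, output) pairs.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["session_id"]].append(row)

    return {
        session_id: [
            (row["input"], row["output"])
            for row in sorted(turns, key=lambda row: row["timestamp"])
        ]
        for session_id, turns in grouped.items()
    }


# Example: rows arrive out of order and interleaved across sessions.
example_rows = [
    {"session_id": "a", "timestamp": 2, "input": "And tomorrow?", "output": "Rain."},
    {"session_id": "b", "timestamp": 1, "input": "Hi", "output": "Hello!"},
    {"session_id": "a", "timestamp": 1, "input": "Weather today?", "output": "Sunny."},
]
example_sessions = reconstruct_sessions(example_rows)
```

Without a reliable timestamp the turn order is ambiguous, and a judge reading shuffled turns would likely score even a well-formed conversation as incoherent.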
Test configuration examples
Related
- Coherence — trace-level counterpart.
- Session context retention — related but stricter signal about tracking prior facts.

