Definition
The session goal achievement test evaluates whether the user’s goal was met by the end of a conversation. An LLM-as-a-judge infers the user’s goal from their messages and scores the full session against four criteria:
- The user’s intent is correctly identified
- The goal is fully resolved (not just partially addressed)
- The session reaches a satisfying conclusion
- User-side signals indicate satisfaction (e.g., acknowledgment, no repeat asks)
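The judging step above can be sketched as follows. This is a minimal illustration that assumes a generic text-in, text-out judge model that replies with a bare number. The `Turn` type, `build_judge_prompt`, `parse_score`, and the prompt wording are hypothetical and are not the product's actual judge prompt. The call to the model itself is omitted.

```python
from dataclasses import dataclass

# The four criteria from the definition above.
CRITERIA = [
    "The user's intent is correctly identified",
    "The goal is fully resolved (not just partially addressed)",
    "The session reaches a satisfying conclusion",
    "User-side signals indicate satisfaction (e.g., acknowledgment, no repeat asks)",
]


@dataclass
class Turn:
    user_input: str  # the user's message in this turn
    output: str      # the assistant's response in this turn


def build_judge_prompt(turns: list[Turn]) -> str:
    """Build one prompt covering the whole session (hypothetical wording)."""
    transcript = "\n".join(
        f"Turn {i}\nUser: {t.user_input}\nAssistant: {t.output}"
        for i, t in enumerate(turns, start=1)
    )
    criteria = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        "Infer the user's goal from their messages, then judge whether the goal "
        "was achieved by the end of the session.\n"
        f"Criteria:\n{criteria}\n\n"
        f"Session transcript:\n{transcript}\n\n"
        "Reply with a single number from 0 (goal not achieved at all) "
        "to 1 (goal fully achieved)."
    )


def parse_score(reply: str) -> float:
    """Parse the judge's numeric reply and clamp it to the 0-1 scale."""
    score = float(reply.strip())
    return min(max(score, 0.0), 1.0)
```

The whole session goes into a single prompt, not one prompt per turn, because the criteria (for example, "no repeat asks") can only be judged across turns.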
Taxonomy
- Task types: LLM.
- Availability: and .
- Evaluation level: session.
- Polarity: higher score = better.
0 = goal not achieved at all, 1 = goal fully achieved.
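Because higher scores are better, a pass/fail check on this metric compares scores against a minimum. The sketch below assumes a hypothetical aggregation (the mean score over sessions) and a hypothetical default threshold of 0.8. Neither is specified on this page, and `goal_achievement_rate` and `passes` are illustrative names.

```python
from statistics import mean


def goal_achievement_rate(session_scores: dict[str, float]) -> float:
    """Mean of per-session scores, each on the 0-1 scale (hypothetical aggregation)."""
    for sid, score in session_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for session {sid!r} is outside [0, 1]: {score}")
    return mean(session_scores.values())


def passes(session_scores: dict[str, float], min_rate: float = 0.8) -> bool:
    # Higher is better, so the check passes when the rate meets or exceeds
    # the (hypothetical) minimum.
    return goal_achievement_rate(session_scores) >= min_rate
```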
Why it matters
- Goal achievement is the most direct signal of product quality for agentic assistants.
- Tracking it at the session level captures outcomes that per-turn evaluations miss.
Required columns
- Input: The user’s message in each turn.
- Output: The assistant’s response in each turn.
- Session ID: Groups turns belonging to the same conversation.
- Timestamp: Used to reconstruct turn order within a session.
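The judge needs each session as an ordered conversation, but the columns above arrive as one row per turn. The following is a minimal sketch of rebuilding sessions from those rows. It assumes each row is a dict with the hypothetical keys `session_id`, `timestamp` (an ISO 8601 string), `input`, and `output`. `reconstruct_sessions` is an illustrative name.

```python
from collections import defaultdict
from datetime import datetime


def reconstruct_sessions(rows: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Group rows by session ID and order each session's turns by timestamp.

    Assumes hypothetical row keys "session_id", "timestamp" (ISO 8601),
    "input", and "output".
    """
    sessions: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)

    # sorted() is stable, so turns with identical timestamps keep their input order.
    return {
        sid: [
            (r["input"], r["output"])
            for r in sorted(turns, key=lambda r: datetime.fromisoformat(r["timestamp"]))
        ]
        for sid, turns in sessions.items()
    }
```

Parsing timestamps rather than sorting the raw strings avoids mis-ordering when the string formats are not uniformly zero-padded.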
Test configuration examples
Related
- Session task progression: whether the session made steady progress along the way.
- Session conversation completeness: whether the dialogue reached a clean end.

