Definition
The session task progression test evaluates whether a conversation makes steady progress toward the user’s inferred task. An LLM-as-a-judge reads the full session and scores it against four criteria:
- Each turn logically advances the task
- Steps are ordered sensibly (prerequisites before follow-ups)
- Intermediate milestones are acknowledged before moving on
- The conversation doesn’t loop, repeat itself, or backtrack unnecessarily
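The four criteria above could be passed to a judge model roughly as follows. This is a minimal sketch, not the product's actual prompt: the function name `build_judge_prompt`, the prompt wording, and the transcript layout are all assumptions made for illustration.

```python
from typing import List, Tuple

# The four criteria listed in the definition above.
CRITERIA = [
    "Each turn logically advances the task",
    "Steps are ordered sensibly (prerequisites before follow-ups)",
    "Intermediate milestones are acknowledged before moving on",
    "The conversation doesn't loop, repeat itself, or backtrack unnecessarily",
]


def build_judge_prompt(turns: List[Tuple[str, str]]) -> str:
    """Assemble a hypothetical judge prompt from (user, assistant) turns."""
    # Render the whole session, since the judge scores the session, not a turn.
    transcript = "\n".join(
        f"Turn {i}\nUser: {user}\nAssistant: {assistant}"
        for i, (user, assistant) in enumerate(turns, start=1)
    )
    criteria = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        "Rate how steadily this session progresses toward the user's task.\n"
        f"Criteria:\n{criteria}\n\n"
        f"Session:\n{transcript}\n\n"
        "Answer with a single number between 0 (no progress) and 1 "
        "(excellent steady progress)."
    )
```

The prompt includes every turn in order because the test is defined at the session level; scoring turns in isolation would miss loops and backtracking.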
Taxonomy
- Task types: LLM.
- Availability: and .
- Evaluation level: session.
- Polarity: higher score = better.
0 = no progress, 1 = excellent steady progress.
Why it matters
- A session can end with the goal achieved and still have been inefficient, for example by wandering through dead ends before finding the answer.
- Tracking progression complements Session goal achievement: together they tell you both whether the task was completed and whether the path to completion was clean.
Required columns
- Input: The user’s message in each turn.
- Output: The assistant’s response in each turn.
- Session ID: Groups turns belonging to the same conversation.
- Timestamp: Used to reconstruct turn order within a session.
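To illustrate how these four columns combine, the sketch below groups rows by session and orders each session's turns by timestamp. It is an assumption-laden example rather than the tool's own logic: the function name `group_sessions` and the keys `session_id`, `timestamp`, `input`, and `output` are hypothetical, and timestamps are assumed to be directly comparable (for instance `datetime` objects or ISO 8601 strings).

```python
from collections import defaultdict
from typing import Dict, List


def group_sessions(rows: List[dict]) -> Dict[str, List[dict]]:
    """Group turn rows by session ID and sort each session chronologically.

    Column keys are hypothetical; adapt them to your dataset's actual names.
    """
    sessions: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)
    # Reconstruct turn order within each session from the timestamp column.
    for turns in sessions.values():
        turns.sort(key=lambda r: r["timestamp"])
    return dict(sessions)
```

Each resulting list is one conversation in turn order, which is the unit the judge reads.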
Test configuration examples
Related
- Session goal achievement — end-state view of the same concern.

