Session task progression

Definition

The session task progression test evaluates whether a conversation makes steady progress toward the user’s inferred task. An LLM-as-a-judge reads the full session and scores it against four criteria:

Each turn logically advances the task
Steps are ordered sensibly (prerequisites before follow-ups)
Intermediate milestones are acknowledged before moving on
The conversation doesn’t loop, repeat itself, or backtrack unnecessarily

Taxonomy

Task types: LLM.
Availability: and .
Evaluation level: session.
Polarity: higher score = better. 0 = no progress, 1 = excellent steady progress.

Why it matters

A session can end with the goal achieved and still have been inefficient — looping through blind alleys before finding the answer.
Tracking progression complements Session goal achievement: together they tell you both whether the task was completed and whether the path to completion was clean.

Required columns

Input: The user’s message in each turn.
Output: The assistant’s response in each turn.
Session ID: Groups turns belonging to the same conversation.
Timestamp: Used to reconstruct turn order within a session.

This metric relies on an LLM evaluator. On Openlayer you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.

Test configuration examples

[
  {
    "name": "Session task progression above 0.7",
    "description": "Ensure sessions make steady progress toward the user's task",
    "type": "performance",
    "subtype": "sessionTaskProgression",
    "thresholds": [
      {
        "insightName": "sessionTaskProgression",
        "measurement": "meanScore",
        "operator": ">=",
        "value": 0.7
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]

Session goal achievement — end-state view of the same concern.

Session role adherence

Session token count

⌘I

​Definition

​Taxonomy

​Why it matters

​Required columns

​Test configuration examples

​Related

Definition

Taxonomy

Why it matters

Required columns

Test configuration examples

Related