Skip to main content

Definition

The session task progression test evaluates whether a conversation makes steady progress toward the user’s inferred task. An LLM-as-a-judge reads the full session and scores it against four criteria:
  • Each turn logically advances the task
  • Steps are ordered sensibly (prerequisites before follow-ups)
  • Intermediate milestones are acknowledged before moving on
  • The conversation doesn’t loop, repeat itself, or backtrack unnecessarily

Taxonomy

  • Task types: LLM.
  • Availability: and .
  • Evaluation level: session.
  • Polarity: higher score = better. 0 = no progress, 1 = excellent steady progress.

Why it matters

  • A session can end with the goal achieved and still have been inefficient — looping through blind alleys before finding the answer.
  • Tracking progression complements Session goal achievement: together they tell you both whether the task was completed and whether the path to completion was clean.

Required columns

  • Input: The user’s message in each turn.
  • Output: The assistant’s response in each turn.
  • Session ID: Groups turns belonging to the same conversation.
  • Timestamp: Used to reconstruct turn order within a session.
This metric relies on an LLM evaluator. On Openlayer you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.

Test configuration examples

[
  {
    "name": "Session task progression above 0.7",
    "description": "Ensure sessions make steady progress toward the user's task",
    "type": "performance",
    "subtype": "sessionTaskProgression",
    "thresholds": [
      {
        "insightName": "sessionTaskProgression",
        "measurement": "meanScore",
        "operator": ">=",
        "value": 0.7
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]