Definition

The session duration test monitors the wall-clock duration of a session — first-turn timestamp to last-turn timestamp. Unlike Session latency, which aggregates per-turn processing time, session duration captures the user-facing end-to-end experience, including idle gaps between turns. No LLM evaluator is involved — it’s a deterministic aggregation.

Taxonomy

  • Task types: LLM.
  • Availability: and .
  • Evaluation level: session.
  • Computation: deterministic aggregation.

Why it matters

  • Session duration is the cleanest proxy for “time spent by a user with the assistant”. Comparing duration against goal achievement shows whether users who spend more time are getting more value.
  • Sudden shifts in duration distribution (longer sessions, flatter distribution) are often an early warning for regressions in assistant quality.

Available measurements

Units follow the timestamp column — typically seconds if openlayer_prediction_timestamp is epoch seconds.

| Measurement | What it means |
| --- | --- |
| meanSessionDuration | Mean of per-session wall-clock durations (last-turn − first-turn) |
| medianSessionDuration | Median of per-session wall-clock durations |
| meanTimeBetweenTurns | Mean idle gap between consecutive turns, averaged across all sessions |

Sessions with only one turn are excluded from duration calculations, because at least two timestamps are needed for a delta.
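
To make the three measurements concrete, here is a minimal sketch of how they could be computed from (session ID, timestamp) pairs. It is an illustration under stated assumptions, not the platform's actual implementation: the function name `session_duration_metrics` and the input shape are hypothetical, timestamps are assumed to be epoch seconds, and `meanTimeBetweenTurns` is read here as the mean of each session's average gap (pooling all gaps across sessions is another plausible reading).

```python
from collections import defaultdict
from statistics import mean, median


def session_duration_metrics(rows):
    """Compute duration metrics from (session_id, timestamp) pairs.

    Hypothetical helper: timestamps are assumed to be epoch seconds.
    Returns None when no session has at least two turns.
    """
    by_session = defaultdict(list)
    for session_id, ts in rows:
        by_session[session_id].append(ts)

    durations = []
    per_session_gaps = []
    for timestamps in by_session.values():
        if len(timestamps) < 2:
            continue  # single-turn sessions are excluded: no delta to compute
        timestamps.sort()
        # Wall-clock duration: last-turn minus first-turn timestamp.
        durations.append(timestamps[-1] - timestamps[0])
        # Idle gaps between consecutive turns within this session.
        gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
        per_session_gaps.append(mean(gaps))

    if not durations:
        return None
    return {
        "meanSessionDuration": mean(durations),
        "medianSessionDuration": median(durations),
        # Assumption: average the per-session mean gaps across sessions.
        "meanTimeBetweenTurns": mean(per_session_gaps),
    }


rows = [
    ("a", 0), ("a", 60), ("a", 300),  # duration 300, gaps of 60 and 240
    ("b", 1000), ("b", 1100),         # duration 100, one gap of 100
    ("c", 5000),                      # single turn, excluded
]
metrics = session_duration_metrics(rows)
```

In this example, session "c" contributes nothing to any measurement, so the mean and median are taken over sessions "a" and "b" only.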

Required columns

  • Session ID: Groups turns belonging to the same conversation.
  • Timestamp: Per-trace timestamp, used to compute duration as the delta between first and last turn.

Test configuration examples

[
  {
    "name": "Mean session duration below 10 min",
    "description": "Alert when average session wall-clock duration exceeds 10 minutes",
    "type": "performance",
    "subtype": "sessionDuration",
    "thresholds": [
      {
        "insightName": "sessionDuration",
        "measurement": "meanSessionDuration",
        "operator": "<=",
        "value": 600
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]
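
As a rough illustration of how the threshold in the configuration above reads, the sketch below checks a measurement value against an operator and a limit. The helper `threshold_passes` and the operator table are hypothetical; the set of operators the platform actually accepts is not stated here and is assumed for the example. The value 600 corresponds to 10 minutes only if timestamps are in seconds.

```python
import operator

# Assumed operator set, for illustration only.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def threshold_passes(threshold, measurements):
    """Return True when the named measurement satisfies the threshold."""
    observed = measurements[threshold["measurement"]]
    compare = OPERATORS[threshold["operator"]]
    return compare(observed, threshold["value"])


threshold = {
    "insightName": "sessionDuration",
    "measurement": "meanSessionDuration",
    "operator": "<=",
    "value": 600,  # 10 minutes, assuming epoch-second timestamps
}

passes_short = threshold_passes(threshold, {"meanSessionDuration": 420})
passes_long = threshold_passes(threshold, {"meanSessionDuration": 750})
```

Under this reading, a 7-minute average satisfies the test, while a 12.5-minute average fails it and would trigger the alert.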