Definition
The session duration test monitors the wall-clock duration of a session — first-turn timestamp to last-turn timestamp. Unlike Session latency, which aggregates per-turn processing time, session duration captures the user-facing end-to-end experience, including idle gaps between turns. No LLM evaluator is involved — it’s a deterministic aggregation.Taxonomy
- Task types: LLM.
- Availability: and .
- Evaluation level: session.
- Computation: deterministic aggregation.
Why it matters
- Session duration is the cleanest proxy for “time spent by a user with the assistant”. Comparing duration against goal achievement shows whether users who spend more time are getting more value.
- Sudden shifts in duration distribution (longer sessions, flatter distribution) are often an early warning for regressions in assistant quality.
Available measurements
Units follow the timestamp column — typically seconds ifopenlayer_prediction_timestamp
is epoch seconds.
| Measurement | What it means |
|---|---|
meanSessionDuration | Mean of per-session wall-clock durations (last-turn − first-turn) |
medianSessionDuration | Median of per-session wall-clock durations |
meanTimeBetweenTurns | Mean idle gap between consecutive turns, averaged across all sessions |
Sessions with only one turn are excluded from duration calculations — at least
two timestamps are needed for a delta.
Required columns
- Session ID: Groups turns belonging to the same conversation.
- Timestamp: Per-trace timestamp, used to compute duration as the delta between first and last turn.
Test configuration examples
Related
- Session latency — per-turn processing time, different signal.
- Session record count — number of turns, complementary to duration.

