Session cost - Openlayer

Definition

The session cost test monitors LLM cost aggregated to the session level. For each session, Openlayer sums the per-trace cost column across all turns in the session, then exposes window-level aggregates you can threshold against. No LLM evaluator is involved — it’s a deterministic aggregation.

Taxonomy

Task types: LLM.
Availability: and .
Evaluation level: session.
Computation: deterministic aggregation.

Why it matters

Cost per trace is useful, but cost per session is what maps to a user’s real experience — users don’t see individual LLM calls, they see conversations.
Mean/median session cost tracks baseline spend; totalCost catches the bill for the full window.
Pair with Session record count to distinguish “long session, predictable cost” from “short session, expensive turn”.

Available measurements

Measurement	What it means
`totalCost`	Sum of cost across all traces in the window (global, not per-session)
`meanCostPerSession`	Mean of per-session cost sums across sessions in the window
`medianCostPerSession`	Median of per-session cost sums across sessions in the window

Required columns

Session ID: Groups turns belonging to the same conversation.
Cost: Per-trace cost (usually populated automatically by the Openlayer client or via OpenTelemetry).

Test configuration examples

[
  {
    "name": "Mean session cost below $0.50",
    "description": "Alert when average session cost exceeds $0.50",
    "type": "performance",
    "subtype": "sessionCost",
    "thresholds": [
      {
        "insightName": "sessionCost",
        "measurement": "meanCostPerSession",
        "operator": "<=",
        "value": 0.5
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]

Session token count — token-based view of the same underlying usage.
Session record count — how many turns contribute to the session cost.
Mean cost, Max cost, Total cost — trace-level cost metrics.

Session conversation completeness

Session duration

⌘I

​Definition

​Taxonomy

​Why it matters

​Available measurements

​Required columns

​Test configuration examples

​Related