# Column statistics

> Learn how to use the column statistics test to validate statistical properties of your data columns

## Definition

The column statistics test lets you set thresholds on statistical measures of
individual columns in your dataset. You select a column and a statistic (such as
mean, median, or variance), then define the acceptable range or value for that
statistic.

This test computes the specified statistical measure for the chosen column and
compares it against your defined threshold.
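
The compute-then-compare logic described above can be sketched in a few lines of Python. This is only an illustration, not Openlayer's implementation: the `check_thresholds` helper and the `OPERATORS` mapping are hypothetical names, and the sketch assumes that a test with several thresholds passes only when all of them hold.

```python
import operator
import statistics

# Hypothetical mapping from the "operator" strings used in tests.json
# to Python comparison functions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def check_thresholds(values, statistic_fn, thresholds):
    """Compute one statistic and compare it against every threshold.

    Returns (measured value, passed), where passed is True only if
    all thresholds hold (an assumption of this sketch).
    """
    measured = statistic_fn(values)
    passed = all(
        OPERATORS[t["operator"]](measured, t["value"]) for t in thresholds
    )
    return measured, passed


ages = [22, 31, 27, 45, 38]
measured, passed = check_thresholds(
    ages, statistics.mean, [{"operator": ">", "value": 25}]
)
print(measured, passed)
```

Passing two thresholds, for example `>=` and `<=`, bounds the statistic on both sides.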

## Taxonomy

* **Task types**: LLM, tabular classification, tabular regression.
* **Availability**: <Tooltip tip="Continuously evaluate your models and datasets as you iterate on their versions.">development</Tooltip>
  and <Tooltip tip="Monitor a model in production, measure its health, check for drifts and set up alerts.">monitoring</Tooltip>.

## Why it matters

* Column statistics tests help ensure that your data maintains expected statistical
  properties over time.
* They can detect data quality issues, distribution shifts, or unusual patterns in
  individual features.
* These tests are essential for monitoring data consistency and ensuring that model
  inputs remain within expected ranges.
* Statistical validation helps identify potential data pipeline issues or changes in
  data collection processes.

## Available statistics

The following statistical measures are supported:

| Statistic  | Description                      | Typical Use Cases                                       |
| ---------- | -------------------------------- | ------------------------------------------------------- |
| `mean`     | Average value of the column      | Monitor if average values stay within expected ranges   |
| `median`   | Middle value when data is sorted | Detect shifts in central tendency, robust to outliers   |
| `min`      | Minimum value in the column      | Ensure no values fall below acceptable minimums         |
| `max`      | Maximum value in the column      | Detect outliers or values exceeding acceptable maximums |
| `std`      | Standard deviation of the column | Monitor data variability and spread                     |
| `sum`      | Sum of all values in the column  | Validate totals and other aggregate quantities          |
| `count`    | Number of non-null values        | Monitor data completeness                               |
| `variance` | Variance of the column values    | Alternative measure of spread, in squared units         |
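
To get a feel for sensible threshold values, it can help to compute these statistics locally on a sample of the column. The sketch below uses Python's standard library and makes two assumptions that may not match the platform: nulls are dropped before every statistic, and `std` and `variance` use the sample (n - 1) formulas. The `column_statistic` helper is a hypothetical name.

```python
import statistics

# Assumption: std/variance use the sample (n - 1) formulas.
# The platform's exact definitions may differ.
STATISTICS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "std": statistics.stdev,
    "sum": sum,
    "count": len,
    "variance": statistics.variance,
}


def column_statistic(column, statistic):
    """Compute one named statistic over the non-null values of a column."""
    non_null = [v for v in column if v is not None]  # assumption: nulls dropped
    return STATISTICS[statistic](non_null)


income = [40_000, 52_000, None, 61_000, 47_000]
for name in STATISTICS:
    print(name, column_statistic(income, name))
```

Because variance is expressed in squared units, a variance threshold of `1000000` corresponds to a standard deviation of `1000`.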

## Test configuration examples

If you are writing a `tests.json`, here are a few valid configurations for the column statistics test. The `//` comments are annotations for this page; remove them if your tooling requires strict JSON:

<CodeGroup>
  ```json Development
  [
    {
      "name": "Average age within expected range",
      "description": "Ensures the average age in the dataset is greater than 25",
      "type": "integrity",
      "subtype": "columnStatistic",
      "thresholds": [
        {
          "insightName": "columnStatistic",
          "insightParameters": [
            { "name": "column_name", "value": "age" }, // Select the column
            { "name": "statistic", "value": "mean" }   // Select the statistic
          ],
          "measurement": "columnStatistic",
          "operator": ">",
          "value": 25
        }
      ],
      "subpopulationFilters": null,
      "mode": "development",
      "usesValidationDataset": true, // Apply test to the validation set
      "usesTrainingDataset": false,
      "usesMlModel": false,
      "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
    },
    {
      "name": "Income variance stability check",
      "description": "Ensures income variance doesn't exceed threshold, indicating stable distribution",
      "type": "integrity",
      "subtype": "columnStatistic",
      "thresholds": [
        {
          "insightName": "columnStatistic",
          "insightParameters": [
            { "name": "column_name", "value": "income" },
            { "name": "statistic", "value": "variance" }
          ],
          "measurement": "columnStatistic",
          "operator": "<=",
          "value": 1000000 // Maximum acceptable variance
        }
      ],
      "subpopulationFilters": null,
      "mode": "development",
      "usesValidationDataset": true,
      "usesTrainingDataset": false,
      "usesMlModel": false,
      "syncId": "96622fba-ea00-4e42-8f42-5e8f5f60805f" // Some unique id
    }
  ]
  ```

  ```json Monitoring
  [
    {
      "name": "Transaction amount median monitoring",
      "description": "Monitors median transaction amount to detect unusual patterns",
      "type": "integrity",
      "subtype": "columnStatistic",
      "thresholds": [
        {
          "insightName": "columnStatistic",
          "insightParameters": [
            { "name": "column_name", "value": "transaction_amount" },
            { "name": "statistic", "value": "median" }
          ],
          "measurement": "columnStatistic",
          "operator": ">=",
          "value": 10.0
        },
        {
          "insightName": "columnStatistic",
          "insightParameters": [
            { "name": "column_name", "value": "transaction_amount" },
            { "name": "statistic", "value": "median" }
          ],
          "measurement": "columnStatistic",
          "operator": "<=",
          "value": 500.0
        }
      ],
      "subpopulationFilters": null,
      "mode": "monitoring",
      "usesProductionData": true,
      "evaluationWindow": 3600, // 1 hour
      "delayWindow": 0,
      "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
    },
    {
      "name": "Data completeness check",
      "description": "Ensures sufficient non-null values in critical columns",
      "type": "integrity",
      "subtype": "columnStatistic",
      "thresholds": [
        {
          "insightName": "columnStatistic",
          "insightParameters": [
            { "name": "column_name", "value": "customer_id" },
            { "name": "statistic", "value": "count" }
          ],
          "measurement": "columnStatistic",
          "operator": ">=",
          "value": 1000 // Minimum expected records per evaluation window
        }
      ],
      "subpopulationFilters": null,
      "mode": "monitoring",
      "usesProductionData": true,
      "evaluationWindow": 3600, // 1 hour
      "delayWindow": 0,
      "syncId": "96622fba-ea00-4e42-8f42-5e8f5f60805f" // Some unique id
    }
  ]
  ```
</CodeGroup>
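
If you maintain many such tests, generating the entries from a script can avoid copy-paste mistakes such as reused `syncId` values. The following is a minimal sketch that mirrors the field names of the development examples above. The `column_statistic_test` helper is hypothetical, and the sketch assumes that any freshly generated UUID is acceptable as a `syncId`.

```python
import json
import uuid


def column_statistic_test(name, description, column, statistic, operator, value):
    """Build one development-mode entry shaped like the examples above."""
    return {
        "name": name,
        "description": description,
        "type": "integrity",
        "subtype": "columnStatistic",
        "thresholds": [
            {
                "insightName": "columnStatistic",
                "insightParameters": [
                    {"name": "column_name", "value": column},
                    {"name": "statistic", "value": statistic},
                ],
                "measurement": "columnStatistic",
                "operator": operator,
                "value": value,
            }
        ],
        "subpopulationFilters": None,
        "mode": "development",
        "usesValidationDataset": True,
        "usesTrainingDataset": False,
        "usesMlModel": False,
        # Assumption: any unique UUID string works as a syncId.
        "syncId": str(uuid.uuid4()),
    }


tests = [
    column_statistic_test(
        "Average age within expected range",
        "Ensures the average age in the dataset is greater than 25",
        column="age", statistic="mean", operator=">", value=25,
    ),
    column_statistic_test(
        "Income variance stability check",
        "Ensures income variance doesn't exceed threshold",
        column="income", statistic="variance", operator="<=", value=1000000,
    ),
]

# json.dumps emits strict JSON (no comments), ready to save as tests.json.
tests_json = json.dumps(tests, indent=2)
print(tests_json)
```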
