
# Column drift

> Learn how to use the column drift test to detect drift in your data

## Definition

The column drift test allows you to select a dataset column, specify a drift detection method, and set a threshold for the
drift score.

Drift is measured by comparing the selected column on the **current dataset** with a **reference dataset**.

* In **development projects**, the training set is used as the reference and the validation set as the current
  dataset.
* In **monitoring projects**, the reference dataset is [uploaded by the user](/monitoring/uploading-reference-dataset)
  and the production data is the current dataset.

<Info>
  If you want Openlayer to automatically find the best drift detection method
  and threshold, use the [Feature
  drift](/tests/consistency/feature-drift-count) and [Label
  drift](/tests/consistency/label-drift) tests instead.
</Info>

## Taxonomy

* **Task types**: LLM, tabular classification, tabular regression, text classification.
* **Availability**: <Tooltip tip="Continuously evaluate your models and datasets as you iterate on their versions.">development</Tooltip>
  and <Tooltip tip="Monitor a model in production, measure its health, check for drifts and set up alerts.">monitoring</Tooltip>.

## Why it matters

* Measuring drift is crucial to keeping your models relevant. In development, it ensures that the data you use to validate your model is similar to the data you used to train it. In monitoring, it detects when the data your model receives in production differs from the reference data.
* Over time, changes in the underlying data distribution can degrade your model's performance. Measuring drift helps you identify these changes early, so you can update or retrain the model before its performance suffers.

## Drift detection methods

One of the parameters you must pass to the column drift test is the **drift detection method**. This method compares the selected column in the current and reference datasets and computes
a drift score, which is the value the threshold is applied to.
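Whether a score signals drift depends on what the method returns. The table below lists two kinds: p-values, where drift is detected when the score falls *below* the threshold, and distance-like scores (distances, divergences, PSI/CSI, and the ROC AUC of Text Content Drift), where drift is detected when the score is *at or above* the threshold. The following is a minimal sketch of that decision rule; `ScoreKind` and `drift_detected` are hypothetical names for illustration, not part of the Openlayer API.

```python
from enum import Enum


class ScoreKind(Enum):
    # Hypothetical classification of scores, following the table's "Score" column.
    P_VALUE = "p_value"    # smaller score = stronger evidence of drift
    DISTANCE = "distance"  # larger score = more drift (distances, divergences, PSI/CSI, ROC AUC)


def drift_detected(score: float, threshold: float, kind: ScoreKind) -> bool:
    """Return True if the drift score crosses the threshold for its score kind."""
    if kind is ScoreKind.P_VALUE:
        # e.g. K-S test: drift when p-value < threshold (recommended 0.05)
        return score < threshold
    # e.g. Wasserstein distance: drift when distance >= threshold (recommended 0.1)
    return score >= threshold


print(drift_detected(0.01, 0.05, ScoreKind.P_VALUE))  # True
print(drift_detected(0.30, 0.05, ScoreKind.P_VALUE))  # False
print(drift_detected(0.12, 0.1, ScoreKind.DISTANCE))  # True
```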

<img width="700" style={{ borderRadius: "0.5rem" }} src="https://mintcdn.com/openlayer-44/xUBIrdsfSKEziWQ7/images/tests/consistency/drift_methods.png?fit=max&auto=format&n=xUBIrdsfSKEziWQ7&q=85&s=4d6c12bc0555051cbc2fa9c86562ff2f" alt="Drift methods" data-path="images/tests/consistency/drift_methods.png" />

Openlayer supports the following drift detection methods:

| Method                                                                                                                     | Application                                       | Score                                                                                                                                                                                        |
| -------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Anderson-Darling](https://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test)                                            | Applies only to **numerical columns**.            | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Characteristic Stability Index](https://mwburke.github.io/data%20science/2018/04/29/population-stability-index.html)      | Applies to **categorical and numerical columns**. | Returns the computed CSI value. If CSI >= threshold, drift is detected. Recommended threshold: 0.1.                                                                                          |
| [Chi-Square](https://en.wikipedia.org/wiki/Chi-squared_test)                                                               | Applies only to **categorical columns**.          | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Cramer-Von-Mises](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93von_Mises_criterion)                                  | Applies only to **numerical columns**.            | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Energy Distance](https://en.wikipedia.org/wiki/Energy_distance)                                                           | Applies only to **numerical columns**.            | Returns a distance. If distance >= threshold, drift is detected. Recommended threshold: 0.1.                                                                                                 |
| [Epps-Singleton](https://journals.sagepub.com/doi/pdf/10.1177/1536867X0900900307)                                          | Applies only to **numerical columns**.            | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Fisher Exact Test](https://en.wikipedia.org/wiki/Fisher%27s_exact_test)                                                   | Applies only to **categorical columns**.          | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [G-test](https://en.wikipedia.org/wiki/G-test)                                                                             | Applies only to **categorical columns**.          | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Hellinger Distance](https://en.wikipedia.org/wiki/Hellinger_distance)                                                     | Applies to **categorical and numerical columns**. | Returns a distance. If distance >= threshold, drift is detected. Recommended threshold: 0.1.                                                                                                 |
| [Jensen-Shannon Distance](https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence)                                 | Applies to **categorical and numerical columns**. | Returns a distance. If distance >= threshold, drift is detected. Recommended threshold: 0.1.                                                                                                 |
| [Kullback-Leibler Divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)                           | Applies to **categorical and numerical columns**. | Returns the divergence. If divergence >= threshold, drift is detected. Recommended threshold: 0.1.                                                                                           |
| [K-S Test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test)                                                  | Applies only to **numerical columns**.            | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Mann-Whitney U-Rank Test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test)                                      | Applies only to **numerical columns**.            | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Population Stability Index](https://mwburke.github.io/data%20science/2018/04/29/population-stability-index.html)          | Applies to **categorical and numerical columns**. | Returns the computed PSI value. If PSI >= threshold, drift is detected. Recommended threshold: 0.1.                                                                                          |
| [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test)                                                       | Applies only to **numerical columns**.            | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Text Content Drift](https://www.evidentlyai.com/blog/evidently-data-quality-monitoring-and-drift-detection-for-text-data) | Applies only to **text columns**.                 | Returns the ROC AUC of a binary classifier trained to distinguish text from the current and reference data. Drift is detected when the ROC AUC is high. Recommended threshold range: 0.5 - 1 |
| [Total Variation Distance](https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures)                 | Applies only to **categorical columns**.          | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
| [Wasserstein Distance](https://en.wikipedia.org/wiki/Wasserstein_metric)                                                   | Applies only to **numerical columns**.            | Returns a distance. If distance >= threshold, drift is detected. Recommended threshold: 0.1.                                                                                                 |
| [Z-test](https://en.wikipedia.org/wiki/Z-test)                                                                             | Applies only to **categorical columns**.          | Returns a p-value. If p-value \< threshold, drift is detected. Recommended threshold: 0.05.                                                                                                  |
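To make the distance-style scores concrete, here is a minimal sketch that computes the Population Stability Index and the Jensen-Shannon distance for a categorical column in plain Python. It is an illustration of the textbook formulas, not Openlayer's implementation: the empty-category floor `EPS`, the use of the natural logarithm, and the helper names are assumptions, and numerical columns would first need to be binned, which this sketch does not do.

```python
import math
from collections import Counter
from typing import List, Sequence

EPS = 1e-4  # assumption: floor for empty categories so PSI never divides by zero or takes log(0)


def proportions(values: Sequence[str], categories: List[str]) -> List[float]:
    """Fraction of `values` falling in each category, in the given order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def population_stability_index(reference: Sequence[str], current: Sequence[str]) -> float:
    """PSI = sum((cur - ref) * ln(cur / ref)) over the union of observed categories."""
    categories = sorted(set(reference) | set(current))
    ref = [max(p, EPS) for p in proportions(reference, categories)]
    cur = [max(p, EPS) for p in proportions(current, categories)]
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def jensen_shannon_distance(reference: Sequence[str], current: Sequence[str]) -> float:
    """Square root of the Jensen-Shannon divergence (natural log) between category shares."""
    categories = sorted(set(reference) | set(current))
    p = proportions(reference, categories)
    q = proportions(current, categories)
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(x: List[float], y: List[float]) -> float:
        # Terms with x_i == 0 contribute 0; y_i > 0 whenever x_i > 0 since y is the mixture.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


reference = ["a"] * 50 + ["b"] * 50  # reference split: 50% / 50%
current = ["a"] * 80 + ["b"] * 20    # current split: 80% / 20%

psi = population_stability_index(reference, current)
jsd = jensen_shannon_distance(reference, current)
print(round(psi, 3), psi >= 0.1)  # 0.416 True
print(round(jsd, 3), jsd >= 0.1)  # 0.225 True
```

With the recommended threshold of 0.1, both scores flag the shift from a 50/50 to an 80/20 split as drift. Libraries may bin, smooth, or choose logarithm bases differently, so the exact values Openlayer reports can differ from this sketch.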

<Warning>
  Not all drift detection methods apply to all column types. For
  example, the Kolmogorov-Smirnov (K-S) test is only available for numerical
  columns, and the Text Content Drift method is only available for text columns.

  If you select a method that does not support the column's type, the test is
  skipped and the test report shows a message explaining why.
</Warning>
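If you generate test configurations programmatically, you can catch an incompatible method before Openlayer skips the test. The sketch below transcribes the method-to-column-type mapping from the table into a dictionary. The keys are the table's display names and are an assumption: they are not necessarily the exact strings Openlayer accepts in the `test_type` parameter (the only confirmed value in this page is `"K-S test"`). `skip_reason` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Optional

NUMERICAL = "numerical"
CATEGORICAL = "categorical"
TEXT = "text"

# Supported column types per method, transcribed from the table above.
# Keys are display names, not necessarily valid `test_type` values.
SUPPORTED_COLUMN_TYPES: Dict[str, FrozenSet[str]] = {
    "Anderson-Darling": frozenset({NUMERICAL}),
    "Characteristic Stability Index": frozenset({CATEGORICAL, NUMERICAL}),
    "Chi-Square": frozenset({CATEGORICAL}),
    "Cramer-Von-Mises": frozenset({NUMERICAL}),
    "Energy Distance": frozenset({NUMERICAL}),
    "Epps-Singleton": frozenset({NUMERICAL}),
    "Fisher Exact Test": frozenset({CATEGORICAL}),
    "G-test": frozenset({CATEGORICAL}),
    "Hellinger Distance": frozenset({CATEGORICAL, NUMERICAL}),
    "Jensen-Shannon Distance": frozenset({CATEGORICAL, NUMERICAL}),
    "Kullback-Leibler Divergence": frozenset({CATEGORICAL, NUMERICAL}),
    "K-S Test": frozenset({NUMERICAL}),
    "Mann-Whitney U-Rank Test": frozenset({NUMERICAL}),
    "Population Stability Index": frozenset({CATEGORICAL, NUMERICAL}),
    "Student's t-test": frozenset({NUMERICAL}),
    "Text Content Drift": frozenset({TEXT}),
    "Total Variation Distance": frozenset({CATEGORICAL}),
    "Wasserstein Distance": frozenset({NUMERICAL}),
    "Z-test": frozenset({CATEGORICAL}),
}


def skip_reason(method: str, column_type: str) -> Optional[str]:
    """Return None if `method` supports `column_type`, else a human-readable reason to skip."""
    supported = SUPPORTED_COLUMN_TYPES.get(method)
    if supported is None:
        return f"Unknown drift detection method: {method!r}"
    if column_type not in supported:
        allowed = ", ".join(sorted(supported))
        return f"{method} does not support {column_type} columns (supported: {allowed})"
    return None


print(skip_reason("K-S Test", NUMERICAL))  # None
print(skip_reason("K-S Test", TEXT))  # K-S Test does not support text columns (supported: numerical)
```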

## Test configuration examples

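If you are writing a `tests.json`, here are valid configurations of the column drift test for development and monitoring projects. The `//` comments are annotations only: JSON does not support comments, so remove them before saving the file.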

<CodeGroup>
  ```json Development theme={null}
  [
    {
      "name": "Feature `Age` not drifted - K-S test",
      "description": "Asserts that feature `Age` has not drifted, using the K-S test with a 0.05 p-value",
      "type": "consistency",
      "subtype": "columnDrift",
      "thresholds": [
        {
          "insightName": "columnDrift",
          "insightParameters": [
            { "name": "column_name", "value": "Age" }, // Selects the column name
            { "name": "test_type", "value": "K-S test" } // Selects the drift detection method. Check the table above for more methods
          ],
          "measurement": "driftScore",
          "operator": "<",
          "value": 0.05
        }
      ],
      "subpopulationFilters": null,
      "mode": "development",
      "usesValidationDataset": true,
      "usesTrainingDataset": true,
      "usesMlModel": false,
      "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
    }
  ]
  ```

  ```json Monitoring theme={null}
  [
    {
      "name": "Feature `Age` not drifted - K-S test",
      "description": "Asserts that feature `Age` has not drifted, using the K-S test with a 0.05 p-value",
      "type": "consistency",
      "subtype": "columnDrift",
      "thresholds": [
        {
          "insightName": "columnDrift",
          "insightParameters": [
            { "name": "column_name", "value": "Age" }, // Selects the column name
            { "name": "test_type", "value": "K-S test" } // Selects the drift detection method. Check the table above for more methods
          ],
          "measurement": "driftScore",
          "operator": "<",
          "value": 0.05
        }
      ],
      "subpopulationFilters": null,
      "mode": "monitoring",
      "usesProductionData": true,
      "evaluationWindow": 3600, // 1 hour
      "delayWindow": 0,
      "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
    }
  ]
  ```
</CodeGroup>
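When you want the same check on several columns, writing each entry by hand gets repetitive. The sketch below builds development-mode entries with Python's standard `json` and `uuid` modules, copying every field and value from the Development example above (including its `operator` and `value`). `column_drift_test` is a hypothetical helper, and `Balance` is a hypothetical column name; the generated `description` assumes a p-value method such as the K-S test.

```python
import json
import uuid
from typing import Any, Dict, List


def column_drift_test(column: str, method: str = "K-S test", threshold: float = 0.05) -> Dict[str, Any]:
    """Build one development-mode column drift test, mirroring the Development example."""
    return {
        "name": f"Feature `{column}` not drifted - {method}",
        "description": f"Asserts that feature `{column}` has not drifted, using the {method} with a {threshold} p-value",
        "type": "consistency",
        "subtype": "columnDrift",
        "thresholds": [
            {
                "insightName": "columnDrift",
                "insightParameters": [
                    {"name": "column_name", "value": column},  # column to compare
                    {"name": "test_type", "value": method},    # drift detection method
                ],
                "measurement": "driftScore",
                "operator": "<",  # copied verbatim from the example above
                "value": threshold,
            }
        ],
        "subpopulationFilters": None,
        "mode": "development",
        "usesValidationDataset": True,
        "usesTrainingDataset": True,
        "usesMlModel": False,
        "syncId": str(uuid.uuid4()),  # a fresh unique id for each test
    }


tests: List[Dict[str, Any]] = [column_drift_test(col) for col in ["Age", "Balance"]]
payload = json.dumps(tests, indent=2)  # comment-free JSON, ready to save as tests.json
print(payload)
```

Because `json.dumps` never emits comments, the output is valid JSON as-is, and each test receives its own `syncId`.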

## Related

* [Feature drift test](/tests/consistency/feature-drift-count).
* [Label drift test](/tests/consistency/label-drift).
