# Upload a reference dataset

> Learn how to upload a reference dataset for data drift monitoring on Openlayer

A **reference dataset** is a representative sample of the data your model was
trained on (or any dataset you want to use as a baseline).

Openlayer uses this dataset in tests that monitor **data drift**: each test compares the
distribution of your **live data** against the reference distribution.
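
To make "comparing distributions" concrete, here is a minimal sketch of one common drift measure, the two-sample Kolmogorov-Smirnov statistic, applied to a single numeric feature. It is an illustration only: the source does not say which statistics Openlayer's drift tests use, and the `ks_statistic` helper and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference, live):
    """Return the two-sample Kolmogorov-Smirnov statistic.

    This is the largest gap between the two empirical CDFs. It is 0.0 when
    the empirical CDFs coincide and 1.0 when one sample lies entirely below
    the other.
    """
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    live_sorted = sorted(live)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every value seen in either sample.
    for value in set(ref_sorted) | set(live_sorted):
        ref_cdf = bisect_right(ref_sorted, value) / len(ref_sorted)
        live_cdf = bisect_right(live_sorted, value) / len(live_sorted)
        largest_gap = max(largest_gap, abs(ref_cdf - live_cdf))
    return largest_gap


# Hypothetical "Age" values: a reference sample and two live samples.
reference_ages = [25, 31, 38, 40, 44, 52, 60]
similar_ages = [27, 30, 37, 41, 45, 50, 61]
shifted_ages = [61, 64, 66, 70, 72, 75, 79]

print(ks_statistic(reference_ages, similar_ages))  # small gap: little drift
print(ks_statistic(reference_ages, shifted_ages))  # large gap: strong drift
```

A value near 0 suggests the live feature still resembles the reference, while a value near 1 suggests the distribution has shifted. This is why the reference dataset should be representative of the training data.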

## How to upload a reference dataset

You can upload a reference dataset to your inference pipeline with
the [Python SDK](/api-reference/sdk/libraries/python).

<Card title="See full Python example" icon="python" iconType="duotone" href="https://github.com/openlayer-ai/openlayer-python/blob/main/examples/monitoring/upload_reference_dataset.py" />

<Steps>
  <Step title="Load your dataset into a DataFrame">
    Load your dataset into a pandas DataFrame with one column per feature, plus the label column. Here’s a minimal example with a single row:

    ```python Python theme={null}
    import pandas as pd

    df = pd.DataFrame(
        {
            "CreditScore": [600],
            "Geography": ["France"],
            "Gender": ["Male"],
            "Age": [40],
            "Tenure": [5],
            "Balance": [100000],
            "NumOfProducts": [1],
            "HasCrCard": [1],
            "IsActiveMember": [1],
            "EstimatedSalary": [50000],
            "AggregateRate": [0.5],
            "Year": [2020],
            "Exited": [0],
        }
    )
    ```
  </Step>

  <Step title="Define the dataset configuration">
    The dataset config is a dictionary that tells Openlayer how to interpret your data.

    The dataset above comes from a tabular classification task, so its config
    specifies the feature names, the categorical feature names, the class names,
    and the label column:

    ```python Python theme={null}
    from openlayer.types.inference_pipelines import data_stream_params

    # Depending on your task type, use `ConfigTabularRegressionData`,
    # `ConfigTextClassificationData`, or `ConfigLlmData` instead
    config = data_stream_params.ConfigTabularClassificationData(
        categorical_feature_names=["Gender", "Geography"],
        class_names=["Retained", "Exited"],
        feature_names=[
            "CreditScore",
            "Geography",
            "Gender",
            "Age",
            "Tenure",
            "Balance",
            "NumOfProducts",
            "HasCrCard",
            "IsActiveMember",
            "EstimatedSalary",
            "AggregateRate",
            "Year",
        ],
        label_column_name="Exited",
    )
    ```
  </Step>

  <Step title="Upload the dataset to Openlayer">
    Finally, upload the reference dataset together with its config. Replace the placeholder API key and inference pipeline ID with your own values:

    ```python Python theme={null}
    from openlayer import Openlayer
    from openlayer.lib import data

    data.upload_reference_dataframe(
        client=Openlayer(api_key="YOUR_OPENLAYER_API_KEY_HERE"),
        inference_pipeline_id="YOUR_INFERENCE_PIPELINE_ID_HERE",
        dataset_df=df,
        config=config,
    )
    ```
  </Step>
</Steps>
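
Before uploading, it can help to confirm that the config and the DataFrame agree on column names. The sketch below is a hypothetical helper, not part of the Openlayer SDK. It works on plain lists, so you can pass `list(df.columns)` from the steps above. It assumes that every categorical feature should also appear in `feature_names`, as in the example config; the source does not state this as a requirement.

```python
def find_config_problems(
    columns, feature_names, categorical_feature_names, label_column_name
):
    """Return a list of human-readable problems; an empty list means no mismatch."""
    available = set(columns)
    problems = []
    for name in feature_names:
        if name not in available:
            problems.append(f"feature column missing from dataset: {name}")
    for name in categorical_feature_names:
        # Assumption: categorical features are a subset of feature_names.
        if name not in feature_names:
            problems.append(f"categorical feature not in feature_names: {name}")
    if label_column_name not in available:
        problems.append(f"label column missing from dataset: {label_column_name}")
    return problems


# Shortened column list for illustration; in practice use list(df.columns).
columns = ["CreditScore", "Geography", "Gender", "Age", "Exited"]

matching = find_config_problems(
    columns,
    feature_names=["CreditScore", "Geography", "Gender", "Age"],
    categorical_feature_names=["Gender", "Geography"],
    label_column_name="Exited",
)
mismatched = find_config_problems(
    columns,
    feature_names=["CreditScore", "Geography", "Gender", "Age", "Tenure"],
    categorical_feature_names=["Gender", "Geography"],
    label_column_name="Exited",
)

print(matching)
print(mismatched)
```

Running a check like this locally surfaces a misspelled or missing column name before the upload call is made.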
