A reference dataset is a representative sample of the data your model was
trained on (or any dataset you want to use as a baseline).
Openlayer uses this dataset for tests that monitor data drift by comparing the
distribution of your live data against the reference distribution.
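To make the idea of comparing distributions concrete, here is a minimal sketch that scores how far a categorical feature in live data has moved from the reference data. It uses the total variation distance, chosen purely for illustration; it is not a description of how Openlayer computes drift, and the function names and sample values are hypothetical.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(reference_values, live_values):
    """Illustrative drift score: 0.0 means identical distributions, 1.0 means disjoint."""
    reference_freq = category_frequencies(reference_values)
    live_freq = category_frequencies(live_values)
    categories = set(reference_freq) | set(live_freq)
    return 0.5 * sum(
        abs(reference_freq.get(c, 0.0) - live_freq.get(c, 0.0)) for c in categories
    )


# Hypothetical samples of the "Geography" feature (not real data)
reference_geography = ["France", "France", "Spain", "Germany"]
live_geography = ["France", "Spain", "Spain", "Spain"]

drift_score = total_variation_distance(reference_geography, live_geography)
print(f"Illustrative drift score: {drift_score:.2f}")
```

A score near 0 suggests the live data still resembles the reference data; a score near 1 suggests the feature's distribution has shifted substantially.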
How to upload a reference dataset
You can upload a reference dataset to your inference pipeline with
the Python SDK.
Load your dataset into a DataFrame
Your dataset should be in a format Openlayer can understand. Here’s a minimal example with a single row:

import pandas as pd

df = pd.DataFrame(
    {
        "CreditScore": [600],
        "Geography": ["France"],
        "Gender": ["Male"],
        "Age": [40],
        "Tenure": [5],
        "Balance": [100000],
        "NumOfProducts": [1],
        "HasCrCard": [1],
        "IsActiveMember": [1],
        "EstimatedSalary": [50000],
        "AggregateRate": [0.5],
        "Year": [2020],
        "Exited": [0],
    }
)
Define the dataset configuration
The dataset config is a dictionary containing
information that helps Openlayer understand your data.
For example, the dataset above is from a tabular classification task, so our dataset config
will have information such as the feature names, class names,
and others:

from openlayer.types.inference_pipelines import data_stream_params

# You can replace with `ConfigTabularRegressionData`, `ConfigTextClassificationData`
# or `ConfigTabularLlmData`, according to your task type
config = data_stream_params.ConfigTabularClassificationData(
    categorical_feature_names=["Gender", "Geography"],
    class_names=["Retained", "Exited"],
    feature_names=[
        "CreditScore",
        "Geography",
        "Gender",
        "Age",
        "Tenure",
        "Balance",
        "NumOfProducts",
        "HasCrCard",
        "IsActiveMember",
        "EstimatedSalary",
        "AggregateRate",
        "Year",
    ],
    label_column_name="Exited",
)
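Because the config refers to columns by name, a typo in any name means the config no longer matches the dataset. The sketch below is one way to catch that before uploading. The helper `find_missing_columns` is hypothetical and not part of the Openlayer SDK; it works on plain lists so it runs on its own. In practice you could pass `list(df.columns)` as the first argument.

```python
def find_missing_columns(
    dataframe_columns, feature_names, categorical_feature_names, label_column_name
):
    """Hypothetical helper: list every config name that is not a dataset column."""
    available = set(dataframe_columns)
    problems = []
    for name in feature_names:
        if name not in available:
            problems.append(f"feature '{name}' is not a dataset column")
    for name in categorical_feature_names:
        if name not in available:
            problems.append(f"categorical feature '{name}' is not a dataset column")
    if label_column_name not in available:
        problems.append(f"label column '{label_column_name}' is not a dataset column")
    return problems


# Column names taken from the example dataset above
columns = [
    "CreditScore", "Geography", "Gender", "Age", "Tenure", "Balance",
    "NumOfProducts", "HasCrCard", "IsActiveMember", "EstimatedSalary",
    "AggregateRate", "Year", "Exited",
]

problems = find_missing_columns(
    dataframe_columns=columns,
    feature_names=[c for c in columns if c != "Exited"],
    categorical_feature_names=["Gender", "Geography"],
    label_column_name="Exited",
)
print(problems or "Config names match the dataset columns")
```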
Upload the dataset to Openlayer
Now, you can upload your reference dataset alongside its config to Openlayer:

from openlayer import Openlayer
from openlayer.lib import data

data.upload_reference_dataframe(
    client=Openlayer(api_key="YOUR_OPENLAYER_API_KEY_HERE"),
    inference_pipeline_id="YOUR_INFERENCE_PIPELINE_ID_HERE",
    dataset_df=df,
    config=config,
)
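The placeholders in the upload call are easy to replace with hard-coded secrets by accident. As a minimal sketch, you could read both values from environment variables instead. The variable names `OPENLAYER_API_KEY` and `OPENLAYER_INFERENCE_PIPELINE_ID` and the helper `load_openlayer_settings` are assumptions made for this illustration, not something this page specifies.

```python
import os


def load_openlayer_settings(environ=os.environ):
    """Hypothetical helper: read upload settings from environment variables.

    The variable names below are assumptions for this sketch.
    """
    required = ["OPENLAYER_API_KEY", "OPENLAYER_INFERENCE_PIPELINE_ID"]
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "api_key": environ["OPENLAYER_API_KEY"],
        "inference_pipeline_id": environ["OPENLAYER_INFERENCE_PIPELINE_ID"],
    }


# Demonstration with an explicit mapping; omit the argument to read os.environ
settings = load_openlayer_settings(
    {
        "OPENLAYER_API_KEY": "example-key",
        "OPENLAYER_INFERENCE_PIPELINE_ID": "example-pipeline-id",
    }
)
print(sorted(settings))
```

The returned values can then be passed as `api_key` and `inference_pipeline_id` in the upload call above, which keeps credentials out of source control.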