Learn how to upload a reference dataset to your inference pipeline on Openlayer.
A reference dataset is usually a representative sample of the training data used by the model. It is required
to monitor data drift, because its distribution serves as the baseline against which the distribution of your published data is compared.
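To make the idea of comparing distributions concrete, here is a minimal sketch that scores how far a categorical feature has moved away from the reference dataset using the Population Stability Index (PSI). This is an illustration only and is not a description of how Openlayer computes drift. The function names and the `Geography` values are made up for the example.

```python
import math
from collections import Counter


def category_shares(values, categories):
    """Return the share of each category, floored to avoid log(0)."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = len(values)
    return {c: max(counts[c] / total, 1e-6) for c in categories}


def population_stability_index(reference, production):
    """PSI between two samples of a categorical feature (hypothetical helper)."""
    categories = sorted(set(reference) | set(production))
    ref = category_shares(reference, categories)
    prod = category_shares(production, categories)
    return sum(
        (prod[c] - ref[c]) * math.log(prod[c] / ref[c]) for c in categories
    )


# Illustrative data only: a reference sample and a shifted production sample.
reference_geography = ["France"] * 50 + ["Spain"] * 25 + ["Germany"] * 25
production_geography = ["France"] * 20 + ["Spain"] * 30 + ["Germany"] * 50

psi_same = population_stability_index(reference_geography, reference_geography)
psi_shifted = population_stability_index(reference_geography, production_geography)
print(f"PSI vs. itself: {psi_same:.3f}, PSI vs. production: {psi_shifted:.3f}")
```

A score of zero means the two samples have identical category shares, and larger scores mean a larger shift. Without a reference sample there is nothing to compute this kind of score against, which is why the reference dataset is required.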
The dataset config is a dictionary containing
information that helps Openlayer understand your data. For example, the dataset above is from a tabular classification task, so its dataset config
includes information such as the feature names, the class names,
and the label column name:
Python
from openlayer.types.inference_pipelines import data_stream_params

# You can replace with `ConfigTabularRegressionData`, `ConfigTextClassificationData`
# or `ConfigTabularLlmData`, according to your task type
config = data_stream_params.ConfigTabularClassificationData(
    categorical_feature_names=["Gender", "Geography"],
    class_names=["Retained", "Exited"],
    feature_names=[
        "CreditScore",
        "Geography",
        "Gender",
        "Age",
        "Tenure",
        "Balance",
        "NumOfProducts",
        "HasCrCard",
        "IsActiveMember",
        "EstimatedSalary",
        "AggregateRate",
        "Year",
    ],
    label_column_name="Exited",
)
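Before uploading, it can help to confirm that the config actually matches the columns of your dataset. The sketch below is a hypothetical pre-flight check, not part of the Openlayer SDK. It assumes the config can be read like a plain dictionary, as described above, and that you have the list of column names of your reference dataset at hand.

```python
def find_config_problems(config, columns):
    """Return a list of mismatches between a dataset config and column names.

    Hypothetical helper for illustration; it is not an Openlayer function.
    """
    problems = []
    column_set = set(columns)
    feature_names = config.get("feature_names", [])

    # Every declared feature should exist as a column in the dataset.
    for name in feature_names:
        if name not in column_set:
            problems.append(f"feature '{name}' is not a dataset column")

    # Categorical features should be a subset of the declared features.
    for name in config.get("categorical_feature_names", []):
        if name not in feature_names:
            problems.append(f"categorical feature '{name}' is not in feature_names")

    # The label column should exist in the dataset.
    label = config.get("label_column_name")
    if label is not None and label not in column_set:
        problems.append(f"label column '{label}' is not a dataset column")

    return problems


# Illustrative config and columns; a shortened version of the example above.
example_config = {
    "categorical_feature_names": ["Gender", "Geography"],
    "class_names": ["Retained", "Exited"],
    "feature_names": ["CreditScore", "Geography", "Gender", "Age"],
    "label_column_name": "Exited",
}
good_columns = ["CreditScore", "Geography", "Gender", "Age", "Exited"]
bad_columns = ["CreditScore", "Geography", "Age"]

problems_good = find_config_problems(example_config, good_columns)
problems_bad = find_config_problems(example_config, bad_columns)
print(problems_good)
print(problems_bad)
```

With a pandas DataFrame, the second argument would be `list(df.columns)`. An empty result means that the names in the config line up with the dataset. It does not check data types or values.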
3
Upload to Openlayer
Now, you can upload your reference dataset alongside its config to Openlayer: