Uploading a reference dataset
A reference dataset is an optional component of a monitoring setup. However, it is required if you want to use any of the drift tests (e.g., feature drift or label drift).
Ideally, the reference dataset is a representative sample of the training set used by the deployed model.
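One way to obtain such a representative sample is to draw a stratified sample of the training set, so that each label keeps roughly the same share it has in the full data. The snippet below is a minimal sketch using pandas only. The function name sample_reference, the column name "label", and the toy data are hypothetical and not part of Openlayer's SDK.

```python
import pandas as pd


def sample_reference(
    training_df: pd.DataFrame,
    label_column: str,
    frac: float = 0.1,
    seed: int = 42,
) -> pd.DataFrame:
    """Draw a stratified sample: the same fraction of rows from every label group."""
    return (
        training_df.groupby(label_column, group_keys=False)
        .sample(frac=frac, random_state=seed)
        .reset_index(drop=True)
    )


# Hypothetical toy training set: 80 rows with label 0 and 20 rows with label 1.
training_df = pd.DataFrame(
    {
        "feature": range(100),
        "label": [0] * 80 + [1] * 20,
    }
)

# Keep 10% of each label group as the reference dataset.
reference_df = sample_reference(training_df, label_column="label", frac=0.1)
```

Fixing the random seed keeps the sample reproducible, which helps when the reference dataset needs to be regenerated later.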
Reference datasets are uploaded to an inference pipeline with the upload_reference_dataframe or upload_reference_dataset methods from Openlayer’s Python SDK.
The former is used if the dataset is loaded into memory as a pandas dataframe, while the latter is used if the dataset is saved to disk as a CSV file.
Here, we will show the use of upload_reference_dataframe, but the process is similar for upload_reference_dataset.
A reference dataset is uploaded with:
inference_pipeline.upload_reference_dataframe(
    dataset_df=df,
    dataset_config=config,
)
where config is a Python dictionary with information about the dataset. The items in this dictionary depend on the task type. Refer to the Dataset config guides for details.
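As a rough illustration, the sketch below assembles a config dictionary for a tabular classification task from a dataframe's columns. The key names (classNames, labelColumnName, featureNames, categoricalFeatureNames) are assumptions about what such a config may contain, and the data and class names are hypothetical. The Dataset config guides remain the authoritative source for the keys each task type expects.

```python
import pandas as pd

# Hypothetical toy reference data for a churn classification task.
df = pd.DataFrame(
    {
        "age": [34, 51, 27],
        "plan": ["basic", "pro", "basic"],
        "churned": [0, 1, 0],
    }
)

label_column = "churned"

# Every column except the label is treated as a feature.
feature_names = [col for col in df.columns if col != label_column]

# Non-numeric features are treated as categorical.
categorical_feature_names = [
    col for col in feature_names if not pd.api.types.is_numeric_dtype(df[col])
]

# Assumed key names; check the Dataset config guides for your task type.
config = {
    "classNames": ["retained", "churned"],
    "labelColumnName": label_column,
    "featureNames": feature_names,
    "categoricalFeatureNames": categorical_feature_names,
}
```

Deriving the feature lists from the dataframe keeps the config in sync with the columns that are actually uploaded.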