The dataset configuration is uploaded to Openlayer together with the dataset. It is usually provided as a dictionary/object or as a YAML file. Refer to the API reference for details on the upload process.

Attributes

What goes into a tabular regression data config depends on the use case: development or monitoring.

Examples of datasets uploaded to development projects are training and validation sets.

In these cases, the config can contain the following attributes. Not all attributes are required, because some have default values.

| Attribute | Type | Default | Description | Comments |
|---|---|---|---|---|
| categoricalFeatureNames | List[str] | [] | A list containing the names of all categorical features in the dataset. | E.g. ["Gender", "Geography"]. |
| featureNames | List[str] | [] | List of all input feature names. | |
| label | str | - | Type of dataset. | Must be one of training or validation. |
| targetColumnName | str | - | Name of the column with the targets (ground truth values). | |
| metadata | Dict[str, any] | | Dictionary containing metadata about the dataset. | This is the metadata that will be displayed on the Openlayer platform. |
| predictionsColumnName | str | - | Name of the column with the model's predictions. | Applies only if you are uploading a model as well. |
| sep | str | ',' | Delimiter to use. | Applies only if you are uploading the dataset as a CSV file. E.g. '\t'. |
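As a rough illustration of how the defaults listed above interact with a user-supplied config, the sketch below fills in the documented default values and checks the label. The helper with_defaults and the constants are hypothetical names, not part of the Openlayer client, and treating label as required is an assumption based on it having no listed default.

```python
from typing import Any, Dict

# Defaults copied from the attributes listed above; attributes shown with "-"
# (or with no default) are deliberately left out.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "categoricalFeatureNames": [],
    "featureNames": [],
    "sep": ",",
}
ALLOWED_LABELS = {"training", "validation"}


def with_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of config with documented defaults filled in."""
    label = config.get("label")
    # Assumption: label is required, since no default is documented for it.
    if label not in ALLOWED_LABELS:
        raise ValueError(f"label must be one of {sorted(ALLOWED_LABELS)}, got {label!r}")
    # Copy list defaults so callers never share the module-level lists.
    merged = {
        key: list(value) if isinstance(value, list) else value
        for key, value in DOCUMENTED_DEFAULTS.items()
    }
    merged.update(config)  # user-supplied values win over defaults
    return merged


config = with_defaults({"label": "validation", "targetColumnName": "target"})
print(config["sep"])  # prints ,
```

Values supplied by the user always override the defaults, so passing sep as '\t' for a tab-separated file would be preserved.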

Examples

Let’s look at an example dataset from one of the sample notebooks in Openlayer’s examples gallery GitHub repository.

For a training dataset with ten input feature columns (age, sex, bmi, bp, and s1 through s6), a target column, and a predictions column, a valid dataset_config.yaml file would be:

featureNames:
  - age
  - sex
  - bmi
  - bp
  - s1
  - s2
  - s3
  - s4
  - s5
  - s6
label: training
predictionsColumnName: predictions
targetColumnName: target
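
Since the config can also be provided as a dictionary, the same example can be written in Python. The sketch below also runs a simple sanity check that every column the config names is present in the file header. The function missing_columns and the placeholder CSV header are hypothetical and only for illustration; they are not part of the Openlayer client.

```python
import csv
import io

# The same config as the YAML above, written as a Python dictionary.
dataset_config = {
    "featureNames": ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"],
    "label": "training",
    "predictionsColumnName": "predictions",
    "targetColumnName": "target",
}


def missing_columns(config, header):
    """Hypothetical check: list the config's column names that are absent from the header."""
    referenced = list(config.get("featureNames", []))
    for key in ("targetColumnName", "predictionsColumnName"):
        if key in config:
            referenced.append(config[key])
    return [name for name in referenced if name not in header]


# Placeholder CSV header standing in for the real training file (assumed column order).
csv_text = "age,sex,bmi,bp,s1,s2,s3,s4,s5,s6,target,predictions\n"
delimiter = dataset_config.get("sep", ",")  # fall back to the documented default
header = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter))

print(missing_columns(dataset_config, header))  # prints []
```

An empty list means every feature, target, and predictions column named in the config was found in the header.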