The dataset configuration is uploaded to Openlayer together with the dataset. It is usually provided as a dictionary (object) or as a YAML file. Refer to the API reference for details on the upload process.

Attributes

The attributes that go into a tabular classification data config depend on the use case: development or monitoring. The attributes described here apply to development projects.

Examples of datasets uploaded to development projects are training and validation sets.

In these cases, the config can contain the following attributes. Not all attributes are required, because some have default values.

| Attribute | Type | Default | Description | Comments |
| --- | --- | --- | --- | --- |
| categoricalFeatureNames | List[str] | [] | A list containing the names of all categorical features in the dataset. | E.g. ["Gender", "Geography"]. |
| classNames | List[str] | - | List of class names indexed by label integer in the dataset. | E.g. ["Retained", "Exited"] when 0, 1 are in your label column. |
| featureNames | List[str] | [] | List of all input feature names. | |
| label | str | - | Type of dataset. | Must be one of training or validation. |
| labelColumnName | str | - | Name of the column with the labels. | The data in this column must be zero-indexed integers, matching the list provided in classNames. |
| metadata | Dict[str, any] | {} | Dictionary containing metadata about the dataset. | This is the metadata that will be displayed on the Openlayer platform. |
| predictionsColumnName | str | - | Name of the column with the model's predictions as zero-indexed integers. | Applies only if you are uploading a model as well. |
| predictionScoresColumnName | str | - | Name of the column with the model's predictions as lists of class probabilities. | Applies only if you are uploading a model as well. |
| sep | str | ',' | Delimiter to use. | Applies only if you are uploading the dataset as a CSV file. E.g. '\t'. |
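The rules in the attribute table can be checked locally before uploading. The following is a minimal sketch, not part of the Openlayer client: `check_dataset_config` is a hypothetical helper, and it assumes that `classNames`, `label`, and `labelColumnName` are the required attributes (they have no default and no "applies only if" note).

```python
import copy
from typing import Any, Dict, List

# Defaults taken from the attribute table.
DEFAULTS: Dict[str, Any] = {
    "categoricalFeatureNames": [],
    "featureNames": [],
    "metadata": {},
    "sep": ",",
}
# Assumption: these three attributes are required.
REQUIRED = ("classNames", "label", "labelColumnName")
ALLOWED_LABELS = ("training", "validation")


def check_dataset_config(config: Dict[str, Any], labels: List[int]) -> Dict[str, Any]:
    """Return a copy of the config with defaults filled in, or raise ValueError.

    `labels` holds the values found in the label column of the dataset.
    """
    missing = [name for name in REQUIRED if name not in config]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    if config["label"] not in ALLOWED_LABELS:
        raise ValueError("label must be one of training or validation")
    num_classes = len(config["classNames"])
    for value in labels:
        # Labels must be zero-indexed integers that index into classNames.
        if not isinstance(value, int) or not 0 <= value < num_classes:
            raise ValueError(f"label {value!r} is not a valid index into classNames")
    return {**copy.deepcopy(DEFAULTS), **config}


config = {
    "classNames": ["Retained", "Exited"],
    "label": "training",
    "labelColumnName": "churn",
}
full_config = check_dataset_config(config, labels=[0, 1, 1, 0])
print(full_config["sep"])
```

The helper only mirrors what the table states; the platform may apply further checks on upload.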

Examples

Let's look at an example dataset taken from one of the sample notebooks in Openlayer's examples gallery GitHub repository.

For a training dataset that contains prediction scores (a column holding lists of class probabilities), a valid dataset_config.yaml file would be:

categoricalFeatureNames:
  - Gender
  - Geography
classNames:
  - Retained
  - Exited
featureNames:
  - CreditScore
  - Geography
  - Gender
  - Age
  - Tenure
  - Balance
  - NumOfProducts
  - HasCrCard
  - IsActiveMember
  - EstimatedSalary
label: training
labelColumnName: churn
predictionScoresColumnName: predictionScores
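
Since the configuration can also be provided as a dictionary, the same settings can be written in Python. This is a sketch of the equivalent dictionary only; it does not show the upload call. The final check is an assumption about what a consistent config looks like, not a documented rule.

```python
# The YAML configuration above, expressed as a Python dictionary.
dataset_config = {
    "categoricalFeatureNames": ["Gender", "Geography"],
    "classNames": ["Retained", "Exited"],
    "featureNames": [
        "CreditScore",
        "Geography",
        "Gender",
        "Age",
        "Tenure",
        "Balance",
        "NumOfProducts",
        "HasCrCard",
        "IsActiveMember",
        "EstimatedSalary",
    ],
    "label": "training",
    "labelColumnName": "churn",
    "predictionScoresColumnName": "predictionScores",
}

# Assumed sanity check: every categorical feature is also an input feature.
categorical = set(dataset_config["categoricalFeatureNames"])
assert categorical <= set(dataset_config["featureNames"])
```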

Alternatively, for a training dataset that contains zero-indexed model predictions, a valid dataset_config.yaml file would be:

categoricalFeatureNames:
  - Gender
  - Geography
classNames:
  - Retained
  - Exited
featureNames:
  - CreditScore
  - Geography
  - Gender
  - Age
  - Tenure
  - Balance
  - NumOfProducts
  - HasCrCard
  - IsActiveMember
  - EstimatedSalary
label: training
labelColumnName: churn
predictionsColumnName: predictions
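
The two prediction formats are related: a row of class probabilities can be reduced to a zero-indexed prediction, which in turn indexes into classNames. The sketch below illustrates this by taking the highest-probability class. That rule is an assumption made for illustration, since the source does not state how predictions are derived. The `scores_to_prediction` helper and the sample rows are hypothetical.

```python
from typing import List

class_names = ["Retained", "Exited"]


def scores_to_prediction(scores: List[float]) -> int:
    """Return the zero-based index of the highest class probability."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical rows from a predictionScores column, one probability per class.
prediction_scores = [[0.9, 0.1], [0.3, 0.7]]

# Values in the form expected in a predictions column: zero-indexed integers.
predictions = [scores_to_prediction(row) for row in prediction_scores]

# Each integer maps back to a class name through classNames.
predicted_names = [class_names[index] for index in predictions]
print(predictions, predicted_names)
```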