Tabular classification dataset config

Attributes

The dataset config YAML file for tabular classification should contain the following attributes:

| Attribute | Type | Default | Description | Comments |
|---|---|---|---|---|
| categoricalFeatureNames | List[str] | [] | A list containing the names of all categorical features in the dataset. | E.g. ["Gender", "Geography"]. |
| classNames | List[str] | - | List of class names indexed by label integer in the dataset. | E.g. ["Retained", "Exited"] when 0, 1 are in your label column. |
| columnNames | List[str] | - | List of the dataset's column names. | |
| featureNames | List[str] | [] | List of all input feature names. | |
| label | str | - | Type of dataset. | Must be one of training or validation. |
| labelColumnName | str | - | Name of the column with the labels. | The data in this column must be zero-indexed integers, matching the list provided in classNames. |
| metadata | Dict[str, any] | {} | Dictionary containing metadata about the dataset. | This is the metadata that will be displayed on the Openlayer platform. |
| predictionsColumnName | str | - | Name of the column with the model's predictions as zero-indexed integers. | Applies only if you are uploading a model as well. |
| predictionScoresColumnName | str | - | Name of the column with the model's predictions as lists of class probabilities. | Applies only if you are uploading a model as well. |
| sep | str | ',' | Delimiter to use. | Applies only if you are uploading the dataset as a CSV file. E.g. '\t'. |
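
As a rough sanity check before uploading, the rules in the table can be expressed in code. The sketch below uses a hypothetical helper, `validate_dataset_config`, which is not part of any Openlayer SDK; it assumes the config has already been loaded into a Python dict and that the label column's values are passed in separately. The checks that feature names appear in columnNames and that categorical features appear in featureNames are assumptions inferred from the table, not rules it states.

```python
from typing import Any, Dict, List, Sequence


def validate_dataset_config(config: Dict[str, Any], labels: Sequence[int] = ()) -> List[str]:
    """Return a list of problems found in a tabular classification dataset config.

    Hypothetical helper: it checks only rules suggested by the attribute table,
    not everything the Openlayer platform may enforce.
    """
    problems: List[str] = []
    columns = config.get("columnNames", [])
    features = config.get("featureNames", [])
    class_names = config.get("classNames", [])

    # Attributes listed without a default in the table are treated as required.
    for key in ("classNames", "columnNames", "label", "labelColumnName"):
        if key not in config:
            problems.append(f"missing required attribute: {key}")

    if config.get("label") not in ("training", "validation"):
        problems.append("label must be 'training' or 'validation'")

    # Every column referenced by name must exist in columnNames.
    for key in ("labelColumnName", "predictionsColumnName", "predictionScoresColumnName"):
        name = config.get(key)
        if name is not None and name not in columns:
            problems.append(f"{key} {name!r} is not in columnNames")

    # Assumption: features are a subset of the columns, and categorical
    # features are a subset of the features.
    for name in features:
        if name not in columns:
            problems.append(f"feature {name!r} is not in columnNames")
    for name in config.get("categoricalFeatureNames", []):
        if name not in features:
            problems.append(f"categorical feature {name!r} is not in featureNames")

    # Labels must be zero-indexed integers that index into classNames.
    for value in labels:
        if not isinstance(value, int) or not 0 <= value < len(class_names):
            problems.append(f"label value {value!r} does not index classNames")
    return problems
```

An empty list means none of these checks failed; for example, with classNames ["Retained", "Exited"], a label value of 2 is reported because it does not index either class.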

Examples

Let’s look at an example dataset taken from one of the sample notebooks in Openlayer’s examples gallery GitHub repository.

For a training dataset like the one below, which contains zero-indexed model predictions, a valid dataset_config.yaml file would be:

categoricalFeatureNames:
- Gender
- Geography
classNames:
- Retained
- Exited
columnNames:
- CreditScore
- Geography
- Gender
- Age
- Tenure
- Balance
- NumOfProducts
- HasCrCard
- IsActiveMember
- EstimatedSalary
- churn
- predictions
featureNames:
- CreditScore
- Geography
- Gender
- Age
- Tenure
- Balance
- NumOfProducts
- HasCrCard
- IsActiveMember
- EstimatedSalary
label: training
labelColumnName: churn
predictionsColumnName: predictions
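
A config like the one above can also be generated rather than typed by hand. The following is a minimal sketch with two hypothetical helpers: `build_config`, which assumes every column other than the label and prediction columns is an input feature, and `to_yaml`, a deliberately tiny emitter for flat mappings of strings and string lists. It uses only the standard library; if PyYAML is available, `yaml.safe_dump(config, f, sort_keys=False)` is the more robust way to write the file.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def build_config(columns: List[str], label_column: str, prediction_column: str,
                 class_names: List[str], categorical: List[str],
                 split: str = "training") -> Dict[str, Value]:
    """Assemble a config dict shaped like the predictions example.

    Assumption: every column that is neither the label column nor the
    prediction column is an input feature.
    """
    features = [c for c in columns if c not in (label_column, prediction_column)]
    return {
        "categoricalFeatureNames": categorical,
        "classNames": class_names,
        "columnNames": columns,
        "featureNames": features,
        "label": split,
        "labelColumnName": label_column,
        "predictionsColumnName": prediction_column,
    }


def to_yaml(config: Dict[str, Value]) -> str:
    """Emit block-style YAML for a flat mapping of strings and string lists.

    Deliberately minimal: no quoting is done, so values containing YAML
    special characters (":", "#", leading quotes, ...) are not handled.
    """
    lines: List[str] = []
    for key, value in config.items():
        if isinstance(value, list):
            if not value:
                lines.append(f"{key}: []")  # an empty list, not null
                continue
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"
```

Writing the string returned by `to_yaml(build_config(...))` to dataset_config.yaml would produce a file in the same block style as the example, with keys in the order they were inserted into the dict.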

For a training dataset like the one below, which contains prediction scores (lists of class probabilities), a valid dataset_config.yaml file would be:

categoricalFeatureNames:
- Gender
- Geography
classNames:
- Retained
- Exited
columnNames:
- CreditScore
- Geography
- Gender
- Age
- Tenure
- Balance
- NumOfProducts
- HasCrCard
- IsActiveMember
- EstimatedSalary
- churn
- predictionScores
featureNames:
- CreditScore
- Geography
- Gender
- Age
- Tenure
- Balance
- NumOfProducts
- HasCrCard
- IsActiveMember
- EstimatedSalary
label: training
labelColumnName: churn
predictionScoresColumnName: predictionScores
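
The two examples differ only in the shape of the prediction column: predictionsColumnName points at one zero-indexed integer per row, while predictionScoresColumnName points at a list of class probabilities per row. The sketch below illustrates that relationship under stated assumptions: each list holds one probability per class, in classNames order, summing to 1, and the predicted class is taken as the argmax. `check_scores` and `scores_to_predictions` are hypothetical helpers, not Openlayer functions.

```python
import math
from typing import List, Sequence


def check_scores(scores: Sequence[float], num_classes: int, tol: float = 1e-6) -> bool:
    """Return True if one predictionScores cell looks like a probability vector.

    Assumption: one probability per class, in classNames order, summing to 1.
    """
    return (
        len(scores) == num_classes
        and all(0.0 <= p <= 1.0 for p in scores)
        and math.isclose(sum(scores), 1.0, abs_tol=tol)
    )


def scores_to_predictions(rows: List[Sequence[float]]) -> List[int]:
    """Derive zero-indexed predicted classes from score lists via argmax.

    Ties resolve to the lowest class index, because max() keeps the first
    maximal element it encounters.
    """
    return [max(range(len(row)), key=lambda i: row[i]) for row in rows]
```

With classNames ["Retained", "Exited"], a score list of [0.3, 0.7] maps to prediction 1 (Exited), which is the kind of value a predictionsColumnName column would hold instead.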