Text classification dataset config
Attributes
The dataset config can be a Python dictionary or be saved as a YAML file.
For text classification, it should contain the following attributes:
Attribute | Type | Default | Description | Comments |
---|---|---|---|---|
classNames | List[str] | - | List of class names indexed by label integer in the dataset. | E.g. ["Retained", "Exited"] when 0, 1 are in your label column. |
label | str | - | Type of dataset. | Must be either "training" or "validation". |
labelColumnName | str | - | Name of the column with the labels. | The data in this column must be zero-indexed integers, matching the list provided in classNames. |
language | str | 'en' | The language of the dataset in ISO 639-1 (alpha-2 code) format. | |
metadata | Dict[str, Any] | {} | Dictionary containing metadata about the dataset. | This is the metadata that will be displayed on the Openlayer platform. |
predictionsColumnName | str | - | Name of the column with the model's predictions as zero-indexed integers. | Applies only if you are uploading a model as well. |
predictionScoresColumnName | str | - | Name of the column with the model's predictions as lists of class probabilities. | Applies only if you are uploading a model as well. |
sep | str | ',' | Delimiter to use. | Applies only if you are uploading the dataset as a CSV file. E.g. '\t' for a tab-separated file. |
textColumnName | str | - | Name of the column with the text. | |
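Because the config can also be a Python dictionary, the attributes above could be written as in the minimal sketch below. The column names (`polarity`, `text`) are placeholders, and `check_config` is a hypothetical helper, not part of the Openlayer API. It only checks the two constraints stated in the table.

```python
from typing import Any, Dict

# Hypothetical config; the column names are placeholders for your own dataset.
dataset_config: Dict[str, Any] = {
    "classNames": ["negative", "positive"],
    "label": "training",
    "labelColumnName": "polarity",
    "textColumnName": "text",
    "language": "en",
    "sep": ",",
    "metadata": {},
}


def check_config(config: Dict[str, Any]) -> None:
    """Illustrative sanity checks based on the attribute table (not an Openlayer function)."""
    # The table says label must be one of these two values.
    if config.get("label") not in ("training", "validation"):
        raise ValueError("label must be 'training' or 'validation'")
    # The table says classNames is a list of strings.
    class_names = config.get("classNames")
    if not isinstance(class_names, list) or not all(
        isinstance(name, str) for name in class_names
    ):
        raise ValueError("classNames must be a list of strings")


check_config(dataset_config)  # raises ValueError if a check fails
```

The same dictionary keys map one-to-one onto the keys of a YAML config file.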
Examples
Let’s look at an example dataset from one of the sample notebooks in Openlayer’s examples gallery on GitHub.
For a training dataset that contains prediction scores, a valid dataset_config.yaml file would be:
classNames:
- negative
- positive
label: training
labelColumnName: polarity
predictionScoresColumnName: predictionScores
textColumnName: text
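To illustrate what a prediction scores column holds, here is a small sketch in plain Python. The rows are made up for illustration, and `predicted_class` is a hypothetical helper, not an Openlayer function. It assumes each row carries one probability per entry in classNames, in the same order, and that the probabilities sum to 1.

```python
import math

class_names = ["negative", "positive"]

# Made-up rows for illustration only; real data would come from your dataset file.
rows = [
    {"text": "great movie", "polarity": 1, "predictionScores": [0.1, 0.9]},
    {"text": "terrible plot", "polarity": 0, "predictionScores": [0.8, 0.2]},
]


def predicted_class(scores):
    """Return the class name with the highest probability."""
    if len(scores) != len(class_names):
        raise ValueError("expected one probability per class")
    if not math.isclose(sum(scores), 1.0, abs_tol=1e-6):
        raise ValueError("probabilities should sum to 1")
    return class_names[scores.index(max(scores))]


predicted = [predicted_class(row["predictionScores"]) for row in rows]
```

Here the position of the largest score is the zero-indexed class, which is the same indexing the label column uses.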
Alternatively, for a training dataset that contains zero-indexed model predictions, a valid dataset_config.yaml file would be:
classNames:
- negative
- positive
label: training
labelColumnName: polarity
predictionsColumnName: predictions
textColumnName: text
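Both the label column and the predictions column are expected to hold zero-indexed integers that match classNames. A quick way to check this before uploading might look like the sketch below. The column values are made up, and `is_valid_index_column` is a hypothetical helper, not part of the Openlayer API.

```python
class_names = ["negative", "positive"]

# Made-up column values for illustration only.
polarity = [0, 1, 1, 0]
predictions = [0, 1, 0, 0]


def is_valid_index_column(values, num_classes):
    """True if every value is an integer in the range [0, num_classes)."""
    return all(
        isinstance(v, int) and not isinstance(v, bool) and 0 <= v < num_classes
        for v in values
    )


labels_ok = is_valid_index_column(polarity, len(class_names))
predictions_ok = is_valid_index_column(predictions, len(class_names))
```

A value such as 2 would fail this check for a two-class dataset, since valid indices are only 0 and 1.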