Text classification dataset config

Attributes

The dataset config YAML file for text classification should contain the following attributes:

| Attribute | Type | Default | Description | Comments |
| --- | --- | --- | --- | --- |
| classNames | List[str] | - | List of class names indexed by label integer in the dataset. | E.g. ["Retained", "Exited"] when 0, 1 are in your label column. |
| columnNames | List[str] | - | List of the dataset's column names. | |
| label | str | - | Type of dataset. | Must be one of training or validation. |
| labelColumnName | str | - | Name of the column with the labels. | The data in this column must be zero-indexed integers, matching the list provided in classNames. |
| language | str | 'en' | The language of the dataset in ISO 639-1 (alpha-2 code) format. | |
| metadata | Dict[str, any] | {} | Dictionary containing metadata about the dataset. | This is the metadata that will be displayed on the Openlayer platform. |
| predictionsColumnName | str | - | Name of the column with the model's predictions as zero-indexed integers. | Applies only if you are uploading a model as well. |
| predictionScoresColumnName | str | - | Name of the column with the model's predictions as lists of class probabilities. | Applies only if you are uploading a model as well. |
| sep | str | ',' | Delimiter to use. | Applies only if you are uploading the dataset as a CSV file. E.g. '\t'. |
| textColumnName | str | - | Name of the column with the text. | |
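
The constraints in the attribute list can be checked before uploading. The following is a minimal sketch, not part of any Openlayer library: the helper name check_text_classification_config is hypothetical, and treating the attributes that have no default and are not model-only as required is an assumption made for illustration.

```python
from typing import Dict, List

# Allowed values for the `label` attribute, as stated in the attribute list.
ALLOWED_LABELS = {"training", "validation"}

# Assumption: attributes with no default that are not model-only are required.
REQUIRED_KEYS = ("classNames", "columnNames", "label", "labelColumnName", "textColumnName")

# Attributes whose value should name a column listed in `columnNames`.
COLUMN_KEYS = (
    "labelColumnName",
    "textColumnName",
    "predictionsColumnName",
    "predictionScoresColumnName",
)


def check_text_classification_config(config: Dict) -> List[str]:
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing attribute: {key}")
    if "label" in config and config["label"] not in ALLOWED_LABELS:
        problems.append("label must be one of: training, validation")
    column_names = config.get("columnNames", [])
    for key in COLUMN_KEYS:
        if key in config and config[key] not in column_names:
            problems.append(f"{key} refers to unknown column: {config[key]}")
    return problems


# The same values as a dataset config with zero-indexed predictions.
config = {
    "classNames": ["negative", "positive"],
    "columnNames": ["polarity", "tweetid", "query_name", "user", "text", "predictions"],
    "label": "training",
    "labelColumnName": "polarity",
    "predictionsColumnName": "predictions",
    "textColumnName": "text",
}
problems = check_text_classification_config(config)
print(problems)
```

An empty list means none of the checks in this sketch failed; it does not guarantee that the platform will accept the file.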

Examples

Let’s look at an example dataset from one of the sample notebooks in Openlayer’s examples gallery GitHub repository.

For a training dataset that contains zero-indexed model predictions, a valid dataset_config.yaml file would be:

classNames:
- negative
- positive
columnNames:
- polarity
- tweetid
- query_name
- user
- text
- predictions
label: training
labelColumnName: polarity
predictionsColumnName: predictions
textColumnName: text
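
Because the label and prediction columns must hold zero-indexed integers that match classNames, it can help to scan the data for out-of-range values before uploading. This is a rough sketch under stated assumptions: the helper invalid_label_rows is hypothetical, and the rows are invented placeholders rather than data from the example dataset.

```python
from typing import Dict, List

class_names = ["negative", "positive"]


def invalid_label_rows(rows: List[Dict], column: str, num_classes: int) -> List[int]:
    """Hypothetical helper: indices of rows whose value in `column`
    is not an integer in the range [0, num_classes)."""
    bad = []
    for i, row in enumerate(rows):
        value = row.get(column)
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            bad.append(i)
        elif not 0 <= value < num_classes:
            bad.append(i)
    return bad


# Invented placeholder rows, for illustration only.
rows = [
    {"polarity": 0, "text": "placeholder text a", "predictions": 0},
    {"polarity": 1, "text": "placeholder text b", "predictions": 1},
    {"polarity": 2, "text": "placeholder text c", "predictions": "1"},
]

bad_labels = invalid_label_rows(rows, "polarity", len(class_names))
bad_predictions = invalid_label_rows(rows, "predictions", len(class_names))
print(bad_labels, bad_predictions)
```

In the placeholder rows, the third row fails both checks: its label 2 has no entry in classNames, and its prediction is a string rather than an integer.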

For a training dataset that contains prediction scores, a valid dataset_config.yaml file would be:

classNames:
- negative
- positive
columnNames:
- polarity
- tweetid
- query_name
- user
- text
- predictionScores
label: training
labelColumnName: polarity
predictionScoresColumnName: predictionScores
textColumnName: text
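
To illustrate what a value in the prediction scores column might look like, the sketch below maps one list of class probabilities to a class name. It assumes that each list has one entry per class and that the entries follow the order of classNames; neither assumption is stated in the attribute list, and the helper predicted_class is hypothetical.

```python
from typing import List, Sequence

class_names = ["negative", "positive"]


def predicted_class(scores: Sequence[float], class_names: List[str]) -> str:
    """Hypothetical helper: return the class name with the highest probability.

    Assumes one score per class, in the same order as `class_names`.
    """
    if len(scores) != len(class_names):
        raise ValueError("expected one score per class")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each score should be a probability in [0, 1]")
    # Index of the largest score (the first one wins on ties).
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best_index]


print(predicted_class([0.2, 0.8], class_names))  # positive
```

A list with the wrong number of entries, such as [0.2, 0.3, 0.5] for two classes, raises a ValueError in this sketch.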