The dataset configuration is uploaded to Openlayer together with the dataset. It is usually provided as a dictionary (object) or as a YAML file. Refer to the API reference for details on the upload process.

Attributes

The attributes in a text classification data config depend on whether the dataset belongs to a development or a monitoring project. The attributes below apply to development projects.

Examples of datasets uploaded to development projects are training and validation sets.

In these cases, the config can contain the following attributes. Not all attributes are required, because some have default values.

| Attribute | Type | Default | Description | Comments |
| --- | --- | --- | --- | --- |
| classNames | List[str] | - | List of class names indexed by label integer in the dataset. | E.g. ["Retained", "Exited"] when 0, 1 are in your label column. |
| label | str | - | Type of dataset. | Must be one of training or validation. |
| labelColumnName | str | - | Name of the column with the labels. | The data in this column must be zero-indexed integers, matching the list provided in classNames. |
| language | str | 'en' | The language of the dataset in ISO 639-1 (alpha-2 code) format. | |
| metadata | Dict[str, any] | | Dictionary containing metadata about the dataset. | This is the metadata that will be displayed on the Openlayer platform. |
| predictionsColumnName | str | - | Name of the column with the model's predictions as zero-indexed integers. | Applies only if you are uploading a model as well. |
| predictionScoresColumnName | str | - | Name of the column with the model's predictions as lists of class probabilities. | Applies only if you are uploading a model as well. |
| sep | str | ',' | Delimiter to use. | Applies only if you are uploading the dataset as a CSV file. E.g. '\t'. |
| textColumnName | str | - | Name of the column with the text. | |
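
The attribute table above can be turned into a quick check before uploading. The following is a minimal sketch in plain Python and is not part of the Openlayer client: the helper name `check_dataset_config` is hypothetical, and treating `classNames`, `label`, `labelColumnName`, and `textColumnName` as required is an assumption based on those attributes having no default in the table.

```python
# Hypothetical helper for illustration; it is not part of the Openlayer API.
# Assumption: attributes with no default that are not tied to a model upload are required.
REQUIRED_KEYS = {"classNames", "label", "labelColumnName", "textColumnName"}
DEFAULTS = {"language": "en", "sep": ","}  # defaults listed in the table
ALLOWED_LABELS = {"training", "validation"}


def check_dataset_config(config: dict) -> dict:
    """Return a copy of config with table defaults filled in, or raise ValueError."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing attributes: {sorted(missing)}")
    if config["label"] not in ALLOWED_LABELS:
        raise ValueError("label must be 'training' or 'validation'")
    class_names = config["classNames"]
    if not isinstance(class_names, list) or not all(
        isinstance(name, str) for name in class_names
    ):
        raise ValueError("classNames must be a list of strings")
    # Values supplied by the caller take precedence over the defaults.
    return {**DEFAULTS, **config}


config = check_dataset_config(
    {
        "classNames": ["negative", "positive"],
        "label": "training",
        "labelColumnName": "polarity",
        "textColumnName": "text",
    }
)
```

Running the check locally surfaces a missing or misspelled attribute before the upload is attempted.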

Examples

Let’s look at an example dataset from one of the sample notebooks from Openlayer’s examples gallery GitHub repository.

For a training dataset that contains prediction scores (a column with lists of class probabilities), a valid dataset_config.yaml file would be:

classNames:
  - negative
  - positive
label: training
labelColumnName: polarity
predictionScoresColumnName: predictionScores
textColumnName: text
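
To see how the config above relates to the data, here is a minimal sketch that checks rows against it. The rows are hypothetical and are not taken from the sample notebook; the helper `row_problems` is illustrative only, and the check that each score list holds one probability per class, in the order of classNames, is an assumption.

```python
# Hypothetical rows for illustration only; column names follow the config above.
rows = [
    {"text": "great product", "polarity": 1, "predictionScores": [0.1, 0.9]},
    {"text": "would not buy again", "polarity": 0, "predictionScores": [0.8, 0.2]},
]
class_names = ["negative", "positive"]


def row_problems(row: dict, class_names: list) -> list:
    """Collect human-readable problems found in a single row."""
    problems = []
    label = row["polarity"]
    # Labels must be zero-indexed integers that index into classNames.
    if not isinstance(label, int) or not 0 <= label < len(class_names):
        problems.append(f"label {label!r} is not a valid index into classNames")
    scores = row["predictionScores"]
    # Assumption: one probability per class, in the same order as classNames.
    if len(scores) != len(class_names):
        problems.append("predictionScores length differs from classNames")
    return problems


all_problems = [p for row in rows for p in row_problems(row, class_names)]
```

An empty `all_problems` list means every row is consistent with the config under these assumptions.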

Alternatively, for a training dataset that contains zero-indexed model predictions, a valid dataset_config.yaml file would be:

classNames:
  - negative
  - positive
label: training
labelColumnName: polarity
predictionsColumnName: predictions
textColumnName: text
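
Because the config can also be provided as a dictionary, the same settings can be written in Python. This is a minimal sketch: the variable names and the prediction values are hypothetical, and it only illustrates how zero-indexed predictions map back to classNames.

```python
# Dictionary form of the YAML config above.
dataset_config = {
    "classNames": ["negative", "positive"],
    "label": "training",
    "labelColumnName": "polarity",
    "predictionsColumnName": "predictions",
    "textColumnName": "text",
}

# Hypothetical prediction values for illustration only.
predictions = [1, 0, 1]

# Each zero-indexed prediction is a position in classNames.
predicted_names = [dataset_config["classNames"][p] for p in predictions]
```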