The dataset configuration is included alongside the dataset during the upload process to Openlayer. It is usually provided as a dictionary/object or as a YAML file. Refer to the API reference for details on the upload process.

Attributes

What goes into an LLM data config depends on your use case: development or monitoring.

Examples of datasets uploaded to development projects are validation sets and fine-tuning sets.

In these cases, the config can contain the following attributes. Not all attributes are required, because some have default values.

| Attribute | Type | Default | Description | Comments |
| --- | --- | --- | --- | --- |
| contextColumnName | Optional[str] | None | Name of the column with the context retrieved. | Applies to RAG use cases. Providing the context enables RAG-specific metrics. |
| costColumnName | Optional[str] | None | Name of the column with the cost associated with each row. | |
| groundTruthColumnName | Optional[str] | None | Name of the column with the ground truths. | |
| inputVariableNames | List[str] | [] | List of input variable names. | Each input variable should be a dataset column. |
| label | str | - | Type of dataset. | Must be one of validation or fine-tuning. |
| metadata | Dict[str, any] | | Dictionary containing metadata about the dataset. | This is the metadata that will be displayed on the Openlayer platform. |
| numOfTokenColumnName | Optional[str] | None | Name of the column with the number of tokens. | |
| outputColumnName | str | - | Name of the column with the model outputs. | Applies only if you are uploading a model as well. |
| questionColumnName | Optional[str] | None | Name of the column with the question. | Applies to RAG use cases. Providing the question enables RAG-specific metrics. |
| sep | str | ',' | Delimiter to use. | Applies only if you are uploading the dataset as a CSV file. E.g. '\t'. |
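The constraints in the attribute table can be checked locally before uploading. The following is a minimal sketch, not part of the Openlayer client: `check_dataset_config` is a hypothetical helper, and the column names `question`, `context`, and `answer` are made up for illustration of a RAG use case.

```python
from typing import Any, Dict, List

# Values the attribute table lists as valid for "label".
ALLOWED_LABELS = {"validation", "fine-tuning"}

# Attributes whose value, when set, names a dataset column.
COLUMN_NAME_KEYS = [
    "contextColumnName",
    "costColumnName",
    "groundTruthColumnName",
    "numOfTokenColumnName",
    "outputColumnName",
    "questionColumnName",
]


def check_dataset_config(config: Dict[str, Any], columns: List[str]) -> List[str]:
    """Hypothetical helper: return a list of problems found in the config."""
    problems = []
    if config.get("label") not in ALLOWED_LABELS:
        problems.append("label must be one of: fine-tuning, validation")
    # Each input variable should be a dataset column.
    for name in config.get("inputVariableNames", []):
        if name not in columns:
            problems.append(f"input variable {name!r} is not a dataset column")
    # Every *ColumnName attribute that is set should point at an existing column.
    for key in COLUMN_NAME_KEYS:
        value = config.get(key)
        if value is not None and value not in columns:
            problems.append(f"{key} refers to missing column {value!r}")
    return problems


# Illustrative RAG config; the column names are assumptions, not from the docs.
rag_config = {
    "contextColumnName": "context",
    "questionColumnName": "question",
    "inputVariableNames": ["question"],
    "label": "validation",
    "outputColumnName": "answer",
}
columns = ["question", "context", "answer"]
print(check_dataset_config(rag_config, columns))  # prints []
```

An empty list means the sketch found no problems. The platform may enforce additional rules that this helper does not cover.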

Examples

Consider an example dataset from one of the sample notebooks in Openlayer’s examples gallery GitHub repository.

For a validation dataset that contains model outputs but no ground truths, a valid dataset config would be:

dataset_config = {
    "inputVariableNames": ["description", "seed_words"],
    "label": "validation",
    "outputColumnName": "model_output",
}
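
Because the example omits the optional attributes, they fall back to their defaults. The sketch below spells out the resulting configuration, assuming the defaults listed in the attribute table are applied to any omitted attribute. `TABLE_DEFAULTS` and `effective_config` are illustrative names, not part of the Openlayer API.

```python
# Defaults copied from the attribute table. Attributes without a listed
# default (label, outputColumnName, metadata) are left out on purpose.
TABLE_DEFAULTS = {
    "contextColumnName": None,
    "costColumnName": None,
    "groundTruthColumnName": None,
    "inputVariableNames": [],
    "numOfTokenColumnName": None,
    "questionColumnName": None,
    "sep": ",",
}

dataset_config = {
    "inputVariableNames": ["description", "seed_words"],
    "label": "validation",
    "outputColumnName": "model_output",
}

# Later keys win, so values from dataset_config override the defaults.
effective_config = {**TABLE_DEFAULTS, **dataset_config}

for key in sorted(effective_config):
    print(f"{key}: {effective_config[key]!r}")
```

In this sketch, `groundTruthColumnName` stays `None`, which matches a dataset with model outputs but no ground truths, and `sep` stays `","`.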