The dataset configuration is uploaded to Openlayer together with the dataset. It is usually provided as a dictionary (object) or as a YAML file. Refer to the API reference for details on the upload process.


To see what goes into LLM data configs, select the tab that corresponds to your use case.

  • Development

  • Monitoring

Examples of datasets uploaded to development projects are validation sets and fine-tuning sets.

In these cases, the config can contain the following attributes. Not all attributes are required, since some have default values.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| contextColumnName | Optional[str] | None | Name of the column with the context retrieved. Applies to RAG use cases. Providing the context enables RAG-specific metrics. |
| costColumnName | Optional[str] | None | Name of the column with the cost associated with each row. |
| groundTruthColumnName | Optional[str] | None | Name of the column with the ground truths. |
| inputVariableNames | List[str] | [] | List of input variable names. Each input variable should be a dataset column. |
| label | str | - | Type of dataset. Must be one of validation or fine-tuning. |
| metadata | Dict[str, any] | | Dictionary containing metadata about the dataset. This is the metadata that will be displayed on the Openlayer platform. |
| numOfTokenColumnName | Optional[str] | None | Name of the column with the number of tokens. |
| outputColumnName | str | - | Name of the column with the model outputs. Applies only if you are uploading a model as well. |
| questionColumnName | Optional[str] | None | Name of the column with the question. Applies to RAG use cases. Providing the question enables RAG-specific metrics. |
| sep | str | ',' | Delimiter to use. Applies only if you are uploading the dataset as a CSV file. E.g. '\t'. |
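
To make the attribute rules above concrete, here is a minimal sketch of a local sanity check you could run before uploading. It is not part of the Openlayer client: `check_dataset_config`, `DEFAULTS`, `ALLOWED_LABELS`, and `COLUMN_ATTRIBUTES` are hypothetical names introduced for illustration, and the checks only mirror what the attribute descriptions state (the allowed `label` values, and that column attributes should name dataset columns).

```python
from typing import Any, Dict, List

# Defaults as listed in the attribute descriptions above. Attributes with no
# listed default (label, outputColumnName, metadata) are intentionally omitted.
DEFAULTS: Dict[str, Any] = {
    "contextColumnName": None,
    "costColumnName": None,
    "groundTruthColumnName": None,
    "inputVariableNames": [],
    "numOfTokenColumnName": None,
    "questionColumnName": None,
    "sep": ",",
}

ALLOWED_LABELS = {"validation", "fine-tuning"}

# Attributes whose value, when set, is the name of a dataset column.
COLUMN_ATTRIBUTES = [
    "contextColumnName",
    "costColumnName",
    "groundTruthColumnName",
    "numOfTokenColumnName",
    "outputColumnName",
    "questionColumnName",
]


def check_dataset_config(config: Dict[str, Any], columns: List[str]) -> List[str]:
    """Hypothetical helper: return a list of problems found in `config`."""
    merged = {**DEFAULTS, **config}  # user-supplied values override defaults
    problems: List[str] = []

    if merged.get("label") not in ALLOWED_LABELS:
        problems.append("label must be 'validation' or 'fine-tuning'")

    for name in merged["inputVariableNames"]:
        if name not in columns:
            problems.append(f"input variable '{name}' is not a dataset column")

    for attr in COLUMN_ATTRIBUTES:
        col = merged.get(attr)
        if col is not None and col not in columns:
            problems.append(f"{attr} refers to missing column '{col}'")

    return problems


# Usage with made-up column names for a RAG-style validation set.
columns = ["question", "context", "model_output"]
rag_config = {
    "inputVariableNames": ["question", "context"],
    "label": "validation",
    "outputColumnName": "model_output",
    "questionColumnName": "question",
    "contextColumnName": "context",
}
problems = check_dataset_config(rag_config, columns)
```

An empty `problems` list means the config is consistent with the column names you passed in. It says nothing about whether the platform will accept the upload.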


Let’s look at an example dataset from one of the sample notebooks from Openlayer’s examples gallery GitHub repository.

For a validation dataset like this one, which contains model outputs but no ground truths, a valid dataset config would be:

dataset_config = {
    "inputVariableNames": ["description", "seed_words"],
    "label": "validation",
    "outputColumnName": "model_output",