Definition

The total tokens test ensures that the total number of tokens in the data is within a given range.

Given that different LLMs use different tokenizers, to use this test, the data you upload/publish to Openlayer must have a column with the number of tokens in each row. In the data config, the name of this column is specified in the numOfTokenColumnName field.

Taxonomy

  • Category: Performance.
  • Task types: LLM.
  • Availability: and .

Why it matters

  • Depending on the application you are building, you might want to limit the number of tokens generated by the LLM.
  • LLMs have a limited context window, so it is important to ensure that the number of tokens in the data is within the context window capacity.