Tests encode the expectations you have of your model and data. They fall into three types: integrity, consistency, and performance. This page lists the tests available on the platform, grouped by type.

To learn more about tests and their role in AI/ML evaluation, check out Understanding tests.

Integrity tests

| Test | Description | Task type |
| --- | --- | --- |
| Character length | Define min/max bounds on the number of characters in a column across all rows. | LLM, text classification |
| Class imbalance ratio | Measure the ratio between the most common class and the least common class. | Tabular classification, text classification |
| Column average | Column average must be within range. | LLM, tabular classification, tabular regression, text classification |
| Column contains string | Check that values in column A are contained in the lists in column B. | LLM, tabular classification, tabular regression, text classification |
| Conflicting labels | Check for rows with identical feature values but differing labels. | Tabular classification, text classification |
| Correlated features | Prevent features that are strongly correlated with one another. | Tabular classification, tabular regression |
| Data type validation | Guard against features whose values violate the expected data types. | Tabular classification, tabular regression |
| Duplicate rows | Guard against identical rows in the dataset. | LLM, tabular classification, tabular regression, text classification |
| Empty features | Expect specified features to not have only null values. | Tabular classification, tabular regression |
| Empty feature count | Number of features that have only null values. | Tabular classification, tabular regression |
| Features missing values | Ensure specified features do not have missing values. | Tabular classification, tabular regression |
| Feature values | Ensure feature values do not violate defined ranges or categories. | Tabular classification, tabular regression |
| Great Expectations | Validate your data using any expectation supported by GX, an open-source library. | LLM, tabular classification, tabular regression, text classification |
| Ill-formed rows | Rows with more non-alphabetical characters than alphabetical. | LLM, text classification |
| Is code | Check that the data contains compilable and executable code. | LLM |
| Is JSON | Check that the data contains valid JSON. | LLM |
| Null rows | Guard against rows containing missing values. | LLM, tabular classification, tabular regression, text classification |
| Number of rows | Define min/max bounds on the number of dataset rows. | LLM, tabular classification, tabular regression, text classification |
| Personal identifiable information (PII) | Detect rows containing personally identifiable information. | LLM |
| PPS (predictive power score) | PPS (predictive power score) for a feature must be within a specified range. | Tabular classification, tabular regression |
| Quasi-constant features | Expect specified features to be near-constant, with very low variance. | Tabular classification, tabular regression |
| Quasi-constant feature count | Set expectations on the number of features that are near-constant, with very low variance. | Tabular classification, tabular regression |
| Special characters ratio | Check the ratio of special characters to alphanumeric characters in the dataset. | LLM, text classification |
| String validation | Guard against rows containing strings that violate defined patterns (RegEx). | LLM |
| Valid URLs | Ensure the data contains valid URLs. | LLM |
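
To make a few of the integrity checks above concrete, here is a minimal sketch in plain Python of what the class imbalance ratio, conflicting labels, ill-formed rows, and Is JSON tests measure. This is not the platform's implementation: every function name is hypothetical, rows are assumed to be dictionaries with hashable values, and the ill-formed check assumes whitespace is ignored when counting characters.

```python
import json
from collections import Counter, defaultdict


def class_imbalance_ratio(labels):
    """Count of the most common class divided by the count of the least common class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def _feature_key(row, label_key):
    # Hashable, order-independent view of a row's features (label excluded).
    return tuple(sorted((k, v) for k, v in row.items() if k != label_key))


def conflicting_label_rows(rows, label_key="label"):
    """Indices of rows that share feature values with a row carrying a different label."""
    labels_seen = defaultdict(set)
    for row in rows:
        labels_seen[_feature_key(row, label_key)].add(row[label_key])
    return [
        i for i, row in enumerate(rows)
        if len(labels_seen[_feature_key(row, label_key)]) > 1
    ]


def is_ill_formed(text):
    """True when non-alphabetical characters outnumber alphabetical ones.

    Assumption: whitespace is ignored on both sides of the comparison.
    """
    alphabetical = sum(ch.isalpha() for ch in text)
    other = sum(not ch.isalpha() and not ch.isspace() for ch in text)
    return other > alphabetical


def is_valid_json(text):
    """True when the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):
        return False
```

A test would then compare each value against a threshold you choose, for example failing when `class_imbalance_ratio(labels)` exceeds an agreed maximum or when `conflicting_label_rows(rows)` is non-empty.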

Consistency tests

| Test | Description | Task type |
| --- | --- | --- |
| Column drift | Measure drift in a specific column using one of the supported drift detection methods. | LLM, tabular classification, tabular regression, text classification |
| Column values match | Make sure that rows in your two datasets have the same values for a specific column. | LLM, tabular classification, tabular regression, text classification |
| Feature drift | Ensure similar feature distributions between current and reference datasets. | Tabular classification, tabular regression |
| Label drift | Check if label distributions are significantly different between the training and validation sets. | Tabular classification, text classification |
| New categories | Check for categories that appear in the validation set but not in the training set. | Tabular classification, text classification |
| New labels | Labels in the validation set that are not in the training set. | Tabular classification, text classification |
| Size ratio | Check the ratio between the number of rows in the validation and training datasets. | LLM, tabular classification, tabular regression, text classification |
| Training-validation leakage | Detect training rows that are present in the validation dataset. | LLM, tabular classification, tabular regression, text classification |
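
The consistency tests compare two datasets. As an illustration only, the sketch below shows one way the new labels, size ratio, and training-validation leakage checks could be computed in plain Python. The function names are hypothetical, rows are assumed to be dictionaries with hashable values, and the size ratio is assumed to be validation rows divided by training rows; the platform may define these differently.

```python
def new_labels(train_labels, validation_labels):
    """Labels that occur in the validation set but never in the training set."""
    return set(validation_labels) - set(train_labels)


def size_ratio(train_rows, validation_rows):
    """Number of validation rows divided by number of training rows (assumed direction)."""
    return len(validation_rows) / len(train_rows)


def _row_key(row):
    # Hashable, order-independent representation of a whole row.
    return tuple(sorted(row.items()))


def leaked_rows(train_rows, validation_rows):
    """Indices of training rows that also appear, value for value, in the validation set."""
    validation_keys = {_row_key(row) for row in validation_rows}
    return [i for i, row in enumerate(train_rows) if _row_key(row) in validation_keys]
```

The same set-difference idea used in `new_labels` applies to the new categories test, with a categorical feature's values taking the place of the labels.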

Performance tests

| Test | Description | Task type |
| --- | --- | --- |
| Aggregate metrics | Set aggregate metric thresholds for whole datasets or subpopulations within them. | LLM, tabular classification, tabular regression, text classification |
| GPT evaluation | Evaluate the outputs using an LLM given custom criteria. | LLM |
| Max cost | Ensures that the maximum request cost (in USD) for the data is within a given range. | LLM |
| Max latency (ms) | Measures the maximum latency of a single request in the period of data. | LLM, tabular classification, tabular regression, text classification |
| Max tokens | Measures the maximum number of tokens used in one request in the period of data. | LLM |
| Mean cost | Ensures that the average request cost (in USD) for the data is within a given range. | LLM |
| Mean latency (ms) | Measures the mean latency of requests in the period of data. | LLM, tabular classification, tabular regression, text classification |
| Mean tokens | Measures the mean number of tokens used in the period of data. | LLM |
| Total cost | Ensures that the total request cost (in USD) for the data is within a given range. | LLM |
| Total tokens | Measures the total number of tokens used in the period of data. | LLM |
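
The latency, token, and cost tests above are all aggregates (max, mean, or total) over the requests in a period, compared against a range. The following sketch shows that pattern under stated assumptions: the request fields `latency_ms`, `tokens`, and `cost_usd` and both function names are hypothetical and do not reflect the platform's data schema.

```python
from statistics import mean


def summarize_requests(requests):
    """Aggregate per-request latency, token, and cost values over a period of data."""
    latencies = [r["latency_ms"] for r in requests]
    tokens = [r["tokens"] for r in requests]
    costs = [r["cost_usd"] for r in requests]
    return {
        "max_latency_ms": max(latencies),
        "mean_latency_ms": mean(latencies),
        "max_tokens": max(tokens),
        "mean_tokens": mean(tokens),
        "total_tokens": sum(tokens),
        "max_cost_usd": max(costs),
        "mean_cost_usd": mean(costs),
        "total_cost_usd": sum(costs),
    }


def within_range(value, lower=None, upper=None):
    """True when value respects whichever bounds are set (None means unbounded)."""
    return (lower is None or value >= lower) and (upper is None or value <= upper)
```

A mean latency test with a 250 ms ceiling would then amount to `within_range(summary["mean_latency_ms"], upper=250)`, where `summary` is the dictionary returned by `summarize_requests`.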