Tests overview

Tests encode expectations about your model and data. They fall into three types: integrity, consistency, and performance. This page lists the tests available on the platform, grouped by type.

Besides the platform-provided tests, you can also create your own tests with custom metrics.

To learn more about tests and their role in AI/ML evaluation, check out Understanding tests.

Integrity tests

Test	Description	Task type
Character length	Define min/max bounds on the number of characters in a column across all rows.	LLM, text classification
Class imbalance ratio	Measure the ratio between the most common class and the least common class.	Tabular classification, text classification
Column average	Column average must be within range.	LLM, tabular classification, tabular regression, text classification
Column contains string	Check that values in column A are contained in the lists in column B.	LLM, tabular classification, tabular regression, text classification
Conflicting labels	Check for rows with identical feature values but differing labels.	Tabular classification, text classification
Correlated features	Prevent features that are strongly correlated with one another.	Tabular classification, tabular regression
Data type validation	Guard against features whose values violate the expected data types.	Tabular classification, tabular regression
Duplicate rows	Guard against identical rows in the dataset.	LLM, tabular classification, tabular regression, text classification
Empty features	Expect specified features to not have only null values.	Tabular classification, tabular regression
Empty feature count	Number of features that have only null values.	Tabular classification, tabular regression
Features missing values	Ensure specified features do not have missing values.	Tabular classification, tabular regression
Feature values	Ensure feature values do not violate defined ranges or categories.	Tabular classification, tabular regression
Great expectations	Validate your data using any expectation supported by GX, an open-source library.	LLM, tabular classification, tabular regression, text classification
Ill-formed rows	Flag rows with more non-alphabetical characters than alphabetical characters.	LLM, text classification
Is code	Check that the data contains compilable and executable code.	LLM
Is JSON	Check that the data contains valid JSON.	LLM
Null rows	Guard against rows containing missing values.	LLM, tabular classification, tabular regression, text classification
Number of rows	Define min/max bounds on the number of dataset rows.	LLM, tabular classification, tabular regression, text classification
Personal identifiable information (PII)	Detect rows containing personally identifiable information.	LLM
PPS (predictive power score)	The PPS for a feature must be within a specified range.	Tabular classification, tabular regression
Quasi-constant features	Expect specified features to be near-constant, with very low variance.	Tabular classification, tabular regression
Quasi-constant feature count	Set expectations on the number of features that are near-constant, with very low variance.	Tabular classification, tabular regression
Special characters ratio	Check the ratio of special characters to alphanumeric characters in the dataset.	LLM, text classification
String validation	Guard against rows containing strings that violate defined patterns (RegEx).	LLM
Valid URLs	Ensure the data contains valid URLs.	LLM
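To make a few of the integrity checks above concrete, here is a minimal sketch in plain Python. It is not the platform's implementation: the function names, the row format (a list of dicts), and the direction of the imbalance ratio (most common count divided by least common count) are all assumptions made for illustration.

```python
import json
from collections import Counter, defaultdict


def class_imbalance_ratio(labels):
    """Count of the most common class divided by the count of the least common.

    Assumption: the ratio is taken as most common / least common.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def conflicting_label_groups(rows, label_key):
    """Return the feature sets that appear with more than one distinct label.

    Assumption: each row is a dict with hashable values.
    """
    labels_by_features = defaultdict(set)
    for row in rows:
        features = tuple(sorted((k, v) for k, v in row.items() if k != label_key))
        labels_by_features[features].add(row[label_key])
    return [dict(f) for f, labels in labels_by_features.items() if len(labels) > 1]


def is_json(text):
    """True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):
        return False
```

A test would then compare such a value against a threshold that you choose, for example requiring the imbalance ratio to stay below some bound.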

Consistency tests

Test	Description	Task type
Column drift	Measure drift in a specific column using one of the drift detection methods supported.	LLM, tabular classification, tabular regression, text classification
Column values match	Make sure that rows in your two datasets have the same values for a specific column.	LLM, tabular classification, tabular regression, text classification
Feature drift	Ensure similar feature distributions between current and reference datasets.	Tabular classification, tabular regression
Label drift	Check if label distributions are significantly different between the training and validation sets.	Tabular classification, text classification
New categories	Check for categories that appear in the validation set but not in the training set.	Tabular classification, text classification
New labels	Labels in the validation set that are not in the training set.	Tabular classification, text classification
Size ratio	Check the ratio between the number of rows in the validation and training datasets.	LLM, tabular classification, tabular regression, text classification
Training-validation leakage	Detect training rows that are present in the validation dataset.	LLM, tabular classification, tabular regression, text classification
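Several of the consistency checks above reduce to simple comparisons between two datasets. The sketch below illustrates three of them in plain Python. The function names and the row format (a list of dicts with hashable values) are hypothetical, and the size ratio is assumed to be validation rows divided by training rows; the platform may define these differently.

```python
def new_labels(training_labels, validation_labels):
    """Labels present in the validation set but absent from the training set."""
    return set(validation_labels) - set(training_labels)


def size_ratio(training_rows, validation_rows):
    """Number of validation rows divided by number of training rows (assumed direction)."""
    return len(validation_rows) / len(training_rows)


def leaked_rows(training_rows, validation_rows):
    """Validation rows that are exact copies of some training row."""
    training_set = {tuple(sorted(row.items())) for row in training_rows}
    return [row for row in validation_rows
            if tuple(sorted(row.items())) in training_set]
```

For instance, a training-validation leakage test would pass when `leaked_rows` returns an empty list.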

Performance tests

Test	Description	Task type
Aggregate metrics	Set aggregate metric thresholds for whole datasets or subpopulations within them.	LLM, tabular classification, tabular regression, text classification
LLM evaluation	Evaluate the outputs using an LLM given custom criteria.	LLM
Max cost	Ensures that the maximum request cost (in USD) for the data is within a given range.	LLM
Max latency (ms)	Measures the maximum latency of a single request in the period of data.	LLM, tabular classification, tabular regression, text classification
Max tokens	Measures the maximum number of tokens used in one request in the period of data.	LLM
Mean cost	Ensures that the average request cost (in USD) for the data is within a given range.	LLM
Mean latency (ms)	Measures the mean latency of requests in the period of data.	LLM, tabular classification, tabular regression, text classification
Mean tokens	Measures the mean number of tokens used in the period of data.	LLM
Total cost	Ensures that the total request cost (in USD) for the data is within a given range.	LLM
Total tokens	Measures the total number of tokens used in the period of data.	LLM
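The latency, token, and cost tests above are all aggregates over the requests in a period, compared against a bound. The following is a rough sketch of that idea. The record fields (`latency_ms`, `tokens`, `cost_usd`), the function names, and the upper-bound-only thresholds are assumptions for illustration, not the platform's schema or API.

```python
from statistics import mean


def summarize(requests):
    """Aggregate per-request records into max, mean, and total statistics.

    Assumption: each request is a dict with latency_ms, tokens, and cost_usd.
    """
    latencies = [r["latency_ms"] for r in requests]
    tokens = [r["tokens"] for r in requests]
    costs = [r["cost_usd"] for r in requests]
    return {
        "max_latency_ms": max(latencies),
        "mean_latency_ms": mean(latencies),
        "max_tokens": max(tokens),
        "mean_tokens": mean(tokens),
        "total_tokens": sum(tokens),
        "max_cost": max(costs),
        "mean_cost": mean(costs),
        "total_cost": sum(costs),
    }


def failing_metrics(summary, upper_bounds):
    """Names of metrics whose value exceeds the given upper bound."""
    return [name for name, bound in upper_bounds.items() if summary[name] > bound]
```

A monitoring setup would run something like `failing_metrics(summarize(requests), bounds)` over each evaluation window and treat a non-empty result as a failed test.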

Get started

Set up tests

Test your system offline

Monitor your live system

Other resources

Integrity tests

Consistency tests

Performance tests
