Tests overview
Explore the tests available on the platform
Tests materialize expectations around your model and data. They are categorized into three types: integrity, consistency, and performance. On this page, you can find a list of tests available on the platform, grouped by type.
Besides the platform-provided tests, you can also create your own tests with custom metrics.
To learn more about tests and their role in AI/ML evaluation, check out Understanding tests.
Integrity tests
Test | Description | Task type |
---|---|---|
Character length | Define min/max bounds on the number of characters in a column across all rows. | LLM, text classification |
Class imbalance ratio | Measure the ratio between the most common class and the least common class. | Tabular classification, text classification |
Column average | Ensure the average of a column is within a specified range. | LLM, tabular classification, tabular regression, text classification |
Column contains string | Check that values in column A are contained in the lists in column B. | LLM, tabular classification, tabular regression, text classification |
Conflicting labels | Check for rows with identical feature values but differing labels. | Tabular classification, text classification |
Correlated features | Guard against features that are strongly correlated with one another. | Tabular classification, tabular regression |
Data type validation | Guard against features whose values violate the expected data types. | Tabular classification, tabular regression |
Duplicate rows | Guard against identical rows in the dataset. | LLM, tabular classification, tabular regression, text classification |
Empty features | Expect specified features to not have only null values. | Tabular classification, tabular regression |
Empty feature count | Number of features that have only null values. | Tabular classification, tabular regression |
Features missing values | Ensure specified features do not have missing values. | Tabular classification, tabular regression |
Feature values | Ensure feature values do not violate defined ranges or categories. | Tabular classification, tabular regression |
Great expectations | Validate your data using any expectation supported by GX, an open-source library. | LLM, tabular classification, tabular regression, text classification |
Ill-formed rows | Detect rows with more non-alphabetical characters than alphabetical characters. | LLM, text classification |
Is code | Check that the data contains compilable and executable code. | LLM |
Is JSON | Check that the data contains valid JSON. | LLM |
Null rows | Guard against rows containing missing values. | LLM, tabular classification, tabular regression, text classification |
Number of rows | Define min/max bounds on the number of dataset rows. | LLM, tabular classification, tabular regression, text classification |
Personal identifiable information (PII) | Detect rows containing personally identifiable information. | LLM |
PPS (predictive power score) | The PPS (predictive power score) for a feature must be within a specified range. | Tabular classification, tabular regression |
Quasi-constant features | Expect specified features to be near-constant, with very low variance. | Tabular classification, tabular regression |
Quasi-constant feature count | Set expectations on the number of features that are near-constant, with very low variance. | Tabular classification, tabular regression |
Special characters ratio | Check the ratio of special characters to alphanumeric characters in the dataset. | LLM, text classification |
String validation | Guard against rows containing strings that violate defined patterns (RegEx). | LLM |
Valid URLs | Ensure the data contains valid URLs. | LLM |
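
To make a few of the integrity checks above concrete, here is a minimal sketch in plain Python. It is not the platform's implementation or API: the function names, the row format (a list of dictionaries), and the direction of the imbalance ratio (most common class count divided by least common class count) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def class_imbalance_ratio(rows, label_key):
    """Most common class count divided by least common class count (assumed direction)."""
    counts = Counter(row[label_key] for row in rows)
    return max(counts.values()) / min(counts.values())


def duplicate_row_count(rows):
    """Number of rows that exactly repeat an earlier row."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


def conflicting_label_count(rows, label_key):
    """Number of rows that share feature values with a row carrying a different label."""
    labels_by_features = defaultdict(list)
    for row in rows:
        features = tuple(sorted((k, v) for k, v in row.items() if k != label_key))
        labels_by_features[features].append(row[label_key])
    return sum(
        len(labels)
        for labels in labels_by_features.values()
        if len(set(labels)) > 1
    )


# Hypothetical toy dataset for a text classification task.
rows = [
    {"text": "great product", "label": "positive"},
    {"text": "great product", "label": "positive"},
    {"text": "great product", "label": "negative"},
    {"text": "never again", "label": "negative"},
    {"text": "it works", "label": "positive"},
]

print(class_imbalance_ratio(rows, "label"))
print(duplicate_row_count(rows))
print(conflicting_label_count(rows, "label"))
```

In practice, each value would be compared against a threshold you choose, for example requiring zero duplicate rows or an imbalance ratio below some bound.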
Consistency tests
Test | Description | Task type |
---|---|---|
Column drift | Measure drift in a specific column using one of the drift detection methods supported. | LLM, tabular classification, tabular regression, text classification |
Column values match | Make sure that rows in your two datasets have the same values for a specific column. | LLM, tabular classification, tabular regression, text classification |
Feature drift | Ensure similar feature distributions between current and reference datasets. | Tabular classification, tabular regression |
Label drift | Check if label distributions are significantly different between the training and validation sets. | Tabular classification, text classification |
New categories | Check for categories that appear in the validation set but not in the training set. | Tabular classification, text classification |
New labels | Check for labels that appear in the validation set but not in the training set. | Tabular classification, text classification |
Size ratio | Check the ratio between the number of rows in the validation and training datasets. | LLM, tabular classification, tabular regression, text classification |
Training-validation leakage | Detect training rows that are present in the validation dataset. | LLM, tabular classification, tabular regression, text classification |
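
Consistency tests compare two datasets. The sketch below illustrates three of the checks listed above using plain Python. The helper names and the data layout are hypothetical, and the size ratio is assumed to be validation rows divided by training rows. The platform may define these differently.

```python
def new_categories(training, validation, column):
    """Values of `column` that appear in validation but never in training."""
    seen = {row[column] for row in training}
    return {row[column] for row in validation} - seen


def size_ratio(training, validation):
    """Validation row count divided by training row count (assumed direction)."""
    return len(validation) / len(training)


def leaked_rows(training, validation):
    """Validation rows that are identical to some training row."""
    training_keys = {tuple(sorted(row.items())) for row in training}
    return [
        row for row in validation
        if tuple(sorted(row.items())) in training_keys
    ]


# Hypothetical toy datasets.
training = [
    {"plan": "free", "label": 0},
    {"plan": "pro", "label": 1},
    {"plan": "free", "label": 1},
    {"plan": "pro", "label": 0},
]
validation = [
    {"plan": "pro", "label": 1},
    {"plan": "enterprise", "label": 1},
]

print(new_categories(training, validation, "plan"))
print(size_ratio(training, validation))
print(leaked_rows(training, validation))
```

Here the `enterprise` plan is a new category, and the first validation row also appears in the training set, so it would count as leakage.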
Performance tests
Test | Description | Task type |
---|---|---|
Aggregate metrics | Set aggregate metric thresholds for whole datasets or subpopulations within them. | LLM, tabular classification, tabular regression, text classification |
LLM evaluation | Evaluate the outputs using an LLM against custom criteria. | LLM |
Max cost | Ensures that the maximum request cost (in USD) for the data is within a given range. | LLM |
Max latency (ms) | Measures the maximum latency of a single request in the period of data. | LLM, tabular classification, tabular regression, text classification |
Max tokens | Measures the max number of tokens used in one request in the period of data. | LLM |
Mean cost | Ensures that the average request cost (in USD) for the data is within a given range. | LLM |
Mean latency (ms) | Measures the mean latency of requests in the period of data. | LLM, tabular classification, tabular regression, text classification |
Mean tokens | Measures the mean number of tokens used in the period of data. | LLM |
Total cost | Ensures that the total request cost (in USD) for the data is within a given range. | LLM |
Total tokens | Measures the total number of tokens used in the period of data. | LLM |
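
The latency, token, and cost tests above all reduce to an aggregate over request records compared against a bound. The following is a rough sketch under assumed field names (`latency_ms`, `tokens`, `cost_usd`) and hypothetical helpers. It does not reflect how the platform stores or evaluates these metrics.

```python
from statistics import mean


def summarize(requests):
    """Compute max, mean, and total aggregates over a batch of request records."""
    latencies = [r["latency_ms"] for r in requests]
    tokens = [r["tokens"] for r in requests]
    costs = [r["cost_usd"] for r in requests]
    return {
        "max_latency_ms": max(latencies),
        "mean_latency_ms": mean(latencies),
        "max_tokens": max(tokens),
        "mean_tokens": mean(tokens),
        "total_tokens": sum(tokens),
        "max_cost_usd": max(costs),
        "mean_cost_usd": mean(costs),
        "total_cost_usd": sum(costs),
    }


def failing_metrics(summary, upper_bounds):
    """Names of metrics whose value exceeds the configured upper bound."""
    return sorted(
        name for name, bound in upper_bounds.items()
        if summary[name] > bound
    )


# Hypothetical request log for one period of data.
requests = [
    {"latency_ms": 120, "tokens": 300, "cost_usd": 0.25},
    {"latency_ms": 480, "tokens": 900, "cost_usd": 0.75},
    {"latency_ms": 300, "tokens": 600, "cost_usd": 0.5},
]

summary = summarize(requests)
print(failing_metrics(summary, {"max_latency_ms": 400, "total_tokens": 2000}))
```

With these illustrative bounds, only the max latency check fails, because one request took 480 ms against a 400 ms limit.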