Character length | Define min/max bounds on the number of characters in a column across all rows. | LLM, text classification |
Class imbalance ratio | Measure the ratio between the most common class and the least common class. | Tabular classification, text classification |
Column average | Ensure the average of a column falls within a defined range. | LLM, tabular classification, tabular regression, text classification |
Column contains string | Check that values in column A are contained in the lists in column B. | LLM, tabular classification, tabular regression, text classification |
Conflicting labels | Check for rows with identical feature values but differing labels. | Tabular classification, text classification |
Correlated features | Prevent features that are strongly correlated with one another. | Tabular classification, tabular regression |
Data type validation | Guard against features whose values violate the expected data types. | Tabular classification, tabular regression |
Duplicate rows | Guard against identical rows in the dataset. | LLM, tabular classification, tabular regression, text classification |
Empty features | Expect specified features to not have only null values. | Tabular classification, tabular regression |
Empty feature count | Set expectations on the number of features that have only null values. | Tabular classification, tabular regression |
Features missing values | Ensure specified features do not have missing values. | Tabular classification, tabular regression |
Feature values | Ensure feature values do not violate defined ranges or categories. | Tabular classification, tabular regression |
Great Expectations | Validate your data using any expectation supported by Great Expectations (GX), an open-source library. | LLM, tabular classification, tabular regression, text classification |
Ill-formed rows | Detect rows with more non-alphabetical characters than alphabetical ones. | LLM, text classification |
Is code | Check that the data contains compilable and executable code. | LLM |
Is JSON | Check that the data contains valid JSON. | LLM |
Null rows | Guard against rows containing missing values. | LLM, tabular classification, tabular regression, text classification |
Number of rows | Define min/max bounds on the number of dataset rows. | LLM, tabular classification, tabular regression, text classification |
Personally identifiable information (PII) | Detect rows containing personally identifiable information. | LLM |
PPS (predictive power score) | Ensure the predictive power score (PPS) of a feature falls within a specified range. | Tabular classification, tabular regression |
Quasi-constant features | Expect specified features to be near-constant, with very low variance. | Tabular classification, tabular regression |
Quasi-constant feature count | Set expectations on the number of features that are near-constant, with very low variance. | Tabular classification, tabular regression |
Special characters ratio | Check the ratio of special characters to alphanumeric characters in the dataset. | LLM, text classification |
String validation | Guard against rows containing strings that violate defined patterns (RegEx). | LLM |
Valid URLs | Ensure the data contains valid URLs. | LLM |
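
To make a few of the checks in the table concrete, here is a minimal sketch in plain Python of what the class imbalance ratio, conflicting labels, duplicate rows, and ill-formed rows tests might compute. This is not the platform's implementation. The function names, the row format (a list of dicts), the direction of the ratio (most common divided by least common), and the choice to ignore whitespace when counting characters are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def class_imbalance_ratio(labels):
    """Ratio of the most common class count to the least common class count.

    Assumption: the ratio is most common / least common, so 1.0 means balanced.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def conflicting_label_indices(rows, feature_keys, label_key):
    """Indices of rows that share identical feature values but differ in label."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[k] for k in feature_keys)].append(i)
    conflicts = []
    for indices in groups.values():
        # A group conflicts when it holds more than one distinct label.
        if len({rows[i][label_key] for i in indices}) > 1:
            conflicts.extend(indices)
    return sorted(conflicts)


def duplicate_row_indices(rows):
    """Indices of rows that are identical to an earlier row."""
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


def is_ill_formed(text):
    """True when non-alphabetical characters outnumber alphabetical ones.

    Assumption: whitespace is ignored rather than counted as non-alphabetical.
    """
    alphabetical = sum(1 for c in text if c.isalpha())
    other = sum(1 for c in text if not c.isalpha() and not c.isspace())
    return other > alphabetical


# Hypothetical sample data for a text classification dataset.
sample_rows = [
    {"text": "great product", "label": "pos"},
    {"text": "great product", "label": "neg"},
    {"text": "terrible", "label": "neg"},
    {"text": "terrible", "label": "neg"},
    {"text": "#### $$$ ok", "label": "pos"},
]

if __name__ == "__main__":
    print(class_imbalance_ratio([r["label"] for r in sample_rows]))
    print(conflicting_label_indices(sample_rows, ["text"], "label"))
    print(duplicate_row_indices(sample_rows))
    print([is_ill_formed(r["text"]) for r in sample_rows])
```

In practice, each result would be compared against a threshold chosen for the dataset, for example requiring zero conflicting rows or an imbalance ratio below some bound.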