
Definition
The anomalous column count test automatically learns time series patterns for each column in your dataset and detects when values fall outside predicted bounds. For numeric columns, it tracks statistical measures (like averages) over time, while for categorical columns, it monitors category counts. The test continuously learns expected ranges for each column and counts how many columns exhibit anomalous behavior on each evaluation, comparing this count against your specified threshold.
Taxonomy
- Task types: Tabular classification, tabular regression.
- Availability: monitoring only.
This test is only available in monitoring mode as it requires historical data
to learn time series patterns and establish baseline expectations for each
column.
Why it matters
- Automated monitoring: Provides comprehensive data quality monitoring with minimal configuration required
- Early anomaly detection: Identifies unusual patterns across all columns simultaneously before they impact model performance
- Time series learning: Adapts to natural variations and trends in your data over time
- Comprehensive coverage: Monitors both numeric and categorical columns automatically
- Minimal setup: No need to manually configure thresholds for individual columns; the system learns appropriate bounds
How it works
The test operates through several phases:
- Learning phase: Analyzes historical data to establish time series patterns for each column
  - Numeric columns: Tracks statistical measures (averages, medians, etc.) over time
  - Categorical columns: Monitors counts of each category over time
- Prediction: Uses learned patterns to predict expected upper and lower bounds for each column’s current values
- Anomaly detection: Compares current column values against predicted bounds
  - Values outside the confidence interval are flagged as anomalous
- Counting: Counts the total number of columns exhibiting anomalous behavior
- Threshold comparison: Compares the anomalous column count against your specified threshold
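The phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the test's actual implementation: the learned time series model is replaced by a simple mean plus or minus z standard deviations band, and the function names, column names, sample data, and the `max_anomalous_columns` threshold are all hypothetical.

```python
from statistics import NormalDist, mean, stdev


def predicted_bounds(history, interval_width=0.95):
    """Return (lower, upper) bounds from past values of one tracked statistic.

    Assumption: a normal band around the historical mean stands in for the
    learned time series forecast.
    """
    # Two-sided z-score for the requested interval (about 1.96 for 0.95).
    z = NormalDist().inv_cdf(0.5 + interval_width / 2)
    center, spread = mean(history), stdev(history)
    return center - z * spread, center + z * spread


def count_anomalous_columns(history_by_column, current_by_column, interval_width=0.95):
    """Count the columns whose current value falls outside its predicted bounds."""
    anomalous = []
    for column, history in history_by_column.items():
        lower, upper = predicted_bounds(history, interval_width)
        if not lower <= current_by_column[column] <= upper:
            anomalous.append(column)
    return len(anomalous), anomalous


# Hypothetical tracked statistics: a numeric column's daily mean and the
# daily count of one category in a categorical column.
history = {
    "age_mean": [34.1, 33.8, 34.5, 34.0, 33.9, 34.2],
    "plan_pro_count": [120, 118, 125, 122, 119, 121],
}
current = {"age_mean": 34.3, "plan_pro_count": 60}

count, columns = count_anomalous_columns(history, current)

max_anomalous_columns = 0  # hypothetical threshold: no anomalous columns allowed
test_passes = count <= max_anomalous_columns
```

With this sample data, only the category count drops far below its historical range, so one column is counted as anomalous and the hypothetical threshold of zero is exceeded.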
Configuration parameters
The test supports an optional interval_width parameter that controls the confidence interval for anomaly detection:
- interval_width: Confidence interval width (default: 0.95)
  - 0.95 = 95% confidence interval (stricter, detects more anomalies)
  - 0.99 = 99% confidence interval (more lenient, detects fewer anomalies)
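To see why a wider interval flags fewer anomalies, the following sketch checks the same value against bounds built at both widths. It assumes normally distributed forecast errors, which may not match the model the test actually uses. The `is_anomalous` helper and the numbers are hypothetical.

```python
from statistics import NormalDist


def is_anomalous(value, center, spread, interval_width):
    """Check a value against a symmetric normal interval of the given width."""
    # A wider interval yields a larger z-score and therefore wider bounds.
    z = NormalDist().inv_cdf(0.5 + interval_width / 2)
    lower, upper = center - z * spread, center + z * spread
    return not lower <= value <= upper


# Hypothetical forecast: expected value 100 with a standard deviation of 10.
flagged_at_95 = is_anomalous(122.0, center=100.0, spread=10.0, interval_width=0.95)
flagged_at_99 = is_anomalous(122.0, center=100.0, spread=10.0, interval_width=0.99)
```

Here the value 122 lies outside the 95% bounds (roughly 80.4 to 119.6) but inside the 99% bounds (roughly 74.2 to 125.8), so it is flagged only at the stricter setting.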
Test configuration examples
If you are writing a tests.json, here are a few valid configurations for the anomalous column count test: