Definition
The anomalous column count test automatically learns time series patterns for each
column in your dataset and detects when values fall outside predicted bounds. For
numeric columns, it tracks statistical measures (like averages) over time, while for
categorical columns, it monitors category counts.
The test continuously learns expected ranges for each column and counts how many
columns exhibit anomalous behavior on each evaluation, comparing this count against
your specified threshold.
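As a rough illustration of what "statistical measures" and "category counts" could look like for a single evaluation window, the sketch below reduces a batch of rows to one tracked value per numeric column and one count per category. The function name `summarize_window`, the row layout, and the choice of the mean are assumptions made for this example, not the test's actual internals.

```python
from collections import Counter
from statistics import mean

def summarize_window(rows, numeric_cols, categorical_cols):
    """Reduce one evaluation window to the per-column values a monitor might track.

    Hypothetical helper: numeric columns become their mean, categorical
    columns become one count per observed category.
    """
    summary = {}
    for col in numeric_cols:
        summary[col] = mean(row[col] for row in rows)
    for col in categorical_cols:
        counts = Counter(row[col] for row in rows)
        for category, n in counts.items():
            summary[f"{col}={category}"] = n
    return summary

# Three rows standing in for one evaluation window of production data.
window = [
    {"age": 30, "plan": "free"},
    {"age": 40, "plan": "pro"},
    {"age": 50, "plan": "free"},
]
summary = summarize_window(window, ["age"], ["plan"])
```

Collecting one such summary per window yields a time series for each tracked value, which is what the expected ranges are learned from.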
Taxonomy
Task types: Tabular classification, tabular regression.
Availability: Monitoring only.
This test is only available in monitoring mode as it requires historical data
to learn time series patterns and establish baseline expectations for each
column.
Why it matters
Automated monitoring: Provides comprehensive data quality monitoring with minimal configuration required.
Early anomaly detection: Identifies unusual patterns across all columns simultaneously, before they impact model performance.
Time series learning: Adapts to natural variations and trends in your data over time.
Comprehensive coverage: Monitors both numeric and categorical columns automatically.
Minimal setup: No need to manually configure thresholds for individual columns, because the system learns appropriate bounds.
How it works
The test operates through several phases:
Learning phase: Analyzes historical data to establish time series patterns for each column.
  Numeric columns: Tracks statistical measures (averages, medians, etc.) over time.
  Categorical columns: Monitors counts of each category over time.
Prediction: Uses learned patterns to predict expected upper and lower bounds for each column’s current values.
Anomaly detection: Compares current column values against predicted bounds.
  Values outside the confidence interval are flagged as anomalous.
Counting: Counts the total number of columns exhibiting anomalous behavior.
Threshold comparison: Compares the anomalous column count against your specified threshold.
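The phases above can be sketched in a few lines. This is a deliberately simplified stand-in: it assumes each column's history is a plain list of past values and uses "mean plus or minus a fixed number of standard deviations" as the predicted bounds, whereas the real test learns time series patterns. The names `predicted_bounds`, `count_anomalous_columns`, and `num_std` are hypothetical.

```python
from statistics import mean, stdev

def predicted_bounds(history, num_std=2.0):
    """Learning + prediction: derive (lower, upper) bounds from past values.

    Assumption: a static mean +/- num_std * stdev band replaces the
    time series model used by the actual test.
    """
    center, spread = mean(history), stdev(history)
    return center - num_std * spread, center + num_std * spread

def count_anomalous_columns(history_by_col, current_by_col, num_std=2.0):
    """Anomaly detection + counting: flag columns whose current value is out of bounds."""
    flagged = []
    for col, history in history_by_col.items():
        lower, upper = predicted_bounds(history, num_std)
        if not lower <= current_by_col[col] <= upper:
            flagged.append(col)
    return len(flagged), flagged

# Past per-window values for one numeric measure and one category count.
history = {
    "age_mean": [40.1, 39.8, 40.3, 40.0, 39.9],
    "plan=free_count": [200, 210, 195, 205, 198],
}
current = {"age_mean": 40.2, "plan=free_count": 320}

count, flagged = count_anomalous_columns(history, current)

# Threshold comparison: with a threshold of "<= 0", any flagged column fails the test.
test_passes = count <= 0
```

Here the category count jumps far outside its usual range while the numeric mean stays within bounds, so exactly one column is counted and a zero-tolerance threshold fails.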
Configuration parameters
The test supports an optional interval_width parameter that controls the confidence interval for anomaly detection:
interval_width: Confidence interval width (default: 0.95)
  0.95 = 95% confidence interval (narrower bounds: stricter, detects more anomalies)
  0.99 = 99% confidence interval (wider bounds: more lenient, detects fewer anomalies)
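To see why a wider interval is more lenient, the sketch below converts an interval width into a multiplier on the standard deviation under a normal-distribution assumption. That assumption, and the helper names `z_for_interval` and `is_anomalous`, are illustrative only; the test's actual forecasting model may compute its bounds differently.

```python
from statistics import NormalDist

def z_for_interval(interval_width):
    """Two-sided normal quantile for a given interval width (assumed normality)."""
    return NormalDist().inv_cdf(0.5 + interval_width / 2)

z95 = z_for_interval(0.95)  # roughly 1.96 standard deviations
z99 = z_for_interval(0.99)  # roughly 2.58 standard deviations

def is_anomalous(value, center, spread, interval_width):
    """Flag a value that falls outside center +/- z * spread."""
    return abs(value - center) > z_for_interval(interval_width) * spread

# A value 2.2 standard deviations away from the expected center.
flag_95 = is_anomalous(122.0, 100.0, 10.0, 0.95)
flag_99 = is_anomalous(122.0, 100.0, 10.0, 0.99)
```

The same observation is flagged at 0.95 but accepted at 0.99, because the 99% band extends further from the center.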
Test configuration examples
If you are writing a tests.json, here are a few valid configurations for the anomalous column count test:
Monitoring - Basic Setup
[
  {
    "name": "No anomalous columns detected",
    "description": "Alerts when any column shows anomalous behavior based on learned patterns",
    "type": "integrity",
    "subtype": "anomalousColumnCount",
    "thresholds": [
      {
        "insightName": "anomalousColumnCount",
        "measurement": "anomalousColumnCount",
        "operator": "<=",
        "value": 0 // No anomalous columns allowed
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 86400, // 24 hours (daily evaluation)
    "delayWindow": 0,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  }
]
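For readers who want to reason about how a threshold entry like the one above resolves to pass or fail, here is a minimal sketch. The `threshold_passes` helper and the operator table are assumptions for illustration; only the `<=` operator appears in the example configuration, and the platform's own evaluation logic is not shown in this document.

```python
import operator

# Hypothetical mapping from operator strings to comparison functions.
# Only "<=" is taken from the configuration above; the rest are assumed.
OPERATORS = {
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
}

def threshold_passes(threshold, measured):
    """Return True when the measured value satisfies the threshold."""
    compare = OPERATORS[threshold["operator"]]
    return compare(measured, threshold["value"])

threshold = {
    "measurement": "anomalousColumnCount",
    "operator": "<=",
    "value": 0,
}

passes_clean = threshold_passes(threshold, 0)     # no anomalous columns
passes_with_two = threshold_passes(threshold, 2)  # two anomalous columns

# evaluationWindow is expressed in seconds: 24 hours is 24 * 60 * 60.
daily_window_seconds = 24 * 60 * 60
```

With `"value": 0`, a single anomalous column in the 24-hour window is enough to fail the test; raising the value tolerates that many anomalous columns per evaluation.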