Group by column statistic

Definition

The group by column statistic test measures a statistical property of one column, grouped by the unique values of another column, and lets you set a threshold on how many groups fail to meet your criteria. For each unique value in the grouping column, the test calculates the specified statistic on the target column and checks whether it meets your defined condition. The test then counts the groups that fail this condition and compares that count (or the percentage of failing groups) against your threshold.

Taxonomy

Task types: LLM, tabular classification, tabular regression.
Availability: and .

Why it matters

This test helps ensure statistical consistency across different segments or categories in your data.
It can detect bias, inconsistencies, or quality issues that affect specific subgroups differently.
It’s essential for fairness validation, ensuring that model inputs have similar statistical properties across different demographics or categories.
It helps identify data collection issues that might affect certain groups disproportionately.

How it works

The test follows these steps:

1. Group the data by the unique values in the specified grouping column.
2. Calculate the statistic (mean, median, etc.) on the target column for each group.
3. Apply the condition to each group’s statistic (e.g., mean >= 25).
4. Count the failing groups, i.e., the groups that don’t meet the condition.
5. Compare the count or percentage of failing groups against your threshold.
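
The steps above can be sketched in plain Python. This is an illustrative sketch, not the platform's implementation: the `failing_groups` helper, the `OPERATORS` table, and the sample rows are all hypothetical, and the set of supported comparison operators is an assumption.

```python
import operator
import statistics
from collections import defaultdict

# Comparison operators as they might appear in a test config (assumed set).
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def failing_groups(rows, group_by, target, statistic, op, value):
    """Return (failing groups with their statistic, total number of groups)."""
    # Step 1: group the target values by the grouping column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[target])

    failing = {}
    for name, values in groups.items():
        # Step 2: calculate the statistic for this group.
        stat = statistic(values)
        # Steps 3 and 4: apply the condition and record the groups that fail it.
        if not OPERATORS[op](stat, value):
            failing[name] = stat
    return failing, len(groups)


# Hypothetical sample data.
rows = [
    {"geography": "France", "age": 30},
    {"geography": "France", "age": 40},
    {"geography": "Spain", "age": 20},
    {"geography": "Spain", "age": 24},
    {"geography": "Germany", "age": 26},
]

failing, total = failing_groups(rows, "geography", "age", statistics.mean, ">=", 25)
failing_count = len(failing)
failing_percentage = 100.0 * failing_count / total

# Step 5: compare the number of failing groups against the threshold (<= 1).
test_passes = failing_count <= 1
```

With this sample data only Spain (mean age 22) falls below 25, so one of three groups fails and a threshold of "at most 1 failing group" is met. The same failing set would not pass a threshold of "at most 10% failing groups", since one of three groups is about 33%.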

Available statistics

The following statistical measures are supported for the target column:

Statistic	Description	Example Use Case
`sum`	Sum of all values in each group	Total sales by region
`mean`	Average value for each group	Average age by geography
`median`	Median value for each group	Median income by job category
`min`	Minimum value in each group	Minimum score by demographic
`max`	Maximum value in each group	Maximum transaction by customer type
`count`	Number of records in each group	Sample size validation by segment
`variance`	Variance of values in each group	Consistency check by category
`std`	Standard deviation for each group	Variability assessment by group
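
As a rough guide to what each statistic computes, the names in the table can be mapped onto Python standard-library functions. The mapping is an assumption made for illustration. In particular, the source does not say whether `variance` and `std` use the sample (n - 1) or population (n) form; the sample form is assumed here.

```python
import statistics

# Assumed mapping from the statistic names in the table to Python functions.
# Sample variance / standard deviation (n - 1 denominator) is an assumption;
# swap in statistics.pvariance / statistics.pstdev for the population form.
STATISTICS = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "count": len,
    "variance": statistics.variance,
    "std": statistics.stdev,
}

# Hypothetical values for a single group.
values = [2, 4, 4, 6]
results = {name: fn(values) for name, fn in STATISTICS.items()}
```

Note that `statistics.variance` and `statistics.stdev` raise `StatisticsError` for a group with fewer than two records, so a sketch like this would need to decide how to treat single-record groups.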

Test configuration examples

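If you are writing a tests.json file, here are two example configurations for the group by column statistic test. The `//` comments are annotations for the reader; standard JSON does not allow comments, so strip them if your parser is strict.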

[
  {
    "name": "Average age consistency across geographies",
    "description": "Ensures that average age in each geography is at least 25, with max 1 failing geography allowed",
    "type": "integrity",
    "subtype": "groupByColumnStatsCheck",
    "thresholds": [
      {
        "insightName": "groupByColumnStatsCheck",
        "insightParameters": [
          { "name": "target_column_statistic", "value": "mean" },     // Statistic to calculate
          { "name": "target_column_name", "value": "age" },          // Column to analyze
          { "name": "operator", "value": ">=" },                    // Condition for each group
          { "name": "value", "value": 25 },                         // Threshold for each group
          { "name": "group_by_column_name", "value": "geography" }   // Column to group by
        ],
        "measurement": "failingGroupCount",  // Count of groups that fail the condition
        "operator": "<=",
        "value": 1  // Allow at most 1 geography to fail
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  },
  {
    "name": "Income distribution fairness check",
    "description": "Ensures no more than 10% of job categories have median income below $40K",
    "type": "integrity",
    "subtype": "groupByColumnStatsCheck",
    "thresholds": [
      {
        "insightName": "groupByColumnStatsCheck",
        "insightParameters": [
          { "name": "target_column_statistic", "value": "median" },
          { "name": "target_column_name", "value": "income" },
          { "name": "operator", "value": ">=" },
          { "name": "value", "value": 40000 },
          { "name": "group_by_column_name", "value": "job_category" }
        ],
        "measurement": "failingGroupPercentage",  // Percentage of groups that fail
        "operator": "<=",
        "value": 10.0  // Allow at most 10% of job categories to fail
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "96622fba-ea00-4e42-8f42-5e8f5f60805f" // Some unique id
  }
]
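
If you prefer to generate these entries rather than write them by hand, a small script can assemble the same structure and serialize it as comment-free JSON. This is a sketch under stated assumptions: `build_group_by_test` is a hypothetical helper, the field names are copied from the examples above, and generating `syncId` with `uuid.uuid4()` is an assumption based on the example ids, which the source describes only as "some unique id".

```python
import json
import uuid


def build_group_by_test(name, description, statistic, target_column,
                        group_by_column, group_operator, group_value,
                        measurement, threshold_operator, threshold_value):
    """Assemble one test entry using the field names from the examples above."""
    return {
        "name": name,
        "description": description,
        "type": "integrity",
        "subtype": "groupByColumnStatsCheck",
        "thresholds": [
            {
                "insightName": "groupByColumnStatsCheck",
                "insightParameters": [
                    {"name": "target_column_statistic", "value": statistic},
                    {"name": "target_column_name", "value": target_column},
                    {"name": "operator", "value": group_operator},
                    {"name": "value", "value": group_value},
                    {"name": "group_by_column_name", "value": group_by_column},
                ],
                # "failingGroupCount" or "failingGroupPercentage"
                "measurement": measurement,
                "operator": threshold_operator,
                "value": threshold_value,
            }
        ],
        "subpopulationFilters": None,
        "mode": "development",
        "usesValidationDataset": True,
        "usesTrainingDataset": False,
        "usesMlModel": False,
        # Assumption: any unique id is acceptable; a random UUID is used here.
        "syncId": str(uuid.uuid4()),
    }


tests = [
    build_group_by_test(
        name="Average age consistency across geographies",
        description="Average age in each geography is at least 25, "
                    "with at most 1 failing geography allowed",
        statistic="mean",
        target_column="age",
        group_by_column="geography",
        group_operator=">=",
        group_value=25,
        measurement="failingGroupCount",
        threshold_operator="<=",
        threshold_value=1,
    )
]

# json.dumps emits standard JSON (null/true/false, no comments).
tests_json = json.dumps(tests, indent=2)
```

Writing `tests_json` to a tests.json file yields the first example above without the `//` annotations.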
