Group by column statistic

Definition

The group by column statistic test measures a statistical property of one column, grouped by the unique values of another column, and lets you set a threshold on how many groups fail to meet your criteria. For each unique value in the grouping column, the test calculates the specified statistic on the target column and checks whether it meets your defined condition. The test then compares the count or percentage of failing groups against your threshold.

Taxonomy

  • Task types: LLM, tabular classification, tabular regression.
  • Availability: and .

Why it matters

  • This test helps ensure statistical consistency across different segments or categories in your data.
  • It can detect bias, inconsistencies, or quality issues that affect specific subgroups differently.
  • It supports fairness validation by checking that model inputs have similar statistical properties across different demographics or categories.
  • It helps identify data collection issues that might affect certain groups disproportionately.

How it works

The test follows these steps:
  1. Group the data by unique values in the specified grouping column
  2. Calculate the statistic (mean, median, etc.) on the target column for each group
  3. Apply the condition to each group’s statistic (e.g., mean >= 25)
  4. Count the groups that fail the condition
  5. Compare the count/percentage of failing groups against your threshold
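
The steps above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the platform's implementation: the function name `failing_groups`, the lookup tables, and the sample rows are hypothetical, and the set of supported operators beyond ">=" and "<=" is an assumption.

```python
import operator
import statistics
from collections import defaultdict

# Hypothetical lookup tables; only the string keys mirror the docs.
STATISTICS = {"mean": statistics.mean, "median": statistics.median,
              "min": min, "max": max, "sum": sum, "count": len}
OPERATORS = {">=": operator.ge, ">": operator.gt,
             "<=": operator.le, "<": operator.lt}


def failing_groups(rows, group_by, target, statistic, op, value):
    """Return ({group: statistic} for failing groups, total number of groups)."""
    # Step 1: collect the target values of each group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[target])
    # Steps 2-3: compute the statistic per group and keep the groups
    # whose statistic does NOT satisfy the condition.
    failing = {}
    for name, values in groups.items():
        stat = STATISTICS[statistic](values)
        if not OPERATORS[op](stat, value):
            failing[name] = stat
    return failing, len(groups)


# Made-up sample data for illustration only.
rows = [
    {"geography": "FR", "age": 30},
    {"geography": "FR", "age": 40},
    {"geography": "DE", "age": 20},
    {"geography": "DE", "age": 24},
    {"geography": "ES", "age": 26},
]
failing, total = failing_groups(rows, "geography", "age", "mean", ">=", 25)
failing_count = len(failing)      # Step 4: count failing groups
test_passes = failing_count <= 1  # Step 5: compare against the threshold
```

Here only the "DE" group has a mean age below 25, so one of three groups fails and a threshold of "at most 1 failing group" is still met.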

Available statistics

The following statistical measures are supported for the target column:
Statistic | Description | Example use case
sum | Sum of all values in each group | Total sales by region
mean | Average value for each group | Average age by geography
median | Median value for each group | Median income by job category
min | Minimum value in each group | Minimum score by demographic
max | Maximum value in each group | Maximum transaction by customer type
count | Number of records in each group | Sample size validation by segment
variance | Variance of values in each group | Consistency check by category
std | Standard deviation for each group | Variability assessment by group
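
To get a feel for what each statistic returns for a single group, the names above can be mapped to Python standard-library functions. This mapping is an assumption for illustration: in particular, the docs do not say whether variance and std use the sample (n-1) or population (n) definition, and the sketch below picks the sample versions.

```python
import statistics

# Assumption: sample variance / standard deviation (n-1 denominator).
# Swap in statistics.pvariance / statistics.pstdev for the population versions.
STATISTICS = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "count": len,
    "variance": statistics.variance,
    "std": statistics.stdev,
}

# Values of the target column for one hypothetical group.
values = [2, 4, 4, 6]
summary = {name: fn(values) for name, fn in STATISTICS.items()}
```

Note that with the sample definitions, `statistics.variance` and `statistics.stdev` raise `StatisticsError` for a group with a single record, so very small groups are worth checking with count first.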

Test configuration examples

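If you are writing a tests.json, here are two valid configurations for the group by column statistic test. The // comments are annotations for this page; standard JSON does not allow comments, so omit them in your own file: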
[
  {
    "name": "Average age consistency across geographies",
    "description": "Ensures that average age in each geography is at least 25, with max 1 failing geography allowed",
    "type": "integrity",
    "subtype": "groupByColumnStatsCheck",
    "thresholds": [
      {
        "insightName": "groupByColumnStatsCheck",
        "insightParameters": [
          { "name": "target_column_statistic", "value": "mean" },     // Statistic to calculate
          { "name": "target_column_name", "value": "age" },          // Column to analyze
          { "name": "operator", "value": ">=" },                    // Condition for each group
          { "name": "value", "value": 25 },                         // Threshold for each group
          { "name": "group_by_column_name", "value": "geography" }   // Column to group by
        ],
        "measurement": "failingGroupCount",  // Count of groups that fail the condition
        "operator": "<=",
        "value": 1  // Allow at most 1 geography to fail
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  },
  {
    "name": "Income distribution fairness check",
    "description": "Ensures no more than 10% of job categories have median income below $40K",
    "type": "integrity",
    "subtype": "groupByColumnStatsCheck",
    "thresholds": [
      {
        "insightName": "groupByColumnStatsCheck",
        "insightParameters": [
          { "name": "target_column_statistic", "value": "median" },
          { "name": "target_column_name", "value": "income" },
          { "name": "operator", "value": ">=" },
          { "name": "value", "value": 40000 },
          { "name": "group_by_column_name", "value": "job_category" }
        ],
        "measurement": "failingGroupPercentage",  // Percentage of groups that fail
        "operator": "<=",
        "value": 10.0  // Allow at most 10% of job categories to fail
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "96622fba-ea00-4e42-8f42-5e8f5f60805f" // Some unique id
  }
]
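
The two measurements used in these configurations differ only in how the failing groups are reported. The sketch below shows one way a threshold entry could be read and evaluated. It is an assumption-laden illustration, not the platform's code: `threshold_met` is a hypothetical helper, and the percentage is assumed to be on a 0-100 scale, consistent with the value 10.0 meaning 10% in the second configuration.

```python
import operator

OPERATORS = {"<=": operator.le, "<": operator.lt,
             ">=": operator.ge, ">": operator.gt}

# A threshold entry shaped like the second configuration, comments removed.
threshold = {
    "insightName": "groupByColumnStatsCheck",
    "insightParameters": [
        {"name": "target_column_statistic", "value": "median"},
        {"name": "target_column_name", "value": "income"},
        {"name": "operator", "value": ">="},
        {"name": "value", "value": 40000},
        {"name": "group_by_column_name", "value": "job_category"},
    ],
    "measurement": "failingGroupPercentage",
    "operator": "<=",
    "value": 10.0,
}

# Flatten the name/value pairs into a plain dict for easier lookup.
params = {p["name"]: p["value"] for p in threshold["insightParameters"]}


def threshold_met(threshold, failing, total):
    """Hypothetical helper: evaluate the outer threshold for given group counts."""
    if threshold["measurement"] == "failingGroupCount":
        measured = failing
    elif threshold["measurement"] == "failingGroupPercentage":
        measured = 100.0 * failing / total  # assumed 0-100 scale
    else:
        raise ValueError(f"unknown measurement: {threshold['measurement']}")
    return OPERATORS[threshold["operator"]](measured, threshold["value"])


passes_with_2_of_20 = threshold_met(threshold, failing=2, total=20)  # 10% failing
passes_with_3_of_20 = threshold_met(threshold, failing=3, total=20)  # 15% failing
```

With 20 job categories, two failing groups sit exactly at the 10% limit and pass, while three failing groups exceed it.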