Definition

The features missing values test allows you to specify the number (or percentage) of missing values that are allowed for each feature.

Taxonomy

  • Category: Integrity.
  • Task types: Tabular classification, tabular regression.
  • Availability: and .

Why it matters

  • Missing values can have a direct impact on model performance.
  • The values missing from certain features can indicate issues with the data collection/ingestion process.
  • Measuring and tracking the number of missing values can inform the imputation strategies to be used.

Test configuration examples

If you are writing a tests.json, here are a few valid configurations for the character length test:

[
  {
    "name": "Feature 'Year' does not contain missing values",
    "description": "Asserts that the feature 'Year' does not contain missing values",
    "type": "integrity",
    "subtype": "featureMissingValues",
    "thresholds": [
      {
        "insightName": "featureProfile",
        "insightParameters": [
          { "name": "name", "value": "Year" } // Selects feature `Year`
        ],
        "measurement": "percentMissingValues", // Must be one of `percentMissingValues` or `numMissingValues`
        "operator": "<=",
        "value": 0.0
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  }
]