
Definition

The new categories test checks whether any categorical feature in the validation set contains categories that do not appear in the training set.
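A minimal sketch of this comparison in Python, under stated assumptions: each dataset is a list of rows (dicts keyed by feature name), every row has a value for each listed categorical feature, and the count is the total number of distinct unseen categories summed across features. The function names find_new_categories and new_category_count are hypothetical and are not part of any platform API. The platform's own definition of the count may differ.

```python
from typing import Dict, Iterable, List, Mapping, Set

# Assumption: a row is a mapping from feature name to its value.
Row = Mapping[str, object]


def find_new_categories(
    training: Iterable[Row],
    validation: Iterable[Row],
    categorical_features: List[str],
) -> Dict[str, Set[object]]:
    """For each categorical feature, return the validation categories absent from training."""
    # Materialize the inputs so generators can be scanned once per feature.
    training_rows = list(training)
    validation_rows = list(validation)
    result: Dict[str, Set[object]] = {}
    for feature in categorical_features:
        seen = {row[feature] for row in training_rows}
        result[feature] = {row[feature] for row in validation_rows} - seen
    return result


def new_category_count(new_categories: Dict[str, Set[object]]) -> int:
    """Hypothetical count: distinct new categories summed over all features."""
    return sum(len(categories) for categories in new_categories.values())


training = [{"color": "red", "size": "S"}, {"color": "blue", "size": "M"}]
validation = [{"color": "green", "size": "S"}, {"color": "red", "size": "XL"}]
new = find_new_categories(training, validation, ["color", "size"])
print(new)                      # {'color': {'green'}, 'size': {'XL'}}
print(new_category_count(new))  # 2
```

Set difference keeps the check linear in the number of rows. A feature with no unseen values maps to an empty set, so it contributes nothing to the count.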

Taxonomy

  • Task types: Tabular classification, tabular regression.
  • Availability: .

Why it matters

  • If the validation set contains new categories, the model never saw them during training, so it is not prepared to make good predictions for rows that contain them.

Test configuration examples

If you are writing a tests.json file, here is a valid configuration for the new categories test:
```json
[
  {
    "name": "No new categories",
    "description": "Asserts that there are no new categories in the current dataset compared to the reference dataset",
    "type": "consistency",
    "subtype": "newCategoryCount",
    "thresholds": [
      {
        "insightName": "newCategories",
        "insightParameters": null,
        "measurement": "newCategoryCount",
        "operator": "<=",
        "value": 0
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": true,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]
```

The syncId field holds some unique id for the test.
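The threshold in this configuration passes only when the measured newCategoryCount is at most 0. A minimal sketch of how such a threshold might be evaluated, assuming that a test passes when every entry in "thresholds" holds. The passes function and the OPERATORS table are hypothetical illustrations, not the platform's implementation. Only "<=" appears in the configuration above, so the other operator strings are assumptions.

```python
import json
import operator
from typing import Any, Dict, Mapping

# Hypothetical mapping from "operator" strings to Python comparisons.
# Only "<=" is confirmed by the example configuration; the rest are assumed.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# The example test, trimmed to the fields this sketch reads.
CONFIG = json.loads("""
[
  {
    "name": "No new categories",
    "subtype": "newCategoryCount",
    "thresholds": [
      {"measurement": "newCategoryCount", "operator": "<=", "value": 0}
    ]
  }
]
""")


def passes(test: Mapping[str, Any], measurements: Dict[str, float]) -> bool:
    """Return True if every threshold in the test holds for the given measurements."""
    for threshold in test["thresholds"]:
        compare = OPERATORS[threshold["operator"]]
        observed = measurements[threshold["measurement"]]
        if not compare(observed, threshold["value"]):
            return False
    return True


no_new_categories = CONFIG[0]
print(passes(no_new_categories, {"newCategoryCount": 0}))  # True
print(passes(no_new_categories, {"newCategoryCount": 2}))  # False
```

With "value": 0 and "<=", a single unseen category anywhere in the validation set is enough to fail the test.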