Definition
The ROC AUC test measures the macro-average of the area under the receiver operating characteristic (ROC) curve, computing the score per class and treating all classes equally. For multi-class classification tasks, it uses the one-versus-one configuration. ROC AUC evaluates the model's ability to distinguish between classes across all classification thresholds.
Taxonomy
- Task types: Tabular classification, text classification.
- Availability:
Why it matters
- ROC AUC provides a threshold-independent measure of classification performance, evaluating the model’s discriminative ability across all possible decision thresholds.
- It’s particularly useful for comparing models and understanding their ranking performance, regardless of the specific classification threshold chosen.
- Higher ROC AUC values indicate better model performance, with 1.0 representing perfect discrimination and 0.5 representing random performance.
- This metric is especially valuable when you need to understand the trade-offs between true positive rate and false positive rate.
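For intuition, here is a minimal scikit-learn sketch (an illustration only, not the test's own implementation) of how the ROC curve sweeps every decision threshold and how its area summarizes the true positive rate / false positive rate trade-off; the labels and probabilities are invented for the example:

```python
# A minimal sketch showing how ROC AUC summarizes the TPR/FPR trade-off
# across all thresholds for a binary classifier (illustrative data only).
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy ground truths and predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3])

# roc_curve sweeps every decision threshold and returns the resulting
# false positive rates and true positive rates.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# The area under that curve is the ROC AUC: threshold-independent,
# 1.0 = perfect discrimination, 0.5 = random ranking.
print(auc(fpr, tpr))
```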
Required columns
To compute this metric, your dataset must contain the following columns:
- Prediction probabilities: The predicted class probabilities from your classification model
- Ground truths: The actual/true class labels
ROC AUC requires predicted probabilities, not just class labels. Ensure your model outputs probability estimates for each class.
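As a point of reference, the snippet below (a scikit-learn sketch, not necessarily how the test computes the score internally) shows how these two columns map to a macro-averaged, one-versus-one ROC AUC; the ground truths and probabilities are made up for illustration:

```python
# A hedged sketch of computing the macro-averaged, one-versus-one ROC AUC
# from the two required columns (illustrative data only).
import numpy as np
from sklearn.metrics import roc_auc_score

# Ground truths: the actual class labels (three classes here).
ground_truths = np.array([0, 2, 1, 0, 2, 1, 0, 1])

# Prediction probabilities: one probability per class, each row summing to 1.
prediction_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.6, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
])

# Macro average over the one-versus-one class pairs, matching the definition above.
score = roc_auc_score(ground_truths, prediction_probs, multi_class="ovo", average="macro")
print(score)
```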
Test configuration examples
If you are writing a tests.json, here are a few valid configurations for the ROC AUC test:
Related
- Log loss test - Probabilistic measure of classification performance.
- Accuracy test - Overall classification correctness.
- Precision test - Accuracy of positive predictions.
- Recall test - Ability to find all positive instances.
- Aggregate metrics - Overview of all available metrics.

