Definition
The ROC AUC test measures the macro-average of the area under the receiver operating characteristic (ROC) curve, computing the score per class and treating all classes equally. For multi-class classification tasks, it uses the one-versus-one configuration. ROC AUC evaluates the model's ability to distinguish between classes across all classification thresholds.
Taxonomy
- Task types: Tabular classification, text classification.
- Availability:
Why it matters
- ROC AUC provides a threshold-independent measure of classification performance, evaluating the model’s discriminative ability across all possible decision thresholds.
- It’s particularly useful for comparing models and understanding their ranking performance, regardless of the specific classification threshold chosen.
- Higher ROC AUC values indicate better model performance, with 1.0 representing perfect discrimination and 0.5 representing random performance.
- This metric is especially valuable when you need to understand the trade-offs between true positive rate and false positive rate.
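For intuition, here is a minimal scikit-learn sketch (an illustration only, not the test's own implementation) of how the ROC curve sweeps every decision threshold and how its area summarizes the true positive rate / false positive rate trade-off; the labels and probabilities are invented for the example:

```python
# A minimal sketch showing how ROC AUC summarizes the TPR/FPR trade-off
# across all thresholds for a binary classifier (illustrative data only).
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy ground truths and predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3])

# roc_curve sweeps every decision threshold and returns the resulting
# false positive rates and true positive rates.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# The area under that curve is the ROC AUC: threshold-independent,
# 1.0 = perfect discrimination, 0.5 = random ranking.
print(auc(fpr, tpr))
```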
Required columns
To compute this metric, your dataset must contain the following columns:
- Prediction probabilities: The predicted class probabilities from your classification model
- Ground truths: The actual/true class labels
ROC AUC requires predicted probabilities, not just class labels. Ensure your model outputs probability estimates for each class.
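As a point of reference, the snippet below (a scikit-learn sketch, not necessarily how the test computes the score internally) shows how these two columns map to a macro-averaged, one-versus-one ROC AUC; the ground truths and probabilities are made up for illustration:

```python
# A hedged sketch of computing the macro-averaged, one-versus-one ROC AUC
# from the two required columns (illustrative data only).
import numpy as np
from sklearn.metrics import roc_auc_score

# Ground truths: the actual class labels (three classes here).
ground_truths = np.array([0, 2, 1, 0, 2, 1, 0, 1])

# Prediction probabilities: one probability per class, each row summing to 1.
prediction_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.6, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
])

# Macro average over the one-versus-one class pairs, matching the definition above.
score = roc_auc_score(ground_truths, prediction_probs, multi_class="ovo", average="macro")
print(score)
```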
Test configuration examples
If you are writing a tests.json, here are a few valid configurations for the ROC AUC test:
Related
- Log loss test - Probabilistic measure of classification performance.
- Accuracy test - Overall classification correctness.
- Precision test - Accuracy of positive predictions.
- Recall test - Ability to find all positive instances.
- Aggregate metrics - Overview of all available metrics.

