Definition
The ROC AUC test measures the macro-average of the area under the receiver operating characteristic (ROC) curve score for each class, treating all classes equally. For multi-class classification tasks, it uses the one-versus-one configuration. ROC AUC evaluates the model's ability to distinguish between classes across all classification thresholds.
Taxonomy
- Task types: Tabular classification, text classification.
- Availability: and .
Why it matters
- ROC AUC provides a threshold-independent measure of classification performance, evaluating the model’s discriminative ability across all possible decision thresholds.
- It’s particularly useful for comparing models and understanding their ranking performance, regardless of the specific classification threshold chosen.
- Higher ROC AUC values indicate better model performance, with 1.0 representing perfect discrimination and 0.5 representing random performance.
- This metric is especially valuable when you need to understand the trade-offs between true positive rate and false positive rate.
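To make the points above concrete, the short sketch below computes the same macro-averaged, one-versus-one ROC AUC using scikit-learn's roc_auc_score; scikit-learn and the sample data are illustrative choices here, not part of the test itself.

```python
# Illustrative only: macro-averaged, one-vs-one ROC AUC via scikit-learn.
from sklearn.metrics import roc_auc_score

# Ground-truth class labels for a small 3-class example.
y_true = [0, 1, 2, 2, 1, 0]

# Predicted probabilities, one column per class (each row sums to 1).
y_proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.1, 0.3, 0.6],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
]

# average="macro" treats all classes equally; multi_class="ovo" applies the
# one-versus-one configuration described in the definition above.
score = roc_auc_score(y_true, y_proba, average="macro", multi_class="ovo")
print(f"ROC AUC: {score:.3f}")  # 1.0 = perfect discrimination, 0.5 = random
```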
Required columns
To compute this metric, your dataset must contain the following columns:
- Prediction probabilities: The predicted class probabilities from your classification model
- Ground truths: The actual/true class labels
ROC AUC requires predicted probabilities, not just class labels. Ensure your model outputs probability estimates for each class.
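As an illustration of the expected layout, a three-class dataset would carry one ground truth column plus one probability column per class. The column names below are hypothetical; the test requires the data, not these particular names.

```python
import pandas as pd

# Hypothetical column names -- only the contents (true labels and per-class
# probabilities) matter, not these exact identifiers.
df = pd.DataFrame({
    "ground_truth": [0, 1, 2],          # actual/true class labels
    "proba_class_0": [0.7, 0.1, 0.2],   # predicted probability of class 0
    "proba_class_1": [0.2, 0.8, 0.2],   # predicted probability of class 1
    "proba_class_2": [0.1, 0.1, 0.6],   # predicted probability of class 2
})
```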
Test configuration examples
If you are writing a tests.json file, here are a few valid configurations for the ROC AUC test:
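The exact tests.json schema is not reproduced here, so the snippet below is only a sketch: the field names (name, metric, dataset, threshold) and the threshold values are assumptions meant to show the general shape of a metric-threshold test, not verified configuration keys.

```json
[
  {
    "name": "ROC AUC above 0.8",
    "metric": "rocAuc",
    "threshold": 0.8
  },
  {
    "name": "ROC AUC above 0.9 on the validation set",
    "metric": "rocAuc",
    "dataset": "validation",
    "threshold": 0.9
  }
]
```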
Related
- Log loss test - Probabilistic measure of classification performance.
- Accuracy test - Overall classification correctness.
- Precision test - Accuracy of positive predictions.
- Recall test - Ability to find all positive instances.
- Aggregate metrics - Overview of all available metrics.