Definition

The log loss test measures the dissimilarity between a model's predicted probabilities and the true class labels. Also known as cross-entropy loss (or binary cross-entropy in the binary classification case), it evaluates how well the model's predicted probabilities match the actual outcomes.
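For a single binary example with true label y ∈ {0, 1} and predicted probability p, the loss is −[y·log(p) + (1 − y)·log(1 − p)], and the metric is the mean over all examples. As a minimal sketch (the helper name `binary_log_loss` is illustrative, not part of any library):

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to (0, 1) for numerical stability
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_log_loss([1, 0], [0.9, 0.1]))  # ≈ 0.105
```

Note the clipping step: without it, a predicted probability of exactly 0 or 1 on a misclassified example would yield an infinite loss.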

Taxonomy

  • Task types: Tabular classification, text classification.

Why it matters

  • Log loss provides a probabilistic measure of classification performance, considering not just correctness but also confidence in predictions.
  • It heavily penalizes confident wrong predictions, making it sensitive to model calibration and overconfidence.
  • Lower log loss values indicate better model performance, with 0 representing perfect probability predictions.
  • This metric is particularly valuable when you need well-calibrated probability estimates, not just class predictions.
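The penalty for confident wrong predictions grows sharply as the misplaced confidence increases. A quick illustration, computing the per-example loss when the true label is 0 but the model assigns high probability to class 1:

```python
import math

# Per-example loss when the true label is 0 but the model predicts class 1
for p in (0.6, 0.9, 0.99):
    loss = -math.log(1 - p)  # binary cross-entropy term for y = 0
    print(f"predicted P(class=1)={p:.2f} -> loss={loss:.3f}")
```

A mildly wrong prediction (0.6) costs about 0.92, while a confidently wrong one (0.99) costs about 4.61, which is why overconfident, poorly calibrated models score badly on this metric.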

Required columns

To compute this metric, your dataset must contain the following columns:
  • Prediction probabilities: The predicted class probabilities from your classification model
  • Ground truths: The actual/true class labels
Log loss requires predicted probabilities, not just class labels. Ensure your model outputs probability estimates for each class.
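In the multiclass case, the prediction column holds one probability per class for each row, and the loss for a row is the negative log of the probability assigned to the true class. A minimal sketch of how the two required columns combine (the `log_loss` helper and the sample data below are illustrative):

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Multiclass log loss: y_prob[i] holds one probability per class."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)  # probability of the true class
        total += -math.log(p)
    return total / len(y_true)

# Ground truths as integer class indices, plus a full probability row per example
y_true = [0, 2, 1]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],
]
print(round(log_loss(y_true, y_prob), 3))  # → 0.469
```

If your model only outputs hard class labels, this metric cannot be computed; configure the model to return probability estimates instead.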

Test configuration examples

If you are writing a tests.json, here is a valid configuration for the log loss test:
[
  {
    "name": "Log loss below 0.3",
    "description": "Ensure that the log loss is below 0.3",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "logLoss",
        "operator": "<",
        "value": 0.3
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": true,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]