Performance goals - full dataset

Navigate to the performance goal creation page

Before starting this part of the tutorial, please navigate to the performance goal creation page.

Let’s start with the definition.


What are performance goals?

Performance goals define the expected level of model performance for the entire validation set or specific subpopulations.

For instance, we may aim for our model to achieve a minimum F1 score of 0.8 on the validation set. We can also establish more specific goals, such as a precision target of 0.95 for messages that have the tokens “player” and “Canada”.
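
The two goals above can be sketched in plain Python. This is a minimal, dependency-free illustration with a hypothetical toy validation set (the texts, labels, and predictions are invented for the example), not the platform’s internal computation.

```python
# Illustrative only: toy data standing in for a real validation set.
def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

val_texts = ["player traded to Canada", "match recap", "player stats", "Canada wins again"]
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

# Aggregate goal: F1 >= 0.8 on the whole validation set.
_, _, f1 = precision_recall_f1(y_true, y_pred)

# Subpopulation goal: precision >= 0.95 on messages with both tokens.
mask = [("player" in t) and ("Canada" in t) for t in val_texts]
sub_true = [t for t, m in zip(y_true, mask) if m]
sub_pred = [p for p, m in zip(y_pred, mask) if m]
precision, _, _ = precision_recall_f1(sub_true, sub_pred)

print(f"F1 = {f1:.2f}, subpopulation precision = {precision:.2f}")
```

The key idea is that a subpopulation goal is just the same metric computed over a filtered slice of the validation set.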

To start, a natural performance goal to create is one that sets the expected model performance for the whole validation set. Before doing so, let’s interpret the information displayed in the “Metrics” section.


Actionable insights

Our model performs better on the training set than on the validation set.

A higher training performance is expected, but the gap between the training and validation performance helps us understand whether the model suffers from a bias or a variance issue.

In our case, it seems there is still room for improvement when it comes to mitigating bias. We can get closer to overfitting the training set and then start applying variance-reduction strategies, such as regularization or getting more data.
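
A rough version of this diagnostic can be written down explicitly. The numbers and the cutoffs below (0.90 as “good enough” training performance, 0.10 as a “large” gap) are illustrative assumptions, not values prescribed by the platform.

```python
# Illustrative bias/variance heuristic; metric values and cutoffs are assumptions.
train_f1 = 0.92
val_f1 = 0.84
gap = train_f1 - val_f1

if train_f1 < 0.90:
    diagnosis = "high bias: try a larger model or longer training"
elif gap > 0.10:
    diagnosis = "high variance: regularize or collect more data"
else:
    diagnosis = "balanced: push training performance further before regularizing"

print(diagnosis)
```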

The “Metrics” section also has a graph view with additional information.

Navigate to the metric graph view

Once in the graph view, we can switch between different aggregate metrics and look at the metrics on a per-class basis using the dropdowns.

Creating a performance goal

You can click “Create goal” on the left-hand panel to create a performance goal for the whole validation set. When you do, the goal creation modal will show up and ask for a metric threshold.

Create a performance goal for the whole validation set

Click “Create goal” to create a performance goal for the whole validation set.

The information from the “Metrics” section is handy for choosing a threshold for the whole validation set. The training performance is roughly an upper bound on the performance we can expect after regularizing this modeling approach. Therefore, starting with a threshold slightly below the training performance is a reasonable choice.

Let’s use an F1 threshold of 0.9. You can also add multiple metric thresholds to the same goal.
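
Conceptually, a goal with multiple metric thresholds passes only if every threshold is met. A small sketch of that check, with hypothetical metric values:

```python
# Illustrative only: metric names and values are assumptions for the example.
thresholds = {"f1": 0.9, "precision": 0.85}
val_metrics = {"f1": 0.88, "precision": 0.91}

failures = {m: (val_metrics[m], t)
            for m, t in thresholds.items()
            if val_metrics[m] < t}
goal_passed = not failures

print("passed" if goal_passed else f"failed: {failures}")
```

Here the precision threshold is met but the F1 threshold is not, so the goal as a whole fails.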

Performance goal report

After creating the goal, we can see the goal card on the goals page. Let’s explore the information shown in performance goal reports.

As usual, the goal report provides information that helps us understand and fix a failed goal. In this case, not only are the metrics and confusion matrix available, but also histograms of the model’s confidence distribution and the label distribution. Comparing these distributions can help us understand the model’s behavior.
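
To make the two histograms concrete, here is a stdlib-only sketch that builds both from hypothetical predictions (labels and confidence values are invented for the example):

```python
from collections import Counter

# Illustrative predictions: class labels and the model's top-class confidence.
labels = ["spam", "ham", "ham", "spam", "ham"]
confidences = [0.97, 0.62, 0.55, 0.91, 0.58]

# Label distribution: how many rows fall in each class.
label_dist = Counter(labels)

# Confidence histogram: bucket confidences into 0.1-wide bins.
conf_hist = Counter(int(c * 10) / 10 for c in confidences)

print("Label distribution:", dict(label_dist))
print("Confidence histogram:", dict(sorted(conf_hist.items())))
```

A pile-up of low-confidence predictions concentrated on one class often points at the subpopulations the model struggles with.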


Another interesting piece of information available inside performance goal reports is row-level explainability. In broad strokes, explainability techniques help justify our model’s predictions.

Row-level explainability

To access the explainability scores in the performance goal report, navigate to the "Row analysis" tab. Click on different rows to view their explainability scores, which are powered by LIME.

Note that to access this row-level explainability, a full model must be uploaded, as demonstrated in the previous notebook. However, uploading just a shell model is enough to access all the other insights.
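
To build intuition for what these scores mean, here is a drastically simplified, LIME-style leave-one-token-out sketch. Real LIME fits a local surrogate model over many random perturbations; this toy version just measures how much the score drops when each token is removed, and the `predict_proba` stand-in below is an invented toy scorer, not a real model.

```python
# Toy stand-in for a real model's predicted probability (weights are invented).
def predict_proba(text):
    weights = {"free": 0.4, "winner": 0.3, "click": 0.2}
    return min(1.0, 0.1 + sum(w for tok, w in weights.items() if tok in text.split()))

def explain_row(text):
    """Leave-one-token-out importance: how much does the score drop without each token?"""
    base = predict_proba(text)
    tokens = text.split()
    scores = {}
    for i, tok in enumerate(tokens):
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])
        scores[tok] = base - predict_proba(perturbed)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(explain_row("free winner click here"))
```

Tokens with a large score drop are the ones the (toy) model leans on most for this row, which is the same intuition the per-row explainability scores in the report convey.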

Now that we understand performance goals, we can start breaking down the validation set into subpopulations and creating individual goals for them. That’s what we will do in the next part of the tutorial!