Performance goals - subpopulations

Aggregate metrics computed over the whole validation set are a natural first performance goal, but they give only a low-resolution picture of what’s going on. The model’s performance is likely not uniform across different cohorts of the data, as in the image below.

A more realistic way to reach high model performance is to improve the model one slice of data at a time. This is why creating performance goals for subpopulations matters.


What are subpopulations?

Subpopulations are data cohorts. They can be defined by token values (e.g., the rows that contain the token "stars" but not the token "NHL") or by other criteria (e.g., a cohort that domain expertise identifies as critical).
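To make the token-based definition concrete, here is a minimal sketch in plain Python. It is not the platform's implementation: the `tokenize` and `in_cohort` helpers and the sample rows are hypothetical, and the real tokenizer may split text differently.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a simple stand-in for the platform's tokenizer (assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def in_cohort(text, include=(), exclude=()):
    """True if the text contains every `include` token and none of the `exclude` tokens."""
    tokens = tokenize(text)
    has_all = all(t.lower() in tokens for t in include)
    has_none = not any(t.lower() in tokens for t in exclude)
    return has_all and has_none

# Hypothetical validation rows, for illustration only.
rows = [
    "The Dallas Stars won an NHL game last night",
    "Hubble photographed distant stars",
    "The NHL season starts in October",
]

# The cohort that contains the token "stars" and not the token "NHL".
cohort = [r for r in rows if in_cohort(r, include=["stars"], exclude=["NHL"])]
print(cohort)
```

Only the astronomy row survives the filter, which is the point of excluding "NHL": it separates the two senses of "stars".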

The components of the performance goal creation page exist to help us break down the validation set into subpopulations. Among the components are the filters, the token cloud, and suggested subpopulations.


Filtering is the key to exploring the different subpopulations. The sidebar on the left of the performance goal creation page (titled “Subpopulation explorer”) lets us filter by different criteria.

Adding filters

We might be interested in exploring the model performance for text mentioning some Canadian NHL teams. To do so, filter the rows that contain the token “Canadiens” or “Senators”.

After doing so, the other page components are updated. If you scroll down, you can inspect the rows that fall into that subpopulation. Furthermore, the filters that define this subpopulation are added to the “Subpopulation explorer” sidebar.


Actionable insights

This is an interesting subpopulation. From the “Metrics” section, we can see that the model’s performance on it differs significantly from its performance on the validation and training sets as a whole.
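The comparison shown in the “Metrics” section can be approximated offline. The sketch below is an illustration under assumptions, not the platform's computation: the rows, labels, and predictions are made up, and accuracy is the only metric considered.

```python
def accuracy(rows):
    """Fraction of rows whose prediction matches the label."""
    return sum(r["label"] == r["prediction"] for r in rows) / len(rows)

# Hypothetical validation rows with model predictions, for illustration only.
validation = [
    {"text": "Canadiens beat the Bruins in overtime", "label": "hockey", "prediction": "hockey"},
    {"text": "Senators debate the new budget bill", "label": "politics", "prediction": "politics"},
    {"text": "Senators fall to the Canadiens 3-2", "label": "hockey", "prediction": "politics"},
    {"text": "Rangers sign a new goalie", "label": "hockey", "prediction": "hockey"},
    {"text": "Parliament votes on trade deal", "label": "politics", "prediction": "politics"},
    {"text": "Canadiens trade for a defenseman", "label": "hockey", "prediction": "politics"},
]

# Rows that contain the token "Canadiens" OR "Senators" (case-insensitive).
filter_tokens = {"canadiens", "senators"}
subpopulation = [r for r in validation if filter_tokens & set(r["text"].lower().split())]

overall_acc = accuracy(validation)
subpop_acc = accuracy(subpopulation)
print(f"validation set: {overall_acc:.2f}")
print(f"subpopulation:  {subpop_acc:.2f} ({len(subpopulation)} rows)")
print(f"gap:            {overall_acc - subpop_acc:.2f}")
```

Reporting the row count next to the metric helps judge whether a gap is meaningful or just noise from a very small cohort.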

Once we are satisfied with a subpopulation, we can create a goal for it.

Create a subpopulation performance goal

Click “Create goal” under the filters to create a subpopulation performance goal.

We can set the accuracy threshold to 1.0 because we really want to get these rows right.
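Conceptually, a subpopulation performance goal is a metric paired with a threshold that the model must meet on that cohort. The sketch below illustrates the idea with a hypothetical `SubpopulationGoal` class and made-up labels; it is not the platform's data model or API.

```python
from dataclasses import dataclass

@dataclass
class SubpopulationGoal:
    """Hypothetical representation of a goal: a metric name and a minimum value."""
    name: str
    metric: str
    threshold: float

    def passes(self, value: float) -> bool:
        # The goal passes when the measured metric reaches the threshold.
        return value >= self.threshold

def accuracy(labels, predictions):
    """Fraction of predictions that match their labels."""
    correct = sum(l == p for l, p in zip(labels, predictions))
    return correct / len(labels)

goal = SubpopulationGoal(name="Canadian NHL teams", metric="accuracy", threshold=1.0)

# Made-up labels and predictions for the rows in the subpopulation.
labels = ["hockey", "hockey", "politics", "hockey"]
predictions = ["hockey", "politics", "politics", "hockey"]

score = accuracy(labels, predictions)
status = "passing" if goal.passes(score) else "failing"
print(f"{goal.name}: {goal.metric} = {score:.2f} (threshold {goal.threshold}) -> {status}")
```

With a threshold of 1.0, a single misclassified row is enough to make the goal fail, which is the intended behavior for rows we cannot afford to get wrong.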


Solving failing goals

The poor performance for this subpopulation seems to stem from the training set containing fewer rows that mention Canadian hockey teams than rows that mention American ones.

Here is a new Colab notebook where we strive to solve this issue and commit the new version to the platform.

Token cloud

At a glance, the token cloud shows the tokens associated with mistakes within the subpopulation being explored.

Sort the token cloud

The token cloud can be sorted by “Size” or by “Metric.” Explore the trade-off between error rate and how often such tokens appear in the subpopulation.
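The two sort orders can be pictured with a small sketch. It assumes that “Size” corresponds to how many rows contain a token and “Metric” to the error rate among those rows; the platform's exact definitions may differ, and the `token_stats` helper and sample rows are hypothetical.

```python
from collections import defaultdict

def token_stats(rows):
    """Per-token row count ("size") and error rate over (text, is_correct) pairs."""
    counts = defaultdict(int)
    errors = defaultdict(int)
    for text, is_correct in rows:
        for tok in set(text.lower().split()):  # count each token once per row
            counts[tok] += 1
            if not is_correct:
                errors[tok] += 1
    return {
        tok: {"size": counts[tok], "error_rate": errors[tok] / counts[tok]}
        for tok in counts
    }

# Made-up rows: (text, whether the model predicted the row correctly).
rows = [
    ("canadiens win again", True),
    ("canadiens lose again", False),
    ("senators lose", False),
    ("canadiens sign goalie", True),
]

stats = token_stats(rows)
# Ties are broken alphabetically so the ordering is deterministic.
by_size = sorted(stats, key=lambda t: (-stats[t]["size"], t))
by_metric = sorted(stats, key=lambda t: (-stats[t]["error_rate"], t))

print("by size:  ", by_size[:3])
print("by metric:", by_metric[:3])
```

In this toy data, "canadiens" is the most frequent token but is wrong only a third of the time, while "senators" appears once and is always wrong. That is the trade-off between frequency and error rate that the two sort orders expose.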

Add a token to the filter

After hovering over a token, the option to add it to the filter appears. This is a great way to break subpopulations down even further.

Suggested subpopulations

The suggested subpopulations are on the sidebar of the performance goal creation page. They are automatically found by the Openlayer platform and represent token combinations that result in significant error rates.

They can give us a head start in exploring subpopulations by pointing to token combinations that might be causing problems for our model.
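How the platform finds these suggestions is not described here. As a rough intuition only, one simple approach is to enumerate token pairs, keep those that appear in enough rows, and rank them by error rate. The sketch below does exactly that; the `suggest_subpopulations` function, its thresholds, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def suggest_subpopulations(rows, min_rows=2, min_error_rate=0.5):
    """Rank token pairs by error rate, ignoring pairs seen in fewer than `min_rows` rows."""
    seen = Counter()
    wrong = Counter()
    for text, is_correct in rows:
        tokens = sorted(set(text.lower().split()))
        for pair in combinations(tokens, 2):
            seen[pair] += 1
            if not is_correct:
                wrong[pair] += 1
    suggestions = [
        (pair, wrong[pair] / seen[pair], seen[pair])
        for pair in seen
        if seen[pair] >= min_rows and wrong[pair] / seen[pair] >= min_error_rate
    ]
    # Highest error rate first, then most rows, then alphabetical for determinism.
    return sorted(suggestions, key=lambda s: (-s[1], -s[2], s[0]))

# Made-up rows: (text, whether the model predicted the row correctly).
rows = [
    ("senators lose tonight", False),
    ("senators lose again", False),
    ("canadiens win tonight", True),
    ("canadiens win again", True),
    ("senators vote tonight", True),
]

for pair, error_rate, n_rows in suggest_subpopulations(rows):
    print(f"{pair}: error rate {error_rate:.2f} over {n_rows} rows")
```

The minimum row count keeps one-off token pairs from dominating the list, since a pair seen in a single misclassified row would otherwise show a 100% error rate.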