Model versioning

Rinse and repeat

Let’s take a step back and recall what we have done in the previous parts of the tutorial:

  1. First, we identified a slice of data where the model performance was far from ideal, namely the large error class where the model predicts Not urgent when in fact, the label is Urgent;
  2. We then used explainability, together with filtering and tagging, to diagnose that a large share of those errors comes from the model’s poor performance on sentences from certain locations. That poor performance is a symptom of a training set whose Urgent sentences mostly come from a few locations;
  3. Finally, we have created a test that asserts this issue is fixed and not re-introduced in the future.

Now, it’s time to finally fix the issue and commit the new model version to the platform!

Solving the issue

In general, the solutions to the problems identified in a round of error analysis vary, and they typically require a new iteration of model training.

The location issue we identified earlier in the tutorial is no different. As we have confirmed by inspecting our training set, the difference in model behavior is a symptom of certain locations being underrepresented in the training set.

To solve the issue, it is clear that we need more data for different locations to augment our training set. We can get additional samples by collecting more data or by generating synthetic data.

In our case, we augmented our training set by adding new rows with different locations for every Urgent sample that mentions a location. For example, if the original training set contained a sentence such as “We need help at Port-au-Prince”, the augmented training set contains additional samples such as “We need help at Croix-des-Bouquet”, “We need help at Tabarre”, and “We need help at La Gonarve”, among others, using locations that also appear in the training set.
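The location swap described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from the notebook: the function name `augment_with_locations`, the short list of locations, and the sample sentences are all assumptions made for the example.

```python
import re


def augment_with_locations(sentences, locations):
    """Return new sentences in which each known location is swapped for every other one.

    Hypothetical helper for illustration; the notebook may do this differently.
    """
    augmented = []
    for sentence in sentences:
        for present in locations:
            # Match the location as a whole phrase, not as part of a longer word.
            pattern = r"(?<!\w)" + re.escape(present) + r"(?!\w)"
            if not re.search(pattern, sentence):
                continue
            for replacement in locations:
                if replacement != present:
                    # A callable replacement avoids any backslash interpretation.
                    augmented.append(re.sub(pattern, lambda _: replacement, sentence))
    return augmented


# Assumed sample data: a few locations and two Urgent sentences.
locations = ["Port-au-Prince", "Tabarre", "Croix-des-Bouquet"]
urgent_sentences = ["We need help at Port-au-Prince", "Water is running out"]

new_rows = augment_with_locations(urgent_sentences, locations)
print(new_rows)
```

Sentences that mention no known location, like the second one here, produce no extra rows, so only location-bearing Urgent samples are multiplied.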

Here is the link to a Colab notebook where we use the augmented training set to retrain the same gradient boosting model.

Running the notebook cells

Please run the notebook cells up to the point we evaluate the new model performance. How is our new model doing? Do you see the recall for the Urgent class?
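If you would like to double-check that number by hand, recall for a class is the fraction of samples that truly belong to the class and that the model also assigns to it. Here is a small sketch with made-up labels; the helper `recall_for_class` and the toy data are assumptions for illustration and are not part of the notebook.

```python
def recall_for_class(y_true, y_pred, positive_label):
    """Recall = correctly predicted positives / all actual positives."""
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive_label and p == positive_label
    )
    actual_positives = sum(1 for t in y_true if t == positive_label)
    if actual_positives == 0:
        raise ValueError(f"No samples with label {positive_label!r} in y_true")
    return true_positives / actual_positives


# Toy labels (not real tutorial data): 4 Urgent samples, 3 of them caught.
y_true = ["Urgent", "Urgent", "Urgent", "Urgent", "Not urgent", "Not urgent"]
y_pred = ["Urgent", "Urgent", "Urgent", "Not urgent", "Not urgent", "Urgent"]

urgent_recall = recall_for_class(y_true, y_pred, "Urgent")
print(urgent_recall)  # 3 of 4 Urgent samples recovered -> 0.75
```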

Our model is doing better! The recall for the Urgent class jumped from 0.72 to 0.86, which suggests that the location issue may have been mitigated. But are there additional problems that need action?

Let’s commit the new model version to the platform to find out.

Loading the project

We are going to add the new model version to the same “Urgent event classification” project. To do so, we will use our Python API again.

To retrieve the project in order to use the add_model method, we will need to instantiate the client with our API key and use the client’s load_project method.

Instantiating the client and loading the project

Create a new cell in the notebook we provided, right after the model evaluation part. In that cell, we will first instantiate the Openlayer Client, replacing ‘YOUR_API_KEY_HERE’ with your API key. Then, we will load the “Urgent event classification” project by specifying its name.

import openlayer

client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
project = client.load_project(name='Urgent event classification')

Recall that every project name must be unique within a user’s account. This is why: a unique name lets you always retrieve the project to add new models or new datasets. Note that the name is case-sensitive, so you must specify exactly the name you used when you first created the project.

Model versioning

Now that we have retrieved the project, we can go ahead and use the project’s add_model method.

Let’s just briefly talk about model versioning.

Versioning models inside a project happens via the name argument passed to the add_model method.

  • If add_model is called with a model name that already exists inside the project, Openlayer treats it as a new version of an existing model lineage;
  • On the other hand, if add_model is called with a model name that doesn’t yet exist inside the project, Openlayer treats it as the first version of a new, separate model lineage.

With such a mechanism, practitioners are free to organize their models inside a project in whichever way they find most convenient. Some people might prefer to have a single model lineage inside a project, so every model version is organized in a single tree. Other people may prefer to keep separate trees for each model architecture, framework, or experiment type. All these options are valid, and you are free to choose according to your preferences.
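To make the name-based rule concrete, here is a toy model of it in plain Python. It is not Openlayer’s implementation and does not call the Openlayer API: the `ToyProject` class, its methods, and the “Logistic regression” lineage are hypothetical and only mimic the behavior described above.

```python
from collections import defaultdict


class ToyProject:
    """Toy stand-in for a project; illustrates name-based lineages only."""

    def __init__(self):
        # Maps a model name to the ordered list of commit messages in its lineage.
        self._lineages = defaultdict(list)

    def add_model(self, name, commit_message):
        """Append a version to the lineage called `name`; return its version number."""
        versions = self._lineages[name]
        versions.append(commit_message)
        return len(versions)

    def lineages(self):
        """Return a plain-dict copy of all lineages."""
        return {name: list(messages) for name, messages in self._lineages.items()}


toy = ToyProject()
v1 = toy.add_model("Gradient boosting classifier", "Initial commit")
v2 = toy.add_model(
    "Gradient boosting classifier", "Attempt to fix location issue for Urgent class"
)
other = toy.add_model("Logistic regression", "Initial commit")  # new name, new lineage

print(v1, v2, other)  # 1 2 1
```

Reusing the name extends the existing lineage, while a new name starts a separate one at version 1, which is the choice you make every time you upload a model.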

In our case, we will upload a new version of the same model lineage. To do so, we will call the project’s add_model method with the name we already defined inside the project: “Gradient boosting classifier.”


Adding a new model version

Create a new cell in the notebook we provided, right after the project loading cell. In that cell, we will upload the new gradient boosting classifier to the project by calling the project’s add_model method. We will include a commit message to describe the changes.

from openlayer.models import ModelType

model = project.add_model(
    class_names=["Not urgent", "Urgent"],
    name='Gradient boosting classifier',
    commit_message='Attempt to fix location issue for Urgent class',
    # The remaining arguments are not shown here; see the notebook for the full call.
)

That’s it!

You will notice that once the new model is uploaded, two things happen:

  1. A new run report is created. This is the entry door to possibly identifying and diagnosing issues with the new model;
  2. All tests created run automatically so that we can assert the issues we’d like to solve are indeed fixed and that we don’t have regressions.

As we can see, our new model version seems to be a step in the right direction, as both our tests have passed!

Model diff

A lot of the decisions in an ML project are comparative by nature, especially when it comes to deciding between model versions.

That’s what model diff is for!

With model diff, you can compare any two model versions inside a project. This lets you evaluate whether one model is better than the other in ways that go well beyond the overall aggregate metrics. You can also check for unwanted regressions that you hadn’t spotted before and, if needed, discard the commit.

You can create a model diff by using the dropdowns available at the top of the project page. There, you can select two different model commits to compare side-by-side.

Let’s create a model diff between our two gradient boosting classifier versions.

As we can see, the new model version seems to be significantly better than the previous one when we consider the recall for the Urgent class. We got this improvement by making the most significant error class we had identified in the previous parts of the tutorial smaller. Moreover, by inspecting the new run report, it is possible to note that the location issue we observed was attenuated.

There is no free lunch. We can also observe that we sacrificed a bit of performance on the Not urgent class in this process. That seems to be a reasonable balance, given the application.
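The trade-off above is the kind of thing a per-metric comparison surfaces. As a rough sketch of the idea (not how the platform computes its diff), the hypothetical helper `diff_metrics` below compares two sets of metrics and flags regressions. The Urgent recall values are the ones reported earlier; the Not urgent values are placeholders invented for the example.

```python
def diff_metrics(old, new, tolerance=0.0):
    """Return {metric: (delta, status)} for metrics present in both versions."""
    report = {}
    for metric in sorted(old.keys() & new.keys()):
        # Round to hide floating-point noise in the subtraction.
        delta = round(new[metric] - old[metric], 4)
        if delta > tolerance:
            status = "improved"
        elif delta < -tolerance:
            status = "regressed"
        else:
            status = "unchanged"
        report[metric] = (delta, status)
    return report


# recall_urgent comes from the tutorial; recall_not_urgent is a made-up placeholder.
old_version = {"recall_urgent": 0.72, "recall_not_urgent": 0.95}
new_version = {"recall_urgent": 0.86, "recall_not_urgent": 0.93}

report = diff_metrics(old_version, new_version)
for metric, (delta, status) in report.items():
    print(f"{metric}: {delta:+.2f} ({status})")
```

Setting `tolerance` to a small positive value treats tiny fluctuations as unchanged, which can help when deciding whether a drop on one class is an acceptable price for a gain on another.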

Of course, there is still a lot of room for improvement and that’s what the run report is there for. By following this tutorial, you have been exposed to the identify-diagnose-test-solve framework, which is extremely powerful when the goal is systematically improving ML models. Furthermore, you are now capable of using the tools provided by Openlayer to apply the framework to your own models and datasets.