Model versioning

Rinse and repeat

Let’s take a step back and recall what we have done in the previous parts of the tutorial:

  1. First, we identified a slice of data where the model performance was far from ideal, namely the large error class where the model predicts Exited when, in fact, the label is Retained;
  2. We have used explainability and the functionalities of filtering & tagging to diagnose that a big chunk of those errors is related to the poor performance of our model for female users. In this case, the poor performance is a symptom of a biased training set;
  3. Finally, we have created a test that asserts this issue is fixed and not re-introduced in the future.

Now, it’s time to finally fix the issue and commit the new model version to the platform!

Solving the issue

In general, the solutions to the problems identified in a round of error analysis vary, and they often require a new iteration of model training.

The gender issue we identified earlier in the tutorial is no different. As we have confirmed by inspecting our training set, the difference in model behavior for users of the male and female gender is a symptom of female users being underrepresented in the training set.

To solve the issue, we need more data for female users to augment our training set. We can get the additional samples either by collecting more data or by generating synthetic data. In our case, we collected additional data.
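One quick way to make the imbalance concrete is to count how many rows each gender contributes before and after augmentation. The sketch below is only an illustration: the rows are made up, and the `Gender` field name is an assumption, so the actual notebook's column names and proportions may differ.

```python
from collections import Counter


def gender_share(rows):
    """Return the fraction of rows per value of the (assumed) 'Gender' field."""
    counts = Counter(row["Gender"] for row in rows)
    total = sum(counts.values())
    return {gender: count / total for gender, count in counts.items()}


# Toy stand-ins for the original and the newly collected training data.
original_rows = [{"Gender": "Male"}] * 8 + [{"Gender": "Female"}] * 2
new_rows = [{"Gender": "Female"}] * 6

# Augmenting the training set here simply means appending the new rows.
augmented_rows = original_rows + new_rows

print(gender_share(original_rows))   # share per gender before augmentation
print(gender_share(augmented_rows))  # share per gender after augmentation
```

In a real notebook the same check is usually a one-liner on the training dataframe; the point is to confirm that the augmented set is closer to balanced before retraining.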

Here is the link to a Colab notebook where we make the issue with our old dataset clear, augment our training set with newly collected data, and retrain the same gradient boosting model.

Running the notebook cells

Please run the notebook cells up to the point where we evaluate the new model's performance. How is our new model doing? Do you see the F1 score?

Our model is doing much better overall! The F1 score jumped from 0.64 to 0.79. But was the gender issue solved? Are there additional problems that need action?
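An aggregate F1 score cannot, by itself, tell us whether the gender issue is gone. As a rough sketch of what a per-group check looks like, the snippet below computes F1 for a single positive class, both overall and per group. The labels, predictions, and group values are toy data, the helper names are hypothetical, and the notebook may average F1 differently (for example, across classes), so the numbers here are not comparable to the ones above.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one positive class, computed from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_group(y_true, y_pred, groups, positive):
    """Compute F1 separately for each distinct value in `groups`."""
    scores = {}
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        scores[group] = f1_score(
            [y_true[i] for i in idx], [y_pred[i] for i in idx], positive
        )
    return scores


# Toy data: the model is right on every male row and wrong on two female rows.
y_true = ["Exited", "Retained", "Exited", "Retained", "Exited", "Retained"]
y_pred = ["Exited", "Retained", "Exited", "Exited", "Retained", "Retained"]
groups = ["Male", "Male", "Male", "Female", "Female", "Female"]

overall = f1_score(y_true, y_pred, positive="Exited")
per_group = f1_by_group(y_true, y_pred, groups, positive="Exited")
print(overall, per_group)
```

A large gap between groups, as in this toy data, is the kind of signal that a single overall score hides.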

Let’s commit the new model version to the platform to find out.

Loading the project

We are going to add the new model version to the same “Churn prediction” project. To do so, we will use our Python API again.

To retrieve the project in order to use the add_model method, we will need to instantiate the client with our API key and use the client’s load_project method.

Instantiating the client and loading the project

Create a new cell in the notebook we provided, right after the model evaluation part. In that cell, we will first instantiate the Openlayer client, replacing YOUR_API_KEY_HERE with your API key. Then, we will load the “Churn prediction” project by specifying its name.

import openlayer

client = openlayer.OpenlayerClient('YOUR_API_KEY_HERE')
project = client.load_project(name='Churn prediction')

Recall that every project name must be unique within a user’s account. This is why: a unique name lets you always retrieve the project to add new models or new datasets. Beware that the name is case-sensitive, so you need to specify exactly the same name you used when you first created the project.

Model versioning

Now that we have retrieved the project, we can go ahead and use the project’s add_model method.

Let’s just briefly talk about model versioning.

Versioning models inside a project happens via the name argument passed to the add_model method.

  • If add_model is called with a model name that already exists inside the project, Openlayer treats it as a new version of an existing model lineage;
  • On the other hand, if add_model is called with a model name that still doesn’t exist inside the project, Openlayer treats it as the first version of a new, separate, model lineage.

With such a mechanism, practitioners are free to organize their models inside a project in whichever way they find most convenient. Some people might prefer to have a single model lineage inside a project, so every model version is organized in a single tree. Other people may prefer to keep separate trees for each model architecture, framework, or experiment type. All these options are valid, and you are free to choose according to your preferences.
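To make the name-based rule concrete, here is a toy model of it. This is not how Openlayer is implemented and it does not call the Openlayer API: `ToyProject` and its `add_model` are hypothetical stand-ins that only mimic the behavior described above, where a repeated name extends a lineage and a new name starts one.

```python
from collections import defaultdict


class ToyProject:
    """Toy stand-in for a project; NOT the Openlayer implementation."""

    def __init__(self):
        # Maps a model name to the ordered commit messages of its lineage.
        self.lineages = defaultdict(list)

    def add_model(self, name, commit_message):
        """Append a version to the lineage `name` and return its version number."""
        self.lineages[name].append(commit_message)
        return len(self.lineages[name])


toy_project = ToyProject()

# Same name twice: the second call becomes version 2 of the same lineage.
v1 = toy_project.add_model("Churn Classifier", "Initial commit")
v2 = toy_project.add_model("Churn Classifier", "Retrain on augmented training set")

# A name not seen before starts a new, separate lineage at version 1.
other = toy_project.add_model("Churn Baseline", "Initial commit")

print(v1, v2, other)
print(dict(toy_project.lineages))
```

Whether you keep one lineage or several is purely an organizational choice; the only thing that decides where a version lands is the name you pass.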

In our case, we will upload a new version of the same model lineage. To do so, we will call the project’s add_model method with the name we already defined inside the project: “Churn Classifier.”

Adding a new model version

Create a new cell in the notebook we provided, right after the project loading cell. In that cell, we will upload the new gradient boosting classifier to the project by calling the project's add_model method. We will include a commit message to describe the changes.

from openlayer.models import ModelType

model = project.add_model(
    name='Churn Classifier',
    commit_message='Retrain on augmented training set with female users',
    # The remaining arguments are not shown in this excerpt;
    # see the notebook for the full call.
)

That’s it!

You will notice that once the new model is uploaded, two things happen:

  1. A new run report is created. This is the entry point for identifying and diagnosing possible issues with the new model;
  2. All tests created run automatically so that we can assert the issues we’d like to solve are indeed fixed and that we don’t have regressions.

As we can see, our new model version is not only better overall, but the gender issue seems to be fixed, as both our tests have passed!

Model diff

A lot of the decisions in an ML project are comparative by nature, especially when it comes to deciding between model versions.

That’s what model diff is for!

With model diff, you can compare any two model versions that are inside a project. By doing so, you can evaluate if one model is better than the other going way beyond just the simple overall aggregate metrics. Furthermore, you are able to assess if there are unwanted regressions you haven’t spotted before and possibly discard the commit.
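The idea behind such a comparison can be sketched in a few lines: put the same metric for two versions side by side, per slice, and flag any slice where the newer version is worse. The function name, slice names, and every number below are hypothetical and chosen only for illustration; the platform's model diff covers much more than this.

```python
def diff_metrics(old, new, tolerance=0.0):
    """Compare per-slice metrics of two model versions.

    Returns (slice, old, new, delta, regressed) tuples for slices present in
    both versions. A slice counts as regressed when the metric drops by more
    than `tolerance`.
    """
    rows = []
    for slice_name in sorted(set(old) & set(new)):
        delta = new[slice_name] - old[slice_name]
        rows.append(
            (slice_name, old[slice_name], new[slice_name], delta, delta < -tolerance)
        )
    return rows


# Hypothetical per-slice scores for two model versions (made-up numbers).
old_metrics = {"overall": 0.60, "female": 0.40, "male": 0.80}
new_metrics = {"overall": 0.80, "female": 0.85, "male": 0.75}

diff = diff_metrics(old_metrics, new_metrics)
regressions = [name for name, _, _, _, regressed in diff if regressed]

for name, old_value, new_value, delta, regressed in diff:
    flag = "REGRESSION" if regressed in (True,) else "ok"
    print(f"{name:8s} {old_value:.2f} -> {new_value:.2f} ({delta:+.2f}) {flag}")
```

In this toy data the overall score improves while one slice gets slightly worse, which is exactly the kind of regression an aggregate metric alone would not reveal.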

You can create a model diff by using the dropdowns available at the top of the project page. There, you can select two different model commits to compare side-by-side. Let’s create a model diff between our two gradient boosting classifier versions.

As we can see, the new model version seems to be significantly better than the previous one across multiple dimensions! Note how we significantly improved the overall F1 score of the model by shrinking the largest error class from one iteration to the next.

Of course, there is still room for improvement and that’s what the run report is there for. By following this tutorial, you have been exposed to the identify-diagnose-test-solve framework, which is extremely powerful when the goal is systematically improving ML models. Furthermore, you are now capable of using the tools provided by Openlayer to apply the framework to your own models and datasets.