In the previous parts of the tutorial, we’ve been exploring the Report, which contains powerful tools for model validation and debugging. Now, we will step back and return to the task page to explore another aspect of error analysis: testing.
Returning to the Project page
Please return to the project page to continue the tutorial.
Let’s briefly talk about testing.
Test-driven development is common practice in software engineering. In ML, a closely related field, tests are not as common as they should be.
Testing in ML (if done at all) usually consists of a single engineer writing a script to check a few cases that came up during a sloppy error analysis procedure. However, thorough testing goes a long way toward ensuring model quality, helping practitioners catch mistakes proactively rather than retroactively.
A test suite helps guarantee that you are systematically moving in the right direction and that the issues identified in this iteration of error analysis won't haunt future model versions.
In this part of the tutorial, we will create a metric test. The goal is to assert that every future model we commit to the platform does not manifest the biased behavior we identified in the previous parts of the tutorial.
To check out the other testing possibilities, refer to the testing page.
On the project page, notice that there is a section listing all the tests created for the project. If you have been following the tutorial, yours will not display any tests yet.
To create your first test, click on the Create a test button in the upper right corner of the Tests section.
You will be redirected to the test creation page. The first thing you'll see, at the top of the page, is the set of test categories to select from. For now, for tabular data, we offer Metric, Confidence, and Invariance tests.
If you would like to use other testing frameworks, feel free to reach out so that we can accommodate your needs!
Your first tests will be metric tests for our churn classification model.
Metric tests are extremely powerful. The idea is to assert that the model performance, measured by an aggregate metric, is above a specified threshold for a certain user group.
Let’s create a test that asserts that the F1 score for future models is above 0.75 for female users.
First, select Metric on the Category panel on the Test page. After selecting it, the configuration section should appear below it.
Now, we can define the test’s configuration:
- Metric: the aggregate metric of choice. In our case, we will select the F1 score, which was the problematic metric we identified in the previous parts of the tutorial;
- Pass threshold: the value that needs to be surpassed for a test to be considered successful. In our case, we will set it to 0.75.
Finally, in the Data section, we can define the data cohort over which the aggregate metric is computed. Let’s select the data for female users using a filter over the data table.
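Conceptually, the check this test performs can be sketched in plain Python. This is a minimal illustration, not the platform's implementation; the field names (`gender`, `label`, `prediction`) and the sample rows are hypothetical.

```python
# Minimal sketch of what a metric test checks: compute an aggregate
# metric (here, F1) over a data cohort and compare it to a threshold.
# Field names and sample data below are hypothetical.

def f1(labels, preds):
    """F1 score for binary labels/predictions encoded as 0/1."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def metric_test(rows, threshold, cohort_filter):
    """Return True if the metric over the filtered cohort meets the threshold."""
    cohort = [r for r in rows if cohort_filter(r)]
    labels = [r["label"] for r in cohort]
    preds = [r["prediction"] for r in cohort]
    return f1(labels, preds) >= threshold

# Synthetic validation rows standing in for the churn dataset
rows = [
    {"gender": "Female", "label": 1, "prediction": 1},
    {"gender": "Female", "label": 0, "prediction": 0},
    {"gender": "Female", "label": 1, "prediction": 1},
    {"gender": "Female", "label": 1, "prediction": 0},
    {"gender": "Male",   "label": 1, "prediction": 1},
]

# The test from this tutorial: F1 for female users must exceed 0.75
passed = metric_test(rows, 0.75, lambda r: r["gender"] == "Female")
print(passed)
```

On this toy cohort the F1 score is 0.8, so the test passes; the platform performs the equivalent computation on the dataset you committed, using the filter you define in the Data section.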
After clicking on Create, we are all set!
To run the test, hover over its row in the tests table for the current model and click on Run test.
By adding a test to a project, we make sure that every new model that we include is going to be tested. This will allow us to assert that the same problems won’t happen again.
It is also important that, in the process of fixing our model for female users, we don't regress in performance for male users. For that, we need a second test. Create one that asserts the model's F1 score on samples from male users is at least 0.75.
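With both tests in place, every new model version is effectively gated on both cohorts. The self-contained sketch below illustrates that idea; as before, the field names and sample data are hypothetical, and the real platform runs the equivalent checks for you.

```python
# Sketch of gating a model version on per-cohort F1 thresholds.
# Field names and sample data are hypothetical.

def f1(pairs):
    """F1 score for a list of (label, prediction) pairs encoded as 0/1."""
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

rows = [
    {"gender": "Female", "label": 1, "prediction": 1},
    {"gender": "Female", "label": 0, "prediction": 0},
    {"gender": "Female", "label": 1, "prediction": 1},
    {"gender": "Female", "label": 1, "prediction": 0},
    {"gender": "Male",   "label": 1, "prediction": 1},
    {"gender": "Male",   "label": 1, "prediction": 1},
    {"gender": "Male",   "label": 0, "prediction": 1},
]

# One pass/fail result per cohort, each against the 0.75 threshold
results = {}
for gender in ("Female", "Male"):
    pairs = [(r["label"], r["prediction"]) for r in rows if r["gender"] == gender]
    results[gender] = f1(pairs) >= 0.75

print(results)
```

A model version would only be considered healthy when every cohort's result is `True`, which is exactly the guarantee the two tests give us going forward.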
In the next part of the tutorial, we will solve our model’s gender bias and commit the new version to the platform.