Onboarding models and datasets
Now you are ready for the fun part!
We will explore the problem of churn classification with Openlayer.
To make your life easier, here is a link to a Colab notebook containing everything you need to follow this tutorial. We are going to use a modified version of the Churn Modelling dataset from Kaggle.
Training the model
The first part of the notebook looks like a standard training pipeline. It contains all the code to load the dataset, apply a one-hot encoding to the categorical features, and train a gradient-boosting classifier. We added comments to guide you along the way.
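The pipeline described above can be sketched roughly as follows. This is a minimal, self-contained version using synthetic data and illustrative column names (Geography, CreditScore, Age, Exited), not the actual notebook code:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn dataset (column names are assumptions)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Geography": rng.choice(["Spain", "France", "Germany"], size=n),
    "CreditScore": rng.integers(300, 850, size=n),
    "Age": rng.integers(18, 80, size=n),
    "Exited": rng.integers(0, 2, size=n),  # churn label
})

# One-hot encode the categorical feature
X = pd.get_dummies(df.drop(columns="Exited"), columns=["Geography"])
y = df["Exited"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a gradient-boosting classifier
model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```

The real notebook follows the same shape: load, encode, split, fit.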
Train and evaluate the model
Please run the notebook cells up to the point where we evaluate the model’s performance. How is our model doing? Can you spot the accuracy and the F1 score?
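If you want to reproduce these two metrics outside the notebook, scikit-learn computes them directly from labels and predictions. A quick sketch with toy values:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy labels and predictions to illustrate the two metrics
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")  # 0.80
print(f"F1:       {f1_score(y_true, y_pred):.2f}")        # 0.75
```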
Despite their popularity, aggregate metrics such as accuracy can be very misleading.
They play an important role, but offer little help when answering questions such as:
- What should we do next to improve the model? Change the model architecture? Collect more data? Preprocess the features differently? Or something else?
- Does our model perform equally well for different user groups? For example, what’s the performance for users from Spain? What about for users from France?
- Are there hidden biases in our model?
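To give a taste of the per-group question, here is how one could slice accuracy by country offline with pandas. The column names (Geography, label, prediction) are illustrative and the data is synthetic; the platform automates this kind of analysis:

```python
import numpy as np
import pandas as pd

# Synthetic results table: one row per user, with the true label and
# the model's prediction (column names are assumptions)
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Geography": rng.choice(["Spain", "France", "Germany"], size=n),
    "label": rng.integers(0, 2, size=n),
    "prediction": rng.integers(0, 2, size=n),
})

# Accuracy per country: mean per-row correctness within each group
per_group = (
    df.assign(correct=df["label"] == df["prediction"])
      .groupby("Geography")["correct"]
      .mean()
)
print(per_group)
```

A gap between the Spain and France rows is exactly the kind of signal a single aggregate accuracy number hides.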
The list of questions we can ask is virtually infinite and staring at the accuracy won’t get us very far. To start answering these types of questions, we need to give our model and datasets a new home: the Openlayer platform.
Uploading models and datasets
We use the openlayer Python API to interact with the Openlayer platform. That’s the role of the second part of the notebook: it demonstrates how to use the Python API to upload our model and datasets.
Upload the model and datasets to the platform
Please run the second part of the notebook. Don’t forget to replace the YOUR_API_KEY_HERE placeholder in the cell that instantiates the Openlayer client with your API key (which you can find under "Account settings" in the platform).
For now, don’t worry about all the details of the upload process. You can always refer to the API reference to learn about individual methods.
With our model and datasets on the platform, we are good to move to the next part of the tutorial!