Aggregate metrics such as accuracy or precision are usually the first things that come to mind when we think of model evaluation.

Despite their usefulness, they provide little help when answering questions such as:

  • What should we do next to improve the model? Change the model architecture? Collect more data? Preprocess the features differently? Or something else?
  • Does our model perform equally well for different subpopulations?
  • Are there hidden biases in our model?
  • Are we really improving from one model version to the next?
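
To make the subpopulation question concrete, here is a minimal sketch in plain Python. The toy records, the `plan` grouping, and the helper names `accuracy` and `accuracy_by_group` are all hypothetical and chosen only for illustration; they are not part of Openlayer or of the walkthrough's dataset. The point is simply that a reasonable-looking overall accuracy can hide a subgroup where the model does much worse.

```python
from collections import defaultdict

# Hypothetical toy data: (subscription plan, true churn label, predicted churn label).
records = [
    ("monthly", 1, 1), ("monthly", 0, 0), ("monthly", 1, 1),
    ("monthly", 0, 0), ("monthly", 1, 1), ("monthly", 0, 0),
    ("annual", 1, 0), ("annual", 0, 0), ("annual", 1, 0), ("annual", 1, 1),
]


def accuracy(rows):
    """Fraction of rows whose prediction matches the true label."""
    return sum(y_true == y_pred for _, y_true, y_pred in rows) / len(rows)


def accuracy_by_group(rows):
    """Accuracy computed separately for each value of the grouping field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return {name: accuracy(members) for name, members in groups.items()}


overall = accuracy(records)
per_group = accuracy_by_group(records)

print(f"overall: {overall:.2f}")
for name, acc in per_group.items():
    print(f"{name}: {acc:.2f}")
```

On this made-up data the overall accuracy is 0.80, while the per-plan breakdown is 1.00 for monthly users and 0.50 for annual users, a gap the aggregate number alone does not reveal.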

The list of questions we can ask is virtually endless, and staring at the accuracy alone won’t get us very far. To start answering these types of questions, we need to give our model and datasets a new home: the Openlayer platform.

This walkthrough explores the development of a churn classification model with Openlayer.