Upload datasets and models
Dataset and model uploads are done using the `openlayer` Python API. The flow usually goes as follows:
- Create or load a project;
- Add datasets to the project’s staging area;
- Add a model to the project’s staging area;
- Commit;
- Push the staging area to the platform.
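For reference, the end-to-end flow looks roughly like the sketch below. It is only a sketch: `df_train`, `df_val`, and the config file names are placeholders, and the config YAML files themselves are described in the sections that follow.

```python
import openlayer
from openlayer.tasks import TaskType

# 1. Create or load a project
client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")
project = client.create_or_load_project(
    name="Fraud classification",
    task_type=TaskType.TabularClassification,
)

# 2. Add datasets to the staging area (placeholder DataFrames and config file names)
project.add_dataframe(dataset_df=df_train, dataset_config_file_path="training_dataset_config.yaml")
project.add_dataframe(dataset_df=df_val, dataset_config_file_path="validation_dataset_config.yaml")

# 3. Add a model (here, a shell model: config only)
project.add_model(model_config_file_path="model_config.yaml")

# 4. Commit and 5. push
project.commit("Initial commit!")
project.push()
```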
Check out our examples gallery
If you’d like to see concrete examples of the process above for various popular ML frameworks, refer to our examples gallery GitHub repository.
1. Creating or loading a project
To create or load a project, you can run:
```python
import openlayer
from openlayer.tasks import TaskType

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

project = client.create_or_load_project(
    name="Fraud classification",
    task_type=TaskType.TabularClassification,  # Check the API reference for all task types
    description="Evaluation of ML approaches to detect frauds",
)
```
Refer to the “Create and load projects” how-to guide for details.
2. Adding datasets
Datasets are added to a project’s staging area with the `add_dataframe` or `add_dataset` methods. The former is used if the dataset is loaded into memory as a pandas dataframe, while the latter is used if the dataset is saved to disk as a CSV file.
Here, we will show the use of `add_dataframe`, but the process is similar for `add_dataset`.
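For a dataset saved as a CSV file, the call looks roughly like the sketch below. The `file_path` parameter name is an assumption here; check the API reference for the exact `add_dataset` signature.

```python
# Sketch: adding a dataset saved to disk as a CSV file.
# The `file_path` parameter name is an assumption -- see the API reference.
project.add_dataset(
    file_path="validation_set.csv",
    dataset_config_file_path="dataset_config.yaml",
)
```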
Preparing the dataset config file
First, prepare a dataset config YAML.
The information contained in the YAML file will vary depending on the task type. In general, it contains the dataset’s `columnNames`, `label` (which indicates whether it is a `training` or `validation` set), and others. Refer to the API reference for all the details.
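As an illustration, a config for a tabular classification validation set might be generated like the sketch below. Only `columnNames` and `label` are mentioned above; the remaining keys (`featureNames`, `labelColumnName`, `classNames`) and all values are assumptions for illustration, so confirm the exact field names for your task type in the API reference.

```python
import yaml

# Illustrative dataset config for a tabular classification validation set.
# Only `columnNames` and `label` are described above; the other keys are
# assumptions for illustration -- confirm field names in the API reference.
dataset_config = {
    "columnNames": ["amount", "merchant_category", "is_fraud"],
    "label": "validation",  # or "training"
    "featureNames": ["amount", "merchant_category"],
    "labelColumnName": "is_fraud",
    "classNames": ["not fraud", "fraud"],
}

with open("dataset_config.yaml", "w") as f:
    yaml.dump(dataset_config, f)
```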
Add the dataset
Once the `dataset_config.yaml` file is ready, add the dataset to a project’s staging area with:
```python
project.add_dataframe(
    dataset_df=df,
    dataset_config_file_path="dataset_config.yaml",
)
```
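If you are adding both a training and a validation set, the same call is simply made once per dataset, each with its own config file whose `label` field marks it as training or validation. The DataFrame and file names below are placeholders:

```python
# Each dataset gets its own config; the config's `label` field marks it as
# the training or the validation set (placeholder names for illustration).
project.add_dataframe(
    dataset_df=df_train,
    dataset_config_file_path="training_dataset_config.yaml",
)
project.add_dataframe(
    dataset_df=df_val,
    dataset_config_file_path="validation_dataset_config.yaml",
)
```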
3. Adding models
Models are added to a project’s staging area with the `add_model` method.
When it comes to uploading models to the Openlayer platform, there are two options:
- The first option is to upload a shell model. Shell models are the most straightforward way to get started. They consist only of metadata, and all of the analysis on the platform is done via the model’s predictions (which are uploaded with the datasets).
- The second option is to upload a full model, with artifacts. When a full model is uploaded, it becomes available on the platform, and you can use all of the available explainability techniques, run a series of robustness assessments with it, and more.
Preparing the model config file
Regardless of whether you are uploading a full model or a shell model, first prepare a model config YAML file.
As with datasets, the information contained in the YAML file will vary depending on the task type. In general, it will contain the model `name`, `architectureType`, and others. Refer to the API reference for all the details.
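For illustration, a minimal model config might be generated like the sketch below. Only `name` and `architectureType` are mentioned above; the values shown are assumptions, and any additional fields your task type requires should be taken from the API reference.

```python
import yaml

# Illustrative model config -- only `name` and `architectureType` are
# described above; check the API reference for the fields your task type needs.
model_config = {
    "name": "Gradient boosting classifier",
    "architectureType": "sklearn",  # assumption: value depends on your framework
}

with open("model_config.yaml", "w") as f:
    yaml.dump(model_config, f)
```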
Add a shell model
Shell models are made of just the config file. To add a shell model to a project’s staging area, run:
```python
project.add_model(
    model_config_file_path="model_config.yaml",
)
```
Add a full model
If, instead of a shell model, you are interested in uploading a full model, you should prepare a model package. A model package is nothing more than a folder with all the information necessary to run inference with the model.
Please refer to our examples gallery GitHub repository for code examples that show how to prepare a model package for various popular ML frameworks. Once the model package is ready, you can add a full model to the project’s staging area with:
```python
project.add_model(
    model_package_dir="model_package",
    model_config_file_path="model_config.yaml",
    sample_data=df[feature_names].iloc[:10, :],  # Some sample data to test model inference
)
```
4. Committing
Once your project’s staging area contains all the resources you want to upload, add a commit message with:
```python
project.commit("Initial commit!")
```
5. Pushing
Finally, push the resources from a project’s staging area to the Openlayer platform with:
```python
project.push()
```
The upload will be successful if the resources in the staging area are compatible with one another. Two high-level compatibility checks performed are described below.
Staging area with dataset(s)-only
If you want to push a staging area that contains only a training set, only a validation set, or only these two datasets, you need to ensure that the datasets do not contain columns with the model's predictions.
The columns where the model’s predictions are specified are usually `predictionsColumnName` or `predictionScoresColumnName`, depending on the task type.
There is a single exception to this rule. If you have only a validation set with model predictions, it's acceptable to upload it. This is because we'll assume that the predictions are for a previous version of the model that's already on the platform. However, if there's no previous model version, the upload will fail in the platform.
Staging area with a model
To push a staging area with a model, whether it’s a shell model or a full model, you need to add training and validation sets or just a validation set. Additionally, the dataset(s) must contain a column with the model’s predictions, which is usually specified by `predictionsColumnName` or `predictionScoresColumnName`, depending on the task type.
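Putting that rule into code, a push that includes a model could look roughly like the sketch below. The file names, DataFrame, and prediction column are placeholders; the key point is that the validation dataset’s config is assumed to reference the column holding the model’s predictions (for example via `predictionScoresColumnName`).

```python
# Sketch: pushing a shell model together with a validation set that contains
# the model's predictions. Names below are placeholders; the validation
# dataset config is assumed to include something like:
#   predictionScoresColumnName: prediction_scores
project.add_dataframe(
    dataset_df=df_val,  # must contain the prediction column referenced in the config
    dataset_config_file_path="validation_dataset_config.yaml",
)
project.add_model(model_config_file_path="model_config.yaml")
project.commit("Add validation set with predictions and a shell model")
project.push()
```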