Datasets and models are uploaded with the openlayer Python client. The flow usually goes as follows:

1. Create or load a project
2. Add datasets to the project’s staging area
3. Add a model to the project’s staging area
4. Write a commit message
5. Push the staging area to the platform

Check out our examples gallery

If you’d like to see concrete examples of the process above for various popular ML frameworks, refer to our examples gallery GitHub repository.

Prerequisites

To follow along with this guide, you’ll need:

- An Openlayer account and an API key
- The openlayer Python client installed (pip install openlayer)

1. Create or load a project

To load an existing project, you can run:

import openlayer
from openlayer.tasks import TaskType

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

project = client.load_project(name="Fraud classification")
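Or, to create a project from scratch, use create_project. A minimal sketch (the task type below, TaskType.TabularClassification, is an assumption for a fraud classification project):

# Create a new project; the task type is an assumption for this example
project = client.create_project(
    name="Fraud classification",
    task_type=TaskType.TabularClassification,
)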

Refer to the Create and load projects guide for details.

2. Add datasets to the project’s staging area

Datasets are added to a project’s staging area with the add_dataframe or add_dataset methods. The former is used when the dataset is loaded into memory as a pandas DataFrame, while the latter is used when the dataset is saved to disk as a CSV file.

Here, we show how to use add_dataframe, but the process is similar for add_dataset.

Preparing the dataset config

First, prepare a dataset config. The config can be provided as a Python dictionary or as a YAML file.

The information in the config will vary depending on the task type. Refer to the Write dataset configs guides for details.
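As an illustration, a config for a tabular classification training set might look like the sketch below. The key names (classNames, labelColumnName, and so on) are assumptions for this example; the Write dataset configs guides list the keys your task type actually requires.

# Illustrative dataset config for a tabular classification training set.
# Key names are assumptions for this example -- refer to the Write
# dataset configs guides for the keys your task type requires.
dataset_config = {
    "classNames": ["Not fraud", "Fraud"],    # label values, in order
    "labelColumnName": "is_fraud",           # column with the ground truth
    "label": "training",                     # "training" or "validation"
    "featureNames": ["amount", "merchant"],  # input feature columns
}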

Add the dataset

Once the config is ready, add the dataset to a project’s staging area with:

# If passed as a Python dict `dataset_config`
project.add_dataframe(
    dataset_df=df,
    dataset_config=dataset_config
)

# Or if passed as a YAML file
project.add_dataframe(
    dataset_df=df,
    dataset_config_file_path="dataset_config.yaml"
)
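If your dataset lives on disk as a CSV file instead, add_dataset is the counterpart that takes a file path. A sketch, reusing the same config (the file_path parameter name is an assumption; check the add_dataset reference for the exact signature):

# Sketch: stage a dataset saved as a CSV file (the parameter name is an
# assumption -- check the add_dataset reference for the exact signature)
project.add_dataset(
    file_path="training.csv",
    dataset_config=dataset_config,
)

Repeat the call once per dataset, e.g., once for the training set and once for the validation set, each with its own config.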

3. Add a model to the project’s staging area

Models are added to a project’s staging area with the add_model method.

When it comes to uploading models to the Openlayer platform, there are a few different options, explained in detail in the Versioning topic guide. In short, one of three paths can be taken:

  1. Upload a shell model. Shell models have the most straightforward upload process. They consist of metadata, and all of the analysis on the platform is done via their predictions (which are uploaded with the datasets).
  2. Upload a full model, with artifacts. When a full model is uploaded, it becomes available on the platform, where you can use all of the available explainability techniques, run a series of robustness assessments with it, and more.
  3. Use the direct-to-API upload, which is valid only for LLMs.

Refer to this section of the Versioning topic guide to decide which option is best for you.

Preparing the model config file

Regardless of the upload path you choose, first prepare a model config.

As with datasets, the config can be passed as a Python dictionary or as a YAML file, and the information it contains will vary depending on the task type. Refer to the Write model configs guides for details.
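For instance, a shell model config for a tabular classification task might look like the sketch below; the key names here (such as architectureType) are assumptions for this example, so check the Write model configs guides for the authoritative list.

# Illustrative model config; key names are assumptions for this example
model_config = {
    "name": "Fraud classifier",
    "architectureType": "sklearn",           # framework identifier
    "classNames": ["Not fraud", "Fraud"],    # must match the dataset config
    "featureNames": ["amount", "merchant"],
}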

Add a shell model

Shell models are made of just the config. To add a shell model to a project’s staging area, run:

# If passed as a Python dict `model_config`
project.add_model(
    model_config=model_config,
)

# Or if passed as a YAML file
project.add_model(
    model_config_file_path="model_config.yaml",
)

Add a full model

If, instead of a shell model, you are interested in uploading a full model, you should prepare a model package. A model package is nothing more than a folder with all the information necessary to run inference with the model.
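As a rough sketch of what such a folder might contain, assuming the conventions used in the examples gallery (the file names and the prediction interface below are illustrative assumptions, not a definitive spec): a serialized model artifact, a requirements.txt with the model’s dependencies, and a prediction interface along these lines:

# model_package/prediction_interface.py -- a sketch; the exact interface
# the platform expects is shown in the examples gallery
import pickle
from pathlib import Path

import pandas as pd

PACKAGE_PATH = Path(__file__).parent


class FraudClassifierModel:
    def __init__(self):
        # Load the serialized artifact shipped inside the package
        with open(PACKAGE_PATH / "model.pkl", "rb") as f:
            self.model = pickle.load(f)

    def predict_proba(self, input_df: pd.DataFrame):
        # Return class probabilities for each row of the input
        return self.model.predict_proba(input_df)


def load_model():
    """Entry point used to load the model for inference."""
    return FraudClassifierModel()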

Please refer to our examples gallery GitHub repository for code examples that prepare model packages for various popular ML frameworks. Once the model package is ready, you can add a full model to the project’s staging area with:

# If passed as a Python dict `model_config`
project.add_model(
    model_package_dir="model_package",
    model_config=model_config,
    sample_data=df[feature_names].iloc[:10, :]  # Some sample data to test model inference
)

# Or if passed as a YAML file
project.add_model(
    model_package_dir="model_package",
    model_config_file_path="model_config.yaml",
    sample_data=df[feature_names].iloc[:10, :]  # Some sample data to test model inference
)

4. Write a commit message

Once your project’s staging area contains all the resources you want to upload, add a commit message with:

project.commit("Initial commit!")
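At any point, you can also inspect what is currently staged. A sketch (treat the helper below as an assumption and check your client version for the exact call):

# Print the state and contents of the project's staging area
project.status()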

5. Push the staging area to the platform

Finally, push the resources from a project’s staging area to the Openlayer platform with:

project.push()
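Putting the five steps together, the entire flow is a handful of calls. A condensed sketch, assuming the configs were prepared as in steps 2 and 3 (file names and variable names are illustrative):

import openlayer
import pandas as pd

client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")
project = client.load_project(name="Fraud classification")

# Load the data and stage it with the configs prepared earlier
train_df = pd.read_csv("training.csv")
val_df = pd.read_csv("validation.csv")
project.add_dataframe(dataset_df=train_df, dataset_config=train_config)
project.add_dataframe(dataset_df=val_df, dataset_config=val_config)
project.add_model(model_config=model_config)

# Commit and push the staging area to the platform
project.commit("Initial commit!")
project.push()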

The upload will succeed only if the resources in the staging area are compatible with one another. Two high-level compatibility checks are described below.

Staging area with datasets only

If you want to push a staging area that contains only a training set, only a validation set, or only these two datasets, you need to ensure that the datasets do not contain columns with the model’s predictions.

The columns holding the model’s predictions are usually specified by predictionsColumnName or predictionScoresColumnName in the dataset config, depending on the task type.

There is a single exception to this rule: if you have only a validation set with model predictions, it’s acceptable to upload it, because we’ll assume the predictions come from a previous version of the model that’s already on the platform. However, if there is no previous model version, the upload will fail on the platform.

Staging area with a model

To push a staging area with a model, whether it’s a shell model or a full model, you need to also stage training and validation sets, or just a validation set. Additionally, the dataset(s) must contain a column with the model’s predictions, usually specified by predictionsColumnName or predictionScoresColumnName in the dataset config, depending on the task type.
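For example, reusing the illustrative keys from earlier, a validation set staged alongside a model would need its config to point at the predictions column. A sketch:

# Validation set config that can be pushed together with a model
# (key names are assumptions, as in the earlier sketches)
validation_config = {
    "classNames": ["Not fraud", "Fraud"],
    "labelColumnName": "is_fraud",
    "label": "validation",
    "featureNames": ["amount", "merchant"],
    "predictionScoresColumnName": "model_scores",  # the model's predictions
}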