Learn how to use datasets stored in Amazon S3 with Openlayer
This guide explains how to use datasets stored in an Amazon S3 bucket with Openlayer. Openlayer currently accepts datasets in two formats: pandas dataframes and CSV files. Consequently, the first step is to ensure that the data you wish to use is in one of these formats.
Loading the dataset into a pandas dataframe is the recommended option if the dataset fits in memory. To retrieve your data from S3 and load it into a pandas dataframe, use the following code:
```python
import boto3
import pandas as pd

# The AWS profile that has access to the S3 bucket
AWS_PROFILE = "your_profile"

# Information about the location of the dataset in the S3 bucket
S3_BUCKET = "bucket_name"
S3_KEY = "path/to/dataset.csv"

session = boto3.session.Session(profile_name=AWS_PROFILE)
s3 = session.client("s3")

s3_data = s3.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
df = pd.read_csv(s3_data["Body"])
```
With the dataset as a pandas dataframe, you can upload it to the platform in either development or monitoring mode.
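Because this option depends on the dataset fitting in memory, it can help to check the object's size before parsing it. The following is a minimal sketch, not part of the Openlayer or boto3 APIs: the function name `load_s3_csv` and the `max_bytes` threshold are hypothetical, and the sketch assumes the S3 `get_object` response carries the object's size in its `ContentLength` field.

```python
import pandas as pd


def load_s3_csv(s3_client, bucket, key, max_bytes=None):
    """Load a CSV object from S3 into a pandas dataframe.

    `s3_client` is expected to behave like a boto3 S3 client.
    `max_bytes` is a hypothetical safety threshold: if the object is
    larger than this, the function refuses to load it into memory.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)

    # ContentLength is the size of the object body in bytes
    size = response["ContentLength"]
    if max_bytes is not None and size > max_bytes:
        raise ValueError(
            f"Object is {size} bytes, above the {max_bytes}-byte limit; "
            "consider downloading it to disk instead."
        )

    # The response body is a file-like stream that pandas can read from
    return pd.read_csv(response["Body"])
```

With the `s3` client from the snippet above, a call might look like `df = load_s3_csv(s3, S3_BUCKET, S3_KEY, max_bytes=500_000_000)`, where the limit is an arbitrary value chosen for illustration.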
Downloading the dataset as a CSV file is the recommended option if you prefer saving it to disk instead of loading it into memory, as in the previous section. To retrieve your data from S3 and save it to disk, use the following code:
```python
import boto3

# The AWS profile that has access to the S3 bucket
AWS_PROFILE = "your_profile"

# Information about the location of the dataset in the S3 bucket
S3_BUCKET = "bucket_name"
S3_KEY = "path/to/dataset.csv"
OUTPUT_FILE = "dataset.csv"

session = boto3.session.Session(profile_name=AWS_PROFILE)
s3 = session.client("s3")

s3.download_file(Bucket=S3_BUCKET, Key=S3_KEY, Filename=OUTPUT_FILE)
```
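After the download, a quick sanity check can confirm that the file landed on disk and parses as CSV without reading all of it into memory. The sketch below is illustrative only: `download_and_preview` and its `nrows` parameter are hypothetical names, and it assumes the client exposes `download_file` with the same keyword arguments used above.

```python
import os

import pandas as pd


def download_and_preview(s3_client, bucket, key, output_file, nrows=5):
    """Download a CSV object to disk and return its first rows.

    `s3_client` is expected to behave like a boto3 S3 client.
    Only `nrows` rows are parsed, so the full file is never loaded
    into memory.
    """
    s3_client.download_file(Bucket=bucket, Key=key, Filename=output_file)

    # Fail early if the download did not produce a file
    if not os.path.exists(output_file):
        raise FileNotFoundError(f"Expected {output_file} after download")

    # Parse only the header and the first few rows as a preview
    return pd.read_csv(output_file, nrows=nrows)
```

With the names from the snippet above, `download_and_preview(s3, S3_BUCKET, S3_KEY, OUTPUT_FILE)` would return a small dataframe holding the first five rows of the saved file.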