Uploading Data to Dataset

Learn how to upload data to a dataset


Uploading data to a dataset in Clarifai is essential for training and evaluating your machine learning models. Whether you're working with images, videos, text, audio, or other data types, Clarifai’s SDKs provide flexible and efficient methods to upload data from various sources.

tip

Click here to learn more about the different methods of uploading data to a dataset.

Customize Batch Size

When uploading inputs to the Clarifai platform, there are limits on the size and number of inputs per upload, as detailed here. However, methods from the Dataset class, such as Dataset.upload_from_folder(), Dataset.upload_from_url(), or Dataset.upload_dataset(), split the work into batches for you, so you can upload larger volumes of inputs without running into these limits.

For example, when uploading images in bulk, these methods process and upload them incrementally in multiple batches. Each batch contains at most 128 images and does not exceed 128MB in size, which keeps every request within the upload restrictions.

You can also customize the batch_size variable, which allows for concurrent upload of inputs and annotations. For example, if your images folder exceeds 128MB, you can set the variable to ensure that each batch contains an appropriate number of images while staying within the 128MB per batch limit.

The default batch_size is set to 32, but you can customize it to any value between 1 (minimum) and 128 (maximum).

Here is an example:

dataset.upload_from_folder(folder_path='./images', input_type='image', labels=True, batch_size=50)
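The batching idea described above can be sketched in plain Python. This is only an illustration of how a list of files could be grouped so that each batch respects both the item limit and the 128MB size limit; it is not the SDK's own implementation. The `plan_batches` helper and the file list are hypothetical, and 1MB is assumed to mean 1024 * 1024 bytes.

```python
from typing import Iterator, List, Tuple

MAX_BATCH_ITEMS = 128                 # per-batch input limit stated above
MAX_BATCH_BYTES = 128 * 1024 * 1024   # 128MB, assuming 1MB = 1024 * 1024 bytes


def plan_batches(files: List[Tuple[str, int]],
                 batch_size: int = 32) -> Iterator[List[str]]:
    """Group (name, size_in_bytes) pairs into batches that respect both limits.

    Hypothetical helper for illustration only; not part of the Clarifai SDK.
    A single file larger than the byte limit still ends up alone in a batch.
    """
    if not 1 <= batch_size <= MAX_BATCH_ITEMS:
        raise ValueError("batch_size must be between 1 and 128")
    batch: List[str] = []
    batch_bytes = 0
    for name, size in files:
        # Start a new batch when adding this file would break either limit.
        if batch and (len(batch) >= batch_size
                      or batch_bytes + size > MAX_BATCH_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(name)
        batch_bytes += size
    if batch:
        yield batch


# Hypothetical file list: 100 images of 2MB each.
files = [(f"img_{i}.jpg", 2 * 1024 * 1024) for i in range(100)]
batches = list(plan_batches(files, batch_size=50))
print([len(b) for b in batches])
```

With batch_size=50 the count limit is reached first. With batch_size=128 and the same 2MB files, the 128MB size limit would close each batch before the count limit does, which is the situation where lowering batch_size makes the batches more predictable.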

Upload Image

Use the DataLoader functionality of the Clarifai API to upload image data in bulk. You can upload images directly from a folder or list them in a CSV file; the DataLoader supports both methods.

Visit this page for more information.

from clarifai.client.dataset import Dataset


# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Upload data from a folder (set labels=False to upload without concepts)
dataset.upload_from_folder(folder_path='./images', input_type='image', labels=True)
Output
Uploading inputs: 100%|██████████| 1/1 [00:04<00:00,  4.44s/it]

Upload Text

Use the dataloader in the Clarifai API to upload text data. You can organize your text data in folders or describe it in a CSV file; both methods are supported.

Visit this page for more information.

from clarifai.client.dataset import Dataset

# Create the dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Upload data from a folder (set labels=False to upload without concepts)
dataset.upload_from_folder(folder_path='./data', input_type='text', labels=True)
Output
Uploading inputs: 100%|██████████| 1/1 [00:02<00:00,  2.68s/it]

Upload Audio

Use the dataloader to upload audio datasets in one of two ways: directly from a folder of audio files, or from a CSV file. Either option lets you bring diverse audio datasets into your applications.

Visit this page for more information.

from clarifai.client.dataset import Dataset


# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Upload data from a CSV file (set labels=False to upload without concepts)
dataset.upload_from_csv(csv_path='/Users/adithyansukumar/Desktop/data/test.csv', input_type='audio', csv_type='url', labels=True)
Output
Uploading inputs: 100%|██████████| 1/1 [00:03<00:00,  3.22s/it]
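A CSV file for a URL-based upload can be prepared with Python's standard csv module. The sketch below is a minimal illustration under stated assumptions: the column names (`inputid`, `input`, `concepts`) are not given on this page and should be confirmed against the SDK reference, and the example.com URLs are placeholders.

```python
import csv
import io

# Assumed column names; confirm them against the SDK reference before use.
FIELDNAMES = ["inputid", "input", "concepts"]

# Hypothetical rows: each one points at a placeholder audio URL.
rows = [
    {"inputid": "audio-1", "input": "https://example.com/a.wav", "concepts": "speech"},
    {"inputid": "audio-2", "input": "https://example.com/b.wav", "concepts": "music"},
]

# Write to an in-memory buffer; use open("test.csv", "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```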

Upload Video

Use the Clarifai SDKs to upload video data with the dataloader, which supports uploading videos either directly from a folder or from a CSV file.

Visit this page for more information.

from clarifai.client.dataset import Dataset


# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Upload data from a CSV file (set labels=False to upload without concepts)
dataset.upload_from_csv(csv_path='/Users/adithyansukumar/Desktop/data/test.csv', input_type='video', csv_type='url', labels=True)
Output
Uploading inputs: 100%|██████████| 1/1 [00:03<00:00,  3.22s/it]

Upload Image with Annotation

You can upload an image together with annotations by providing bounding box coordinates along with the image itself. Annotations add depth and context to your visual data.

Visit this page for more information.

from clarifai.client.input import Inputs


url = "https://samples.clarifai.com/BarackObama.jpg"
# Replace "user_id", "test_app", and "YOUR_PAT" with your own values.
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")

# Upload image data from a specified URL with a unique input ID "bbox"
input_object.upload_from_url(input_id="bbox", image_url=url)

# Define bounding box coordinates for the annotation (left, top, right, bottom)
bbox_points = [.1, .1, .8, .9]

# Generate a bounding box annotation proto with specified label ("face") and bounding box coordinates
annotation = input_object.get_bbox_proto(input_id="bbox", label="face", bbox=bbox_points)

# Upload the generated annotation to associate with the previously uploaded image
input_object.upload_annotations([annotation])
Output
2024-01-19 16:16:28 INFO     clarifai.client.input:                                                    input.py:696
Annotations Uploaded
code: SUCCESS
description: "Ok"
req_id: "b5ca21ebc19cbbfe0c21706b4c1cd909"
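The bounding box in the sample uses values between 0 and 1, which suggests coordinates expressed as fractions of the image size. Assuming that reading is correct, a box measured in pixels could be converted as in the sketch below. The `normalize_bbox` helper and the image dimensions are hypothetical and not part of the SDK.

```python
from typing import List


def normalize_bbox(left: float, top: float, right: float, bottom: float,
                   width: int, height: int) -> List[float]:
    """Convert a pixel-space box to fractional [left, top, right, bottom].

    Hypothetical helper; assumes the API expects fractions of the image size.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("box must lie inside the image, with left < right and top < bottom")
    return [left / width, top / height, right / width, bottom / height]


# Hypothetical 1000x500 image with a box drawn in pixel coordinates.
bbox_points = normalize_bbox(100, 50, 800, 450, width=1000, height=500)
print(bbox_points)
```

The resulting list has the same (left, top, right, bottom) order as the bbox_points value used in the sample above.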

Upload Image with Mask Annotation

This functionality allows you to add a mask annotation to an image by providing the polygon points of the mask as coordinates along with the image itself.

from clarifai.client.input import Inputs


url = "https://samples.clarifai.com/BarackObama.jpg"
# Replace "USER_ID", "APP_ID", and "YOUR_PAT" with your own values.
input_object = Inputs(user_id="USER_ID", app_id="APP_ID",pat="YOUR_PAT")

# Upload image data from a specified URL with a unique input ID "mask"
input_object.upload_from_url(input_id="mask", image_url=url)

# Define the mask as a list of polygon points
mask = [[0.87, 0.66], [0.45, 1.0], [0.82, 0.42]]

annotation = input_object.get_mask_proto(input_id="mask", label="obama", polygons=mask)

# Upload the generated annotation to associate with the previously uploaded image
input_object.upload_annotations([annotation])
Output
2024-07-10 08:23:07 INFO     clarifai.client.input:                                                    input.py:760
Annotations Uploaded
code: SUCCESS
description: "Ok"
req_id: "8816febaa1ce4ecab9fb3e3a1614a100"

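Before building a mask annotation, it can help to check the polygon points locally. The sketch below assumes, based on the sample values, that each point is a pair of coordinates between 0 and 1 and that a polygon needs at least three points. The `validate_polygon` helper is hypothetical and not part of the SDK.

```python
from typing import List, Sequence


def validate_polygon(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the points as lists of floats after checking shape and range.

    Hypothetical helper; the [0, 1] range is an assumption drawn from the sample.
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least 3 points")
    cleaned = []
    for point in points:
        if len(point) != 2:
            raise ValueError(f"expected a pair of coordinates, got {point!r}")
        a, b = float(point[0]), float(point[1])
        if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
            raise ValueError(f"coordinates must be within [0, 1], got {point!r}")
        cleaned.append([a, b])
    return cleaned


# Same polygon as in the sample above.
mask = validate_polygon([[0.87, 0.66], [0.45, 1.0], [0.82, 0.42]])
print(len(mask))
```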

Upload Video with Annotation

You can also upload videos with annotations. In addition to submitting the video itself, you provide bounding box coordinates that define the regions of interest within the video frames, which adds context to your video content.

Visit this page for more information.

from clarifai.client.input import Inputs

url = "https://samples.clarifai.com/beer.mp4"
# Replace "user_id", "test_app", and "YOUR_PAT" with your own values.
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")

# Upload a video from a URL with the input ID "video_bbox"
input_object.upload_from_url(input_id="video_bbox", video_url=url)

# Define bounding box coordinates for the annotation (left, top, right, bottom)
bbox_points = [.1, .1, .8, .9]

# Create an annotation for the same input ID, using the bounding box coordinates
annotation = input_object.get_bbox_proto(input_id="video_bbox", label="glass", bbox=bbox_points)

# Upload the annotation associated with the video
input_object.upload_annotations([annotation])
Output
[input_id: "video_bbox"
data {
  regions {
    region_info {
      bounding_box {
        top_row: 0.1
        left_col: 0.1
        bottom_row: 0.9
        right_col: 0.8
      }
    }
    data {
      concepts {
        id: "id-glass"
        name: "glass"
        value: 1
      }
    }
  }
}]
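The output above shows how the four bbox values end up in the annotation's bounding_box fields. The mapping can be sketched as follows; `bbox_to_fields` is a hypothetical helper written to mirror the printed output, not an SDK function.

```python
from typing import Dict, Sequence


def bbox_to_fields(bbox: Sequence[float]) -> Dict[str, float]:
    """Map [left, top, right, bottom] to the field names seen in the output.

    Hypothetical helper for illustration; the SDK builds the proto itself.
    """
    left, top, right, bottom = bbox
    return {
        "top_row": top,
        "left_col": left,
        "bottom_row": bottom,
        "right_col": right,
    }


# Same values as bbox_points in the sample above.
fields = bbox_to_fields([.1, .1, .8, .9])
print(fields)
```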

Upload Text with Annotation

You can upload text together with annotations that give it context, such as metadata, categories, or detailed labels. Annotated text enriches the dataset and allows for more nuanced and accurate analysis.

Visit this page for more information.

from clarifai.client.input import Inputs

url = "https://samples.clarifai.com/featured-models/Llama2_Conversational-agent.txt"
concepts = ["mobile","camera"]
# Replace "user_id", "test_app", and "YOUR_PAT" with your own values.
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")
# Upload data from a URL with annotations (concept labels)
input_object.upload_from_url(input_id="text1", text_url=url, labels=concepts)
Output
2024-01-19 16:23:54 INFO     clarifai.client.input:                                                    input.py:669
Inputs Uploaded
code: SUCCESS
description: "Ok"
details: "All inputs successfully added"
req_id: "d5baa282c87ac0f91f0ef4083644ea82"

Batch Upload Image Data While Tracking Status

You can monitor the status of a dataset upload while it runs. Setting get_upload_status=True reports the progress of the upload, giving you visibility into the data transfer so you can track its status.

Visit this page for more information.

from clarifai.client.dataset import Dataset
from clarifai.datasets.upload.utils import load_module_dataloader


#replace your "user_id", "app_id", "dataset_id".
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset")
#create dataloader object
cifar_dataloader = load_module_dataloader('./image_classification/cifar10')
#set get_upload_status=True for showing upload status
dataset.upload_dataset(dataloader=cifar_dataloader,get_upload_status=True)
Output
Uploading Dataset: 100%|██████████| 1/1 [00:17<00:00, 17.99s/it]

Retry Upload From Log File

Use this feature to retry the upload of inputs that failed. When you use the upload_dataset function, failed inputs can be logged to a file, and that log file can later be used to resume the upload process.

info

Set retry_duplicates to True if you want to retry duplicate inputs with a new input ID in the current dataset.

#importing load_module_dataloader for calling the dataloader object in dataset.py in the local data folder
from clarifai.datasets.upload.utils import load_module_dataloader
from clarifai.client.dataset import Dataset


#replace your "user_id", "app_id", "dataset_id".
dataset = Dataset(user_id="user_id", app_id="app_id", dataset_id="dataset_id")

cifar_dataloader = load_module_dataloader('./image_classification/cifar10')

dataset.retry_upload_from_logs(dataloader=cifar_dataloader, log_file_path='path to log file', retry_duplicates=True, log_warnings=True)
Output
WARNING:root:Retrying upload for 9 duplicate inputs...

Uploading Dataset: 100%|██████████| 1/1 [00:24<00:00, 24.32s/it]