Upload Data to Dataset via API
Learn how to upload data to a dataset via the API
Uploading data to a dataset in Clarifai is essential for training and evaluating your machine learning models.
Whether you're working with images, videos, text, audio, or other data types, we provide flexible and efficient methods to upload data from various sources.
Before using the Python SDK, Node.js SDK, or any of our gRPC clients, ensure they are properly installed on your machine. Refer to their respective installation guides for instructions on how to install and initialize them.
Click here to learn more about the different methods of uploading data to a dataset.
When uploading inputs to the Clarifai platform, there are limits on the size and number of inputs per upload, as detailed here. However, by using methods from the Dataset class — such as Dataset.upload_from_folder(), Dataset.upload_from_url(), or Dataset.upload_dataset() — you can bypass these restrictions and efficiently upload larger volumes of inputs.
For example, when uploading images in bulk, these methods incrementally process and upload them in multiple batches, ensuring that each batch contains a maximum of 128 images and does not exceed 128MB in size, which keeps every upload within the platform's limits.
You can also customize the batch_size variable, which controls the concurrent upload of inputs and annotations. For example, if your images folder exceeds 128MB, you can set the variable so that each batch contains an appropriate number of images while staying within the 128MB per-batch limit.
The default batch_size is 32, but you can set it to any value between 1 (minimum) and 128 (maximum).
Here is an example:
dataset.upload_from_folder(folder_path='./images', input_type='image', labels=True, batch_size=50)
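The batching behavior described above can be sketched in simplified form. The snippet below is a generic illustration only, not the SDK's actual implementation; chunk_inputs is a hypothetical helper, and the 128-item cap mirrors the limit described above:

```python
# Simplified sketch of how a bulk upload could be split into batches.
# Illustrative only; the real logic lives inside the Dataset class.

MAX_BATCH_ITEMS = 128  # per-batch input limit described above

def chunk_inputs(inputs, batch_size=32):
    """Yield successive batches of at most batch_size items (capped at 128)."""
    batch_size = max(1, min(batch_size, MAX_BATCH_ITEMS))
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size]

# Example: 300 inputs with batch_size=50 are uploaded as six batches of 50
batches = list(chunk_inputs(list(range(300)), batch_size=50))
```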
Add Inputs to a Dataset
You can add inputs to a dataset by specifying their input IDs.
- cURL
curl --location --request POST "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/inputs" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
"dataset_inputs": [
{
"input": {
"id": "YOUR_INPUT_ID_HERE"
}
}
]
}'
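The same request can be issued from Python. The sketch below only constructs the endpoint URL, headers, and JSON body shown in the cURL example; the YOUR_*_HERE values are placeholders, and the actual requests.post call is left commented out so the snippet runs without credentials:

```python
USER_ID = "YOUR_USER_ID_HERE"
APP_ID = "YOUR_APP_ID_HERE"
DATASET_ID = "YOUR_DATASET_ID_HERE"
PAT = "YOUR_PAT_HERE"

# Same endpoint as the cURL example above
url = (
    f"https://api.clarifai.com/v2/users/{USER_ID}"
    f"/apps/{APP_ID}/datasets/{DATASET_ID}/inputs"
)
headers = {
    "Authorization": f"Key {PAT}",
    "Content-Type": "application/json",
}
payload = {
    "dataset_inputs": [
        {"input": {"id": "YOUR_INPUT_ID_HERE"}}
    ]
}

# To actually send the request (requires the requests package and a valid PAT):
# import requests
# response = requests.post(url, headers=headers, json=payload)
# print(response.json())
```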
Upload Image Data
You can upload image data in bulk either from a folder or by using a CSV file.
- Python SDK
- Node.js SDK
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# To upload without concepts, set labels=False
# Upload data from folder
dataset.upload_from_folder(folder_path='./images', input_type='image', labels=True)
import { Dataset } from "clarifai-nodejs";
import path from "path";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT,
userId: process.env.CLARIFAI_USER_ID,
appId: "test_app",
},
});
await dataset.uploadFromFolder({
folderPath: path.resolve(__dirname, "../../assets/voc/images"),
inputType: "image",
labels: true,
});
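When calling upload_from_folder with labels=True, a common convention is a layout where each subfolder name serves as the concept label for the images inside it. The snippet below builds such a layout as an illustration; the layout convention is an assumption here, so check the SDK documentation for the exact expectation, and the cat/dog names are hypothetical:

```python
import tempfile
from pathlib import Path

# Hypothetical folder layout for upload_from_folder with labels=True,
# assuming each subfolder name acts as the concept label for its images.
root = Path(tempfile.mkdtemp()) / "images"
for label in ("cat", "dog"):
    (root / label).mkdir(parents=True)
    (root / label / f"{label}_001.jpg").touch()  # placeholder image files

layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.jpg"))
# layout lists one image per label subfolder, e.g. 'cat/cat_001.jpg'
```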
Upload Text Data
You can upload text data in bulk either from a folder or by using a CSV file.
- Python SDK
- Node.js SDK
from clarifai.client.dataset import Dataset
# Create the dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# To upload without concepts, set labels=False
# Upload dataset from folder
dataset.upload_from_folder(folder_path='./data', input_type='text', labels=True)
import { Dataset } from "clarifai-nodejs";
import path from "path";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT,
userId: process.env.CLARIFAI_USER_ID,
appId: "test_app",
},
});
await dataset.uploadFromFolder({
folderPath: path.resolve(__dirname, "../../assets"),
inputType: "text",
labels: true,
});
Upload Audio Data
You can upload audio data in bulk either from a folder or by using a CSV file.
- Python SDK
- Node.js SDK
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# To upload without concepts, set labels=False
# Upload data from a CSV file
dataset.upload_from_csv(csv_path='path/to/data.csv', input_type='audio', csv_type='url', labels=True)
import { Dataset } from "clarifai-nodejs";
import path from "path";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT,
userId: process.env.CLARIFAI_USER_ID,
appId: "test_app",
},
});
await dataset.uploadFromCSV({
csvPath: path.resolve(__dirname, "../../assets/audio.csv"),
csvType: "file",
labels: true,
inputType: "audio",
});
Upload Video Data
You can upload video data in bulk either from a folder or by using a CSV file.
- Python SDK
- Node.js SDK
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# To upload without concepts, set labels=False
# Upload data from a CSV file
dataset.upload_from_csv(csv_path='path/to/data.csv', input_type='video', csv_type='url', labels=True)
import { Dataset } from "clarifai-nodejs";
import path from "path";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT,
userId: process.env.CLARIFAI_USER_ID,
appId: "test_app",
},
});
await dataset.uploadFromCSV({
csvPath: path.resolve(__dirname, "../../assets/video.csv"),
csvType: "file",
inputType: "video",
labels: true,
});
Upload Image Data With Annotations
You can upload image data along with bounding box annotations, allowing you to add depth and contextual information to your visual data.
- Python SDK
- Node.js SDK
from clarifai.client.input import Inputs
url = "https://samples.clarifai.com/BarackObama.jpg"
# Replace with your "user_id" and "app_id"
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")
# Upload image data from a specified URL with a unique input ID "bbox"
input_object.upload_from_url(input_id="bbox", image_url=url)
# Define bounding box coordinates for the annotation (left, top, right, bottom)
bbox_points = [.1, .1, .8, .9]
# Generate a bounding box annotation proto with specified label ("face") and bounding box coordinates
annotation = input_object.get_bbox_proto(input_id="bbox", label="face", bbox=bbox_points)
# Upload the generated annotation to associate with the previously uploaded image
input_object.upload_annotations([annotation])
import { Input } from "clarifai-nodejs";
const imageUrl = "https://samples.clarifai.com/BarackObama.jpg";
const input = new Input({
authConfig: {
userId: process.env.CLARIFAI_USER_ID,
pat: process.env.CLARIFAI_PAT,
appId: "test_app",
},
});
await input.uploadFromUrl({
inputId: "bbox",
imageUrl,
});
const bboxPoints = [0.1, 0.1, 0.8, 0.9];
const annotation = Input.getBboxProto({
inputId: "bbox",
label: "face",
bbox: bboxPoints,
});
await input.uploadAnnotations({
batchAnnot: [annotation],
});
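The bbox values passed to get_bbox_proto above are normalized to the 0–1 range in (left, top, right, bottom) order. If your annotations are in pixel coordinates, a small helper like the following can convert them; this is a generic sketch, not part of the SDK:

```python
def to_normalized_bbox(left_px, top_px, right_px, bottom_px, img_width, img_height):
    """Convert pixel coordinates to the normalized [left, top, right, bottom] form."""
    return [
        left_px / img_width,
        top_px / img_height,
        right_px / img_width,
        bottom_px / img_height,
    ]

# Example: a box at (64, 36)-(512, 324) in a 640x360 image
bbox_points = to_normalized_bbox(64, 36, 512, 324, img_width=640, img_height=360)
# bbox_points is approximately [0.1, 0.1, 0.8, 0.9], matching the example above
```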
Upload Image Data With Mask Annotations
You can add masks to image data by providing polygon coordinates along with the image, enabling precise region-based annotations.
- Python SDK
- Node.js SDK
from clarifai.client.input import Inputs
url = "https://samples.clarifai.com/BarackObama.jpg"
# Replace with your "user_id" and "app_id"
input_object = Inputs(user_id="USER_ID", app_id="APP_ID", pat="YOUR_PAT")
# Upload image data from a specified URL with a unique input ID "mask"
input_object.upload_from_url(input_id="mask", image_url=url)
# Define mask points
mask = [[0.87, 0.66], [0.45, 1.0], [0.82, 0.42]]  # polygon points
annotation = input_object.get_mask_proto(input_id="mask", label="obama", polygons=mask)
# Upload the generated annotation to associate with the previously uploaded image
input_object.upload_annotations([annotation])
import { Input, Polygon } from "clarifai-nodejs";
const imageUrl = "https://samples.clarifai.com/BarackObama.jpg";
const input = new Input({
authConfig: {
userId: process.env.CLARIFAI_USER_ID,
pat: process.env.CLARIFAI_PAT,
appId: process.env.CLARIFAI_APP_ID,
},
});
await input.uploadFromUrl({
inputId: "mask",
imageUrl,
});
const maskPoints: Polygon[] = [[[0.87, 0.66], [0.45, 1.0], [0.82, 0.42]]];
const annotation = Input.getMaskProto({
inputId: "mask",
label: "obama",
polygons: maskPoints,
});
await input.uploadAnnotations({
batchAnnot: [annotation],
});
Upload Video Data With Annotations
You can upload videos with enriched annotations by including bounding box coordinates that define regions of interest within individual frames, adding valuable context to your video content.
- Python SDK
- Node.js SDK
from clarifai.client.input import Inputs
url = "https://samples.clarifai.com/beer.mp4"
# Replace with your "user_id" and "app_id"
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")
# Upload a video from a URL with a specified input ID
input_object.upload_from_url(input_id="video_bbox", video_url=url)
# Define bounding box coordinates for annotation
bbox_points = [.1, .1, .8, .9]
# Create an annotation using the bounding box coordinates
annotation = input_object.get_bbox_proto(input_id="video_bbox", label="glass", bbox=bbox_points)
# Upload the annotation associated with the video
input_object.upload_annotations([annotation])
import { Input } from "clarifai-nodejs";
const videoUrl = "https://samples.clarifai.com/beer.mp4";
const input = new Input({
authConfig: {
userId: process.env.CLARIFAI_USER_ID,
pat: process.env.CLARIFAI_PAT,
appId: "test_app",
},
});
await input.uploadFromUrl({
inputId: "video-bbox",
videoUrl,
});
const bboxPoints = [0.1, 0.1, 0.8, 0.9];
const annotation = Input.getBboxProto({
inputId: "bbox",
label: "glass",
bbox: bboxPoints,
});
await input.uploadAnnotations({
batchAnnot: [annotation],
});
Upload Text Data With Annotations
You can enrich your uploaded text data by attaching metadata, categorizing the content, or adding detailed annotations to enhance structure and context.
- Python SDK
- Node.js SDK
from clarifai.client.input import Inputs
url = "https://samples.clarifai.com/featured-models/Llama2_Conversational-agent.txt"
concepts = ["mobile","camera"]
# Replace with your "user_id" and "app_id"
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")
# Upload data from a URL with annotations
input_object.upload_from_url(input_id="text1", text_url=url, labels=concepts)
import { Input } from "clarifai-nodejs";
const textUrl =
"https://samples.clarifai.com/featured-models/Llama2_Conversational-agent.txt";
const concepts = ["mobile", "camera"];
const input = new Input({
authConfig: {
userId: process.env.CLARIFAI_USER_ID,
pat: process.env.CLARIFAI_PAT,
appId: "test_app",
},
});
await input.uploadFromUrl({
inputId: "text1",
textUrl,
labels: concepts,
});
Batch Upload Image Data While Tracking Status
You can actively monitor the status of your dataset upload, giving you clear visibility into the progress and making it easy to track and analyze the data transfer process.
- Python SDK
from clarifai.client.dataset import Dataset
from clarifai.datasets.upload.utils import load_module_dataloader
# Replace with your "user_id", "app_id", and "dataset_id"
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Create a dataloader object
cifar_dataloader = load_module_dataloader('./image_classification/cifar10')
# Set get_upload_status=True to show the upload status
dataset.upload_dataset(dataloader=cifar_dataloader, get_upload_status=True)
Retry Upload From Log File
You can retry uploads for failed inputs directly from the logs. When using the upload_dataset function, any failed inputs are automatically logged to a file, which can later be used to resume and retry the upload process seamlessly.
Set retry_duplicates to True if you want to re-upload duplicate inputs with new input IDs in the current dataset.
- Python SDK
# Import load_module_dataloader to build the dataloader object from dataset.py in the local data folder
from clarifai.datasets.upload.utils import load_module_dataloader
from clarifai.client.dataset import Dataset
# Replace with your "user_id", "app_id", and "dataset_id"
dataset = Dataset(user_id="user_id", app_id="app_id", dataset_id="dataset_id")
cifar_dataloader = load_module_dataloader('./image_classification/cifar10')
dataset.retry_upload_from_logs(dataloader=cifar_dataloader, log_file_path='path to log file', retry_duplicates=True, log_warnings=True)