Managing Datasets
Learn how to interact with datasets using Clarifai SDKs
The Clarifai SDKs provide a robust suite of tools for managing datasets efficiently. With them, you can organize, modify, and analyze your data: create new datasets from scratch, update existing ones with fresh information, or fine-tune your data for optimal model performance, all through a consistent and intuitive interface.
The SDKs go beyond basic dataset manipulation, covering every step of your data journey. You can upload new datasets, delete redundant ones, and modify existing datasets to fit your specific needs, giving you full control over your data pipeline. This keeps your workflow fluid and adaptable, so you can focus on deriving meaningful insights and getting the most out of your data.
Creating Datasets
Leverage the robust capabilities of the Clarifai SDKs to seamlessly create datasets within your application. Through the API, you can initiate the creation of a dataset by specifying a unique dataset ID. This lets you tailor your datasets to the specific needs of your application, ensuring customized and efficient data management.
Visit this link for more information.
- Python
- Typescript
from clarifai.client.app import App
app = App(app_id="test_app", user_id="user_id", pat="YOUR_PAT")
# Provide the dataset ID as a parameter to the create_dataset function
dataset = app.create_dataset(dataset_id="first_dataset")
Output
2024-01-19 14:22:26 INFO clarifai.client.app: app.py:310
Dataset created
code: SUCCESS
description: "Ok"
req_id: "1dd6eeb1a82394a9a92becee55faf50e"
import { App } from "clarifai-nodejs";
const app = new App({
authConfig: {
pat: process.env.CLARIFAI_PAT!,
userId: process.env.CLARIFAI_USER_ID!,
appId: "test_app",
},
});
const dataset = await app.createDataset({
datasetId: "first_dataset",
});
console.log(dataset);
Create a Dataset Version
Leveraging the power of the Clarifai SDKs, you can effortlessly generate a new dataset version tailored to your specific needs. This process involves utilizing the API to initiate the creation of a version for a designated dataset, identified by its unique dataset ID. By seamlessly integrating this functionality into your workflow, you gain the ability to manage and track different iterations of your datasets effectively.
Visit this page for more information.
- Python
- Typescript
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(dataset_id='first_dataset', user_id='user_id', app_id='test_app', pat='YOUR_PAT')
# Create a new version of the dataset
dataset_version = dataset.create_version(description='dataset_version_description')
Output
2024-01-19 14:26:31 INFO clarifai: dataset.py:96
Dataset Version created
code: SUCCESS
description: "Ok"
req_id: "14802ff0826d6487dc454aa39877667e"
import { Dataset } from "clarifai-nodejs";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT!,
userId: process.env.CLARIFAI_USER_ID!,
appId: "test_app",
},
});
const datasetVersion = await dataset.createVersion({
id: "1",
description: "new dataset version description",
});
console.log(datasetVersion);
Patch a Dataset
You can apply patch operations to a dataset — merging, removing, or overwriting data. While all these actions support overwriting by default, they have specific behaviors when handling lists of objects.
- The merge action replaces a key:value pair with key:new_value, or appends to an existing list. For dictionaries, it merges entries that share the same id field.
- The remove action is only used to delete the dataset's cover image on the platform UI.
- The overwrite action completely replaces an existing object with a new one.
Below is an example of patching a dataset to update its description, notes, and image URL.
- Python
from clarifai.client.app import App
app = App(app_id="YOUR_APP_ID_HERE", user_id="YOUR_USER_ID_HERE", pat="YOUR_PAT_HERE")
# Update the dataset by merging the new description and notes
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='merge', description='Demo testing', notes="Hi Guys! This note is for Demo")
# Update the dataset's image URL with a new one
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='merge', image_url='https://samples.clarifai.com/metro-north.jpg')
# Remove the dataset's image by specifying the 'remove' action
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='remove', image_url='https://samples.clarifai.com/metro-north.jpg')
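For completeness, here is a minimal sketch of the overwrite action using the same patch_dataset call and app object as above; the replacement description is purely illustrative.
# Completely replace the dataset's existing description with a new value
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='overwrite', description='Overwritten description')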
Upload Image
Simplify your image data upload process with the Clarifai API's DataLoader functionality. This versatile feature allows you to effortlessly upload image data in bulk, streamlining your workflow for enhanced efficiency. Whether you prefer uploading images directly from a folder or leveraging the convenience of a CSV format, our DataLoader seamlessly accommodates both methods.
Visit this page for more information.
- Python
- Typescript
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# To upload without concepts, set labels=False
# Upload data from a folder
dataset.upload_from_folder(folder_path='./images', input_type='image', labels=True)
Output
Uploading inputs: 100%|██████████| 1/1 [00:04<00:00, 4.44s/it]
import { Dataset } from "clarifai-nodejs";
import path from "path";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT,
userId: process.env.CLARIFAI_USER_ID,
appId: "test_app",
},
});
await dataset.uploadFromFolder({
folderPath: path.resolve(__dirname, "../../assets/voc/images"),
inputType: "image",
labels: true,
});
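The image dataloader also accepts data listed in a CSV file. Below is a minimal Python sketch, assuming a CSV whose rows contain publicly accessible image URLs; the file path is illustrative.
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Upload image data from a CSV of URLs; set labels=True only if the CSV also contains concepts
dataset.upload_from_csv(csv_path='./images.csv', input_type='image', csv_type='url', labels=True)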
Upload Text
Leverage the power of the Clarifai API to seamlessly upload text data with our versatile dataloader. Whether you prefer the convenience of organizing your text data in folders or opt for the structured approach offered by the CSV format, our API accommodates both methods. By utilizing the dataloader, you can effortlessly streamline the process of uploading text data, ensuring a smooth integration into your workflow.
Visit this page for more information.
- Python
- Typescript
from clarifai.client.dataset import Dataset
# Create the dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# To upload without concepts, set labels=False
# Upload the dataset from a folder
dataset.upload_from_folder(folder_path='./data', input_type='text', labels=True)
Output
Uploading inputs: 100%|██████████| 1/1 [00:02<00:00, 2.68s/it]
import { Dataset } from "clarifai-nodejs";
import path from "path";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT,
userId: process.env.CLARIFAI_USER_ID,
appId: "test_app",
},
});
await dataset.uploadFromFolder({
folderPath: path.resolve(__dirname, "../../assets"),
inputType: "text",
labels: true,
});
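Text data can likewise be uploaded from a CSV file. A minimal Python sketch follows, assuming the CSV stores the raw text itself (the csv_type='raw' value and file path are assumptions here).
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Upload raw text rows from a CSV file; set labels=True only if the CSV also contains concepts
dataset.upload_from_csv(csv_path='./data.csv', input_type='text', csv_type='raw', labels=True)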
Upload Audio
Seamlessly upload your audio datasets using the versatile dataloader feature, providing you with two convenient options: uploading audio files directly from a folder or utilizing the efficiency of a CSV format. This flexibility in data upload empowers you to effortlessly incorporate diverse audio datasets into your applications, ensuring a smooth and streamlined workflow.
Visit this page for more information.
- Python
- Typescript
from clarifai.client.dataset import Dataset
#Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# To upload without concepts, set labels=False
# Upload data from a CSV file
dataset.upload_from_csv(csv_path='/Users/adithyansukumar/Desktop/data/test.csv', input_type='audio',csv_type='url', labels=True)
Output
Uploading inputs: 100%|██████████| 1/1 [00:03<00:00, 3.22s/it]
import { Dataset } from "clarifai-nodejs";
import path from "path";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT,
userId: process.env.CLARIFAI_USER_ID,
appId: "test_app",
},
});
await dataset.uploadFromCSV({
csvPath: path.resolve(__dirname, "../../assets/audio.csv"),
csvType: "file",
labels: true,
inputType: "audio",
});
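Audio files can also be uploaded directly from a folder, mirroring the image and text examples above. A minimal Python sketch, with an illustrative folder path:
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Upload audio files from a local folder; set labels=False to upload without concepts
dataset.upload_from_folder(folder_path='./audio', input_type='audio', labels=True)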
Upload Video
Elevate your multimedia analysis capabilities with the Clarifai SDKs, enabling you to effortlessly upload video data using the versatile dataloader. Seamlessly integrate video data into your projects by leveraging the dataloader, which supports uploading videos either directly from a folder or in the convenient CSV format.
Visit this page for more information.
- Python
- Typescript
from clarifai.client.dataset import Dataset
#Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# To upload without concepts, set labels=False
# Upload data from a CSV file
dataset.upload_from_csv(csv_path='/Users/adithyansukumar/Desktop/data/test.csv', input_type='video', csv_type='url', labels=True)
Output
Uploading inputs: 100%|██████████| 1/1 [00:03<00:00, 3.22s/it]
import { Dataset } from "clarifai-nodejs";
import path from "path";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT,
userId: process.env.CLARIFAI_USER_ID,
appId: "test_app",
},
});
await dataset.uploadFromCSV({
csvPath: path.resolve(__dirname, "../../assets/video.csv"),
csvType: "file",
inputType: "video",
labels: true,
});
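Video data can likewise be uploaded straight from a folder, as mentioned above. A minimal Python sketch, with an illustrative folder path:
from clarifai.client.dataset import Dataset
# Create a dataset object
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset", pat="YOUR_PAT")
# Upload video files from a local folder; set labels=False to upload without concepts
dataset.upload_from_folder(folder_path='./videos', input_type='video', labels=True)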
Upload Image with Annotation
Leverage the full potential of the Clarifai API by seamlessly uploading images with annotations. This advanced functionality allows you to enrich your image data by providing bounding box coordinates along with the image itself. By incorporating annotations, you enhance the depth and context of your visual data.
Visit this page for more information.
- Python
- Typescript
from clarifai.client.input import Inputs
url = "https://samples.clarifai.com/BarackObama.jpg"
# Replace "user_id" and "app_id" with your values
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")
# Upload image data from a specified URL with a unique input ID "bbox"
input_object.upload_from_url(input_id="bbox", image_url=url)
# Define bounding box coordinates for the annotation (left, top, right, bottom)
bbox_points = [.1, .1, .8, .9]
# Generate a bounding box annotation proto with specified label ("face") and bounding box coordinates
annotation = input_object.get_bbox_proto(input_id="bbox", label="face", bbox=bbox_points)
# Upload the generated annotation to associate with the previously uploaded image
input_object.upload_annotations([annotation])
Output
2024-01-19 16:16:28 INFO clarifai.client.input: input.py:696
Annotations Uploaded
code: SUCCESS
description: "Ok"
req_id: "b5ca21ebc19cbbfe0c21706b4c1cd909"
import { Input } from "clarifai-nodejs";
const imageUrl = "https://samples.clarifai.com/BarackObama.jpg";
const input = new Input({
authConfig: {
userId: process.env.CLARIFAI_USER_ID,
pat: process.env.CLARIFAI_PAT,
appId: "test_app",
},
});
await input.uploadFromUrl({
inputId: "bbox",
imageUrl,
});
const bboxPoints = [0.1, 0.1, 0.8, 0.9];
const annotation = Input.getBboxProto({
inputId: "bbox",
label: "face",
bbox: bboxPoints,
});
await input.uploadAnnotations({
batchAnnot: [annotation],
});
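Multiple regions on the same input can be annotated by building one bounding-box proto per region and uploading them as a single batch. A minimal Python sketch, where the labels and coordinates are illustrative:
from clarifai.client.input import Inputs
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")
# Build one annotation proto per region of interest on the previously uploaded "bbox" input
annotations = [
    input_object.get_bbox_proto(input_id="bbox", label="face", bbox=[0.1, 0.1, 0.5, 0.5]),
    input_object.get_bbox_proto(input_id="bbox", label="person", bbox=[0.2, 0.2, 0.9, 0.9]),
]
# Upload both annotations in one request
input_object.upload_annotations(annotations)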
Upload Image with Mask Annotation
This advanced functionality allows you to add a mask to image data by providing polygon points as coordinates along with the image itself.
- Python
- Typescript
from clarifai.client.input import Inputs
url = "https://samples.clarifai.com/BarackObama.jpg"
# Replace "USER_ID" and "APP_ID" with your values
input_object = Inputs(user_id="USER_ID", app_id="APP_ID", pat="YOUR_PAT")
# Upload image data from a specified URL with a unique input ID "mask"
input_object.upload_from_url(input_id="mask", image_url=url)
# Define mask points
mask = [[0.87, 0.66], [0.45, 1.0], [0.82, 0.42]]  # Polygon points
annotation = input_object.get_mask_proto(input_id="mask", label="obama", polygons=mask)
# Upload the generated annotation to associate with the previously uploaded image
input_object.upload_annotations([annotation])
Output
2024-07-10 08:23:07 INFO clarifai.client.input: input.py:760
Annotations Uploaded
code: SUCCESS
description: "Ok"
req_id: "8816febaa1ce4ecab9fb3e3a1614a100"
import { Input, Polygon } from "clarifai-nodejs";
const imageUrl = "https://samples.clarifai.com/BarackObama.jpg";
const input = new Input({
authConfig: {
userId: process.env.CLARIFAI_USER_ID,
pat: process.env.CLARIFAI_PAT,
appId: process.env.CLARIFAI_APP_ID,
},
});
await input.uploadFromUrl({
inputId: "mask",
imageUrl,
});
const maskPoints: Polygon[] = [[[0.87, 0.66], [0.45, 1.0], [0.82, 0.42]]];
const annotation = Input.getMaskProto({
inputId: "mask",
label: "obama",
polygons: maskPoints,
});
await input.uploadAnnotations({
batchAnnot: [annotation],
});
Upload Video with Annotation
Using our API, you have the capability to seamlessly upload videos enriched with annotations. This process involves more than just submitting the video file; you can enhance the contextual understanding by providing bounding box coordinates that precisely define the regions of interest within the video frames. By including this annotation data, you add valuable context to your video content.
Visit this page for more information.
- Python
- Typescript
from clarifai.client.input import Inputs
url = "https://samples.clarifai.com/beer.mp4"
# Replace "user_id" and "app_id" with your values
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")
# Upload a video from a URL with a specified input ID
input_object.upload_from_url(input_id="video_bbox", video_url=url)
# Define bounding box coordinates for annotation
bbox_points = [.1, .1, .8, .9]
# Create an annotation using the bounding box coordinates
annotation = input_object.get_bbox_proto(input_id="video_bbox", label="glass", bbox=bbox_points)
# Upload the annotation associated with the video
input_object.upload_annotations([annotation])
Output
[input_id: "video_bbox"
data {
  regions {
    region_info {
      bounding_box {
        top_row: 0.1
        left_col: 0.1
        bottom_row: 0.9
        right_col: 0.8
      }
    }
    data {
      concepts {
        id: "id-glass"
        name: "glass"
        value: 1
      }
    }
  }
}]
import { Input } from "clarifai-nodejs";
const videoUrl = "https://samples.clarifai.com/beer.mp4";
const input = new Input({
authConfig: {
userId: process.env.CLARIFAI_USER_ID,
pat: process.env.CLARIFAI_PAT,
appId: "test_app",
},
});
await input.uploadFromUrl({
inputId: "video-bbox",
videoUrl,
});
const bboxPoints = [0.1, 0.1, 0.8, 0.9];
const annotation = Input.getBboxProto({
inputId: "video-bbox",
label: "glass",
bbox: bboxPoints,
});
await input.uploadAnnotations({
batchAnnot: [annotation],
});
Upload Text with Annotation
This functionality enables you to provide context and additional information alongside your text, enhancing the understanding and relevance of the uploaded content. Whether you're attaching metadata, categorizing content, or incorporating detailed annotations, the API effortlessly accommodates your specific needs. This feature not only streamlines the process of inputting annotated text but also enriches the dataset, allowing for more nuanced and accurate analysis.
Visit this page for more information.
- Python
- Typescript
from clarifai.client.input import Inputs
url = "https://samples.clarifai.com/featured-models/Llama2_Conversational-agent.txt"
concepts = ["mobile","camera"]
# Replace "user_id" and "app_id" with your values
input_object = Inputs(user_id="user_id", app_id="test_app", pat="YOUR_PAT")
# Upload data from a URL along with its concept annotations
input_object.upload_from_url(input_id="text1", text_url=url, labels=concepts)
Output
2024-01-19 16:23:54 INFO clarifai.client.input: input.py:669
Inputs Uploaded
code: SUCCESS
description: "Ok"
details: "All inputs successfully added"
req_id: "d5baa282c87ac0f91f0ef4083644ea82"
import { Input } from "clarifai-nodejs";
const textUrl =
"https://samples.clarifai.com/featured-models/Llama2_Conversational-agent.txt";
const concepts = ["mobile", "camera"];
const input = new Input({
authConfig: {
userId: process.env.CLARIFAI_USER_ID,
pat: process.env.CLARIFAI_PAT,
appId: "test_app",
},
});
await input.uploadFromUrl({
inputId: "text1",
textUrl,
labels: concepts,
});
Batch Upload Image Data While Tracking Status
With our robust capabilities, you can actively monitor the status of your dataset upload, ensuring transparency and control throughout the entire operation. This feature provides valuable visibility into the progress of your data transfer, allowing you to track and analyze the status effortlessly.
Visit this page for more information.
- Python
from clarifai.client.dataset import Dataset
from clarifai.datasets.upload.utils import load_module_dataloader
#replace your "user_id", "app_id", "dataset_id".
dataset = Dataset(user_id="user_id", app_id="test_app", dataset_id="first_dataset")
#create dataloader object
cifar_dataloader = load_module_dataloader('./image_classification/cifar10')
#set get_upload_status=True for showing upload status
dataset.upload_dataset(dataloader=cifar_dataloader,get_upload_status=True)
Output
Uploading Dataset: 100%|██████████| 1/1 [00:17<00:00, 17.99s/it]
Retry Upload From Log File
This feature is used to retry upload from logs for failed inputs. When using upload_dataset
function the failed inputs can be logged into file and later can be used to resume the upload process.
Set retry_duplicates
to True
if you want to retry duplicate with new Input_id in current dataset.
- Python
# Import load_module_dataloader to load the dataloader object defined in dataset.py in the local data folder
from clarifai.datasets.upload.utils import load_module_dataloader
from clarifai.client.dataset import Dataset
#replace your "user_id", "app_id", "dataset_id".
dataset = Dataset(user_id="user_id", app_id="app_id", dataset_id="dataset_id")
cifar_dataloader = load_module_dataloader('./image_classification/cifar10')
dataset.retry_upload_from_logs(dataloader=cifar_dataloader, log_file_path='path to log file', retry_duplicates=True, log_warnings=True)
Output
WARNING:root:Retrying upload for 9 duplicate inputs...
Uploading Dataset: 100%|██████████| 1/1 [00:24<00:00, 24.32s/it]
List Dataset Inputs
You can list the inputs in a dataset by providing the dataset ID.
- Python
from clarifai.client.input import Inputs
# Replace your "user_id", "app_id", "pat", and "dataset_id"
input_obj = Inputs(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", pat="YOUR_PAT_HERE")
inputs_generator = input_obj.list_inputs(dataset_id="YOUR_DATASET_ID_HERE")
inputs = list(inputs_generator)
print(inputs)
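Because list_inputs returns a generator, inputs can also be paged through lazily instead of materializing the whole dataset at once; the page size below is an illustrative value.
from clarifai.client.input import Inputs
input_obj = Inputs(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", pat="YOUR_PAT_HERE")
# Fetch the first page of up to 10 inputs from the dataset and print their IDs
for inp in input_obj.list_inputs(dataset_id="YOUR_DATASET_ID_HERE", page_no=1, per_page=10):
    print(inp.id)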
Merge Dataset
Here’s an example of merging a dataset with the ID merge_dataset_id into another dataset with the ID dataset_id using the merge_dataset feature from the Dataset class.
Note that all inputs from the source dataset (merge_dataset_id) will be added to the target dataset (dataset_id).
- Python
from clarifai.client.dataset import Dataset
# Replace your "user_id", "app_id", "pat", and "dataset_id"
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="dataset_id", pat="YOUR_PAT_HERE")
dataset.merge_dataset(merge_dataset_id="merge_dataset_id")
Export Dataset
With our API, you can efficiently retrieve your datasets in a compressed zip file format, streamlining the process of data retrieval and enhancing your dataset management capabilities. Whether you're archiving your data for backup, sharing datasets across applications, or conducting in-depth analyses externally, the export functionality provides a convenient and efficient solution.
Visit this page for more information.
The clarifai-data-protobuf.zip file can be downloaded from the dataset section in the portal.
- Python
from clarifai.client.dataset import Dataset
import os
os.environ["CLARIFAI_PAT"]="YOUR_PAT"
# The "clarifai-data-protobuf.zip" file can be downloaded from the dataset section in the portal.
Dataset().export(save_path='path to output.zip file', local_archive_path='path to clarifai-data-protobuf.zip file')
Output
2024-01-22 15:14:03 INFO clarifai.datasets.export.inputs_annotations: path: inputs_annotations.py:48
Downloads/clarifai-data-protobuf.zip
INFO clarifai.datasets.export.inputs_annotations: Obtained file inputs_annotations.py:56
name list. 1 entries.
Downloading Dataset: 100%|██████████| 13/13 [00:00<00:00, 6486.55it/s]
2024-01-22 15:14:04 INFO clarifai.datasets.export.inputs_annotations: Downloaded 0 inputs_annotations.py:221
inputs+annotations to output.zip
INFO clarifai.datasets.export.inputs_annotations: closing file inputs_annotations.py:92
objects.
SDH Enabled Inputs Download
This functionality lets you seamlessly retrieve and download inputs that are served through SDH (Secure Data Hosting). By harnessing the power of SDH, this feature ensures a secure and efficient download experience for inputs, providing a level of performance and flexibility that aligns with modern computing demands.
Visit this page for more information.
- Python
from clarifai.client.input import Inputs
input_obj = Inputs(user_id='user_id', app_id='test_app')
# List inputs
input_generator = input_obj.list_inputs(page_no=1, per_page=1, input_type='image')
inputs_list = list(input_generator)
# Download inputs
input_bytes = input_obj.download_inputs(inputs_list)
with open('demo.jpg', 'wb') as f:
    f.write(input_bytes[0])
Delete Dataset Version
Within the Clarifai SDKs, you have the capability to precisely manage your datasets by removing specific versions with ease. This feature empowers you to selectively delete a particular version of your dataset through the API. Whether you are refining your dataset collection, optimizing storage resources, or ensuring data accuracy, this functionality provides a targeted and efficient solution.
Visit this page for more information.
Be certain that you want to delete a particular dataset version as the operation cannot be undone.
- Python
- Typescript
from clarifai.client.dataset import Dataset
#Create dataset object
dataset = Dataset(dataset_id='first_dataset', user_id='user_id', app_id='test_app')
#Delete dataset version
dataset.delete_version(version_id='dataset_version')
Output
2024-01-22 15:22:25 INFO clarifai: dataset.py:124
Dataset Version Deleted
code: SUCCESS
description: "Ok"
details: "Dataset version \'a4b032e9083f4cbfbdfe5617b1a4d5e7\' deleted"
req_id: "1fac439af87c37dae27684fcbe49b80b"
import { Dataset } from "clarifai-nodejs";
const dataset = new Dataset({
datasetId: "first_dataset",
authConfig: {
pat: process.env.CLARIFAI_PAT!,
userId: process.env.CLARIFAI_USER_ID!,
appId: "test_app",
},
});
await dataset.deleteVersion("1");
Delete Dataset
Within the Clarifai SDKs, removing a dataset is a straightforward process enabled by the API. By supplying the unique identifier, known as the dataset ID, you gain the capability to seamlessly eliminate a dataset from your Clarifai account. It's essential to note that this functionality extends beyond a singular dataset removal; it also initiates the deletion of all associated dataset versions.
Visit this page for more information.
Be certain that you want to delete a particular dataset as the operation cannot be undone.
- Python
- Typescript
from clarifai.client.app import App
app = App(app_id="test_app", user_id="user_id")
# Provide the dataset ID as a parameter to the delete_dataset function
app.delete_dataset(dataset_id="demo_dataset")
Output
2024-01-22 15:24:29 INFO clarifai.client.app: app.py:617
Dataset Deleted
code: SUCCESS
description: "Ok"
details: "Dataset \'demo_dataset\' deleted"
req_id: "97a8c49418da156d1b0227f9fa5f8dda"
import { App } from "clarifai-nodejs";
const app = new App({
authConfig: {
pat: process.env.CLARIFAI_PAT!,
userId: process.env.CLARIFAI_USER_ID!,
appId: "test_app",
},
});
await app.deleteDataset({ datasetId: "first_dataset" });