Datasets Management
Use vector search to sort, rank, and retrieve images
Via the UI
Get Datasets
List of Datasets
To get a list of datasets, go to the individual page of your application. Then, select the Datasets option in the collapsible left sidebar.
You'll be redirected to the Datasets manager page, which lists the datasets already created in your application.
Export a Dataset
You can export the inputs in your dataset, alongside their annotations, to an external storage system.
To do so, start by clicking the icon at the extreme end of a dataset field to select the format you want to use for exporting your dataset. From the list that drops down, you can select any of the following data formats:
- Clarifai-Data-Protobuf, which is the default: This is a protocol buffer (protobuf) format used by Clarifai to structure data for machine learning tasks. Protocol buffers are a language-agnostic, platform-neutral mechanism for serializing structured data.
- Clarifai-Data-JSON: This is a JSON (JavaScript Object Notation) format used by Clarifai to structure data. JSON is a text-based, lightweight data-interchange format that's easy to read and write.
- COCO: This is the COCO (Common Objects in Context) format used by Clarifai to structure data. COCO is a large-scale, popular dataset used in machine learning and computer vision tasks.
After selecting your preferred export format, click the Generate button. Once the export file has been processed, the Generate button will become a Download button, which you can click to download your dataset.
The export feature works only after you have added inputs to a dataset and created and selected a dataset version. For instructions, see the guide on creating a dataset version.
Dataset ID or Version ID
To copy a dataset ID to the clipboard, go to its individual page and click the copy button next to the dataset's ID.
To copy a dataset version ID to the clipboard, click the copy button next to the Selected Version search field.
Update Datasets
Update a Dataset Version
After making some changes to your dataset — such as adding or removing inputs, or adding or removing annotations — you may want to update your dataset version to reflect the changes.
To update a dataset version, go to the individual page of the dataset and select the Refresh Metrics option that drops down after clicking the ellipsis in the upper-right corner of the page.
Finally, click the Update status button.
The updated inputs and annotations in your dataset will be displayed in the Overview tab.
You can also choose the dataset version you'd like to use from the Selected Version drop-down list.
Update Cover Image
To update a dataset's cover image, click the Change cover image button. A window will appear that allows you to upload an image for the dataset.
Merge Datasets
You can merge datasets by transferring inputs and their annotations from a source dataset to a destination dataset. Note that this process does not remove the inputs from the source dataset; they remain intact while being duplicated to the destination.
Start by selecting the Inputs option in your app's collapsible left sidebar. You'll be redirected to the Inputs-Manager page, where the inputs in your app are displayed.
Next, navigate to the Datasets section and select the dataset from which you want to transfer inputs. Once selected, all the available inputs in the dataset will be displayed on the page.
To choose the inputs for transfer, hover over each of them and click the small empty box in the upper-left corner to select them.
- Mouse click: Selects a single item or input.
- Shift + mouse click: Selects a range of inputs between the first and last clicked item.
Next, click the Dataset... button that appears at the bottom section of the page.
The small window that pops up allows you to add or remove inputs from the selected datasets.
Select the Add option, which lets you add inputs to the destination dataset (the option is selected by default). Then, select the destination dataset from the Select Datasets search field.
If you select the Apply to all search results button, all the inputs that are visually similar to the one(s) you initially selected will also be added. This lets you merge datasets quickly and easily.
If you want to create a new destination dataset:
- Click the plus sign (+) next to the Select Datasets search field.
- Type the new dataset name in the search field. The new name you've typed will appear underneath the search field.
- Click the Add new dataset button to create the dataset. The new dataset will be successfully added to your app and selected as a destination.
Finally, click the Add Inputs button at the bottom of the pop-up window to complete adding the selected inputs to the destination dataset.
Alternatively, you can remove inputs from a dataset by selecting the Remove option, selecting the desired dataset, and clicking the Remove Inputs button.
After merging the datasets, remember to update the dataset version of your destination dataset to ensure the latest version reflects the newly added inputs and annotations.
Delete Datasets
To delete a dataset, go to the individual page of the dataset and select the Delete Dataset option that drops down after clicking the ellipsis in the upper-right corner of the page.
Please proceed with extreme caution, as deleted datasets cannot be recovered.
Via the API
List Datasets
You can list the datasets in your app.
- cURL
curl --location --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets?page=1&per_page=100" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
List Dataset Versions
You can list all the versions associated with your dataset to view its update history and changes over time.
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions?page=1&per_page=100" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
Get a Dataset
You can retrieve the details of a specific dataset by providing its ID.
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
Get a Dataset Version
You can retrieve the details of a specific dataset version by providing its version ID.
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions/YOUR_DATASET_VERSION_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
Get a Dataset Input
You can retrieve an input in a dataset by specifying its ID.
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/inputs/YOUR_INPUT_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
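The list and get endpoints above share one path hierarchy: a dataset sits under a user and an app, and versions and inputs sit under a dataset. The following is a minimal sketch of that hierarchy using only the Python standard library. The helper name `dataset_url` and its parameters are hypothetical and not part of any Clarifai SDK; the sketch only composes URLs and does not send requests.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.clarifai.com/v2"


def dataset_url(user_id, app_id, dataset_id=None, child=None, child_id=None, **query):
    """Compose a dataset endpoint URL (hypothetical helper, not an SDK function).

    child is "versions" or "inputs"; query holds optional parameters
    such as page and per_page.
    """
    if child not in (None, "versions", "inputs"):
        raise ValueError(f"unknown child resource: {child!r}")
    if child is not None and dataset_id is None:
        raise ValueError("a child resource requires a dataset_id")
    if child_id is not None and child is None:
        raise ValueError("a child_id requires a child resource")

    parts = ["users", user_id, "apps", app_id, "datasets"]
    for part in (dataset_id, child, child_id):
        if part is not None:
            parts.append(part)

    # Percent-encode each path segment so unusual IDs cannot break the path.
    url = API_BASE + "/" + "/".join(quote(part, safe="") for part in parts)
    if query:
        url += "?" + urlencode(query)
    return url


# List datasets, then address one specific dataset version.
print(dataset_url("me", "my-app", page=1, per_page=100))
print(dataset_url("me", "my-app", "ds1", "versions", "v1"))
```

Each URL still needs the `Authorization: Key ...` header shown in the cURL examples when it is actually requested.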
List Dataset Inputs
You can list the inputs in a dataset by providing the dataset ID.
- Python SDK
- cURL
from clarifai.client.input import Inputs
# Replace your "user_id", "app_id", "pat", and "dataset_id"
input_obj = Inputs(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", pat="YOUR_PAT_HERE")
inputs_generator = input_obj.list_inputs(dataset_id="YOUR_DATASET_ID_HERE")
inputs = list(inputs_generator)
print(inputs)
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/inputs?page=1&per_page=100" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
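The cURL request above fetches a single page (`page=1&per_page=100`). To walk through a larger dataset, you would typically request successive pages until one comes back short. The sketch below shows that loop in isolation. The `iter_pages` helper, the injected `fetch_page` callable, and the `dataset_inputs` response key are assumptions made for illustration; check the actual response body for the field name before relying on it.

```python
def iter_pages(fetch_page, items_key, per_page=100):
    """Yield items from successive pages until a page is short or empty.

    fetch_page(page, per_page) must return the decoded JSON body as a dict.
    """
    page = 1  # page numbering starts at 1, as in the cURL example
    while True:
        body = fetch_page(page, per_page)
        items = body.get(items_key, [])
        yield from items
        if len(items) < per_page:
            break
        page += 1


# Stand-in for a real HTTP call, so the sketch runs offline.
_FAKE_INPUTS = [{"id": f"input-{n}"} for n in range(5)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    # "dataset_inputs" is an assumed key, used here only for the demo.
    return {"dataset_inputs": _FAKE_INPUTS[start:start + per_page]}


ids = [item["id"] for item in iter_pages(fake_fetch, "dataset_inputs", per_page=2)]
print(ids)
```

Keeping the HTTP call behind `fetch_page` means the same loop can serve the datasets, versions, and inputs listings by swapping the callable and the key.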
Export Dataset
You can download your datasets in a compressed ZIP file format for easy storage, sharing, or offline access.
The clarifai-data-protobuf.zip file can be downloaded from the export dataset section on the platform UI.
- Python SDK
from clarifai.client.dataset import Dataset
import os
os.environ["CLARIFAI_PAT"]="YOUR_PAT"
# The “clarifai-data-protobuf.zip” file can be downloaded from the dataset section in the portal.
Dataset().export(save_path='path to output.zip file', local_archive_path='path to clarifai-data-protobuf.zip file')
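Before or after running an export, it can help to check what a ZIP archive actually contains. The following sketch lists archive members with Python's standard `zipfile` module. The `list_archive_members` helper is hypothetical, and the member names in the demo are made up; nothing here assumes a particular layout inside a Clarifai export.

```python
import io
import zipfile


def list_archive_members(archive):
    """Return (name, size_in_bytes) for each file in a ZIP archive.

    archive may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()
        ]


# Offline demo with an in-memory archive; the member names are invented.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("all/part-0", b"12345")
    zf.writestr("all/part-1", b"123")
buffer.seek(0)

members = list_archive_members(buffer)
print(members)
```

With a real export, pass the path to the downloaded ZIP file instead of the in-memory buffer.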
SDH Enabled Inputs Download
You can download inputs that have been enhanced or optimized using Secure Data Hosting (SDH) technology.
This feature leverages the power of SDH to deliver a faster, more efficient download experience — offering performance and flexibility tailored to modern computing needs.
- Python SDK
from clarifai.client.input import Inputs
input_obj = Inputs(user_id='user_id', app_id='test_app')
# List the inputs (here, only the first image)
input_generator = input_obj.list_inputs(page_no=1, per_page=1, input_type='image')
inputs_list = list(input_generator)
# Download the listed inputs as bytes
input_bytes = input_obj.download_inputs(inputs_list)
with open('demo.jpg', 'wb') as f:
    f.write(input_bytes[0])
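The example above writes only the first downloaded item to disk. If you list more than one input, you may want to save every item. The sketch below assumes, as the indexing in the example suggests, that the download call returns a list of `bytes` objects. The `save_downloads` helper, the file naming scheme, and the `.jpg` suffix are illustrative choices, not SDK behavior.

```python
import tempfile
from pathlib import Path


def save_downloads(blobs, out_dir, prefix="input", suffix=".jpg"):
    """Write each bytes object to out_dir and return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, blob in enumerate(blobs):
        path = out / f"{prefix}_{index}{suffix}"
        path.write_bytes(blob)
        paths.append(path)
    return paths


# Offline demo with placeholder bytes instead of real downloads.
with tempfile.TemporaryDirectory() as tmp:
    written = save_downloads([b"first", b"second"], tmp)
    names = [path.name for path in written]
    print(names)
```

In practice you would pass the list returned by the download call in place of the placeholder bytes.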
Patch a Dataset
You can apply patch operations to a dataset — merging, removing, or overwriting data. While all these actions support overwriting by default, they have specific behaviors when handling lists of objects.
- The merge action replaces a key:value pair with key:new_value, or appends to an existing list. For dictionaries, it merges entries that share the same id field.
- The remove action is only used to delete the dataset's cover image on the platform UI.
- The overwrite action completely replaces an existing object with a new one.
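To make the difference between merge and overwrite concrete, here is a small local model of the behavior described above, written for plain Python dictionaries. It is an illustration of the description only, not the server's implementation, and the function names are hypothetical. In particular, how the API treats nested objects that are not in a list is not covered by the description, so this sketch simply replaces them.

```python
import copy


def _merge_lists(current, incoming):
    """Append incoming items, merging dicts that share an existing "id"."""
    merged = copy.deepcopy(current)
    index_by_id = {
        item["id"]: position
        for position, item in enumerate(merged)
        if isinstance(item, dict) and "id" in item
    }
    for item in incoming:
        if isinstance(item, dict) and item.get("id") in index_by_id:
            position = index_by_id[item["id"]]
            merged[position] = merge(merged[position], item)
        else:
            merged.append(copy.deepcopy(item))
            if isinstance(item, dict) and "id" in item:
                index_by_id[item["id"]] = len(merged) - 1
    return merged


def merge(current, patch):
    """Model of "merge": replace scalar values, extend lists."""
    result = copy.deepcopy(current)
    for key, value in patch.items():
        if isinstance(value, list) and isinstance(result.get(key), list):
            result[key] = _merge_lists(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def overwrite(current, patch):
    """Model of "overwrite": the new object replaces the old one entirely."""
    return copy.deepcopy(patch)


current = {"description": "old", "tags": ["a"], "items": [{"id": "x", "v": 1}]}
patch = {"description": "new", "tags": ["b"], "items": [{"id": "x", "v": 2}, {"id": "y", "v": 3}]}

print(merge(current, patch))
print(overwrite(current, patch))
```

In this model, merge keeps the existing tag and adds the new one, while overwrite leaves only what the patch contains.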
Below is an example of patching a dataset to update its description, notes, and image URL.
- Python SDK
from clarifai.client.app import App
app = App(app_id="YOUR_APP_ID_HERE", user_id="YOUR_USER_ID_HERE", pat="YOUR_PAT_HERE")
# Update the dataset by merging the new description and notes
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='merge', description='Demo testing', notes="Hi Guys! This note is for Demo")
# Update the dataset's image URL with a new one
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='merge', image_url='https://samples.clarifai.com/metro-north.jpg')
# Remove the dataset's image by specifying the 'remove' action
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='remove', image_url='https://samples.clarifai.com/metro-north.jpg')
Below is an example of updating a dataset's description and metadata.
- cURL
curl --location --request PATCH "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
  "datasets": [
    {
      "id": "YOUR_DATASET_ID_HERE",
      "description": "This is the new foo dataset",
      "metadata": {
        "foo": "bar"
      }
    }
  ],
  "action": "overwrite"
}'
Patch With a Default Filter
Below is an example of updating a dataset with a default filter.
- cURL
curl --location --request PATCH "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
  "datasets": [
    {
      "id": "YOUR_DATASET_ID_HERE",
      "description": "This is the new foo dataset",
      "metadata": {
        "foo": "bar"
      },
      "default_filter_id": "YOUR_DATASET_ID_FILTER_HERE"
    }
  ],
  "action": "overwrite"
}'
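The PATCH requests above can also be assembled in Python without an SDK. The following sketch builds the same request with the standard library and stops short of sending it. The function name `build_patch_datasets_request` is hypothetical; the URL, headers, and body shape are copied from the cURL examples.

```python
import json
import urllib.request


def build_patch_datasets_request(user_id, app_id, pat, dataset, action="overwrite"):
    """Build (but do not send) a PATCH request for one dataset."""
    url = f"https://api.clarifai.com/v2/users/{user_id}/apps/{app_id}/datasets"
    body = json.dumps({"datasets": [dataset], "action": action}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Key {pat}",
            "Content-Type": "application/json",
        },
    )


request = build_patch_datasets_request(
    user_id="YOUR_USER_ID_HERE",
    app_id="YOUR_APP_ID_HERE",
    pat="YOUR_PAT_HERE",
    dataset={
        "id": "YOUR_DATASET_ID_HERE",
        "description": "This is the new foo dataset",
        "metadata": {"foo": "bar"},
        "default_filter_id": "YOUR_DATASET_ID_FILTER_HERE",
    },
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```

Building the body with `json.dumps` avoids the quoting mistakes that are easy to make when editing a raw JSON string by hand.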
Patch Dataset Version
Below is an example of updating a dataset version's name.
- cURL
curl --location -g --request PATCH "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
  "dataset_versions": [
    {
      "id": "YOUR_DATASET_VERSION_ID_HERE",
      "name": "dataset version updated name"
    }
  ],
  "action": "overwrite"
}'
Merge Datasets
Here's an example of merging a dataset with the ID merge_dataset_id into another dataset with the ID dataset_id using the merge_dataset method of the Dataset class.
Note that all inputs from the source dataset (merge_dataset_id) will be added to the target dataset (dataset_id).
- Python SDK
from clarifai.client.dataset import Dataset
# Replace your "user_id", "app_id", "pat", and "dataset_id"
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="dataset_id", pat="YOUR_PAT_HERE")
dataset.merge_dataset(merge_dataset_id="merge_dataset_id")
Delete Dataset Inputs
You can delete the inputs in a dataset by specifying their IDs.
- cURL
curl --location -g --request DELETE "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/inputs/" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
  "input_ids": ["YOUR_INPUT_ID_HERE"]
}'
Delete Dataset Version
You can easily remove specific versions of your dataset.
Be certain that you want to delete a particular dataset version as the operation cannot be undone.
- Python SDK
- Node.js SDK
- cURL
from clarifai.client.dataset import Dataset
# Create the dataset object
dataset = Dataset(dataset_id='first_dataset', user_id='user_id', app_id='test_app')
# Delete the dataset version
dataset.delete_version(version_id='dataset_version')
import { Dataset } from "clarifai-nodejs";
const dataset = new Dataset({
  datasetId: "first_dataset",
  authConfig: {
    pat: process.env.CLARIFAI_PAT!,
    userId: process.env.CLARIFAI_USER_ID!,
    appId: "test_app",
  },
});
await dataset.deleteVersion("1");
curl --location -g --request DELETE "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
  "dataset_version_ids": ["YOUR_DATASET_VERSION_ID_HERE"]
}'
Delete Dataset
You can easily remove a dataset by specifying its ID.
Be certain that you want to delete a particular dataset as the operation cannot be undone.
- Python SDK
- Node.js SDK
- cURL
from clarifai.client.app import App
app = App(app_id="test_app", user_id="user_id")
# Pass the ID of the dataset to delete
app.delete_dataset(dataset_id="demo_dataset")
import { App } from "clarifai-nodejs";
const app = new App({
  authConfig: {
    pat: process.env.CLARIFAI_PAT!,
    userId: process.env.CLARIFAI_USER_ID!,
    appId: "test_app",
  },
});
await app.deleteDataset({ datasetId: "first_dataset" });
curl --location --request DELETE "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
  "dataset_ids": ["YOUR_DATASET_ID_HERE"]
}'