Datasets Management

Use vector search to sort, rank, and retrieve images


Via the UI

Get Datasets

List of Datasets

To get a list of datasets, go to the individual page of your application. Then, select the Datasets option in the collapsible left sidebar.

You'll be redirected to the Datasets manager page, where you can view the datasets that have already been created in your application.

Export a Dataset

You can export the inputs in your dataset, alongside their annotations, to an external storage system.

To do so, start by clicking the icon at the extreme end of a dataset field to select the format you want to use for exporting your dataset. From the list that drops down, you can select any of the following data formats:

  • Clarifai-Data-Protobuf, which is the default — This is a protocol buffer (protobuf) format used by Clarifai to structure data for machine learning tasks. Protocol buffers are a language-agnostic, platform-neutral mechanism for serializing structured data.

  • Clarifai-Data-JSON — This is a JSON (JavaScript Object Notation) format used by Clarifai to structure data. JSON is a text-based, lightweight data-interchange format that's easy to read and write.

  • COCO — This is the COCO (Common Objects in Context) format used by Clarifai to structure data. COCO is a large-scale, popular dataset used in machine learning and computer vision tasks.

After selecting your preferred export format, click the Generate button. Once the export file has been processed, the Generate button will become a Download button, which you can click to download your dataset.

note

The export feature works only after you have added inputs to a dataset and then created and selected a dataset version. Learn how to create a dataset version here.

Dataset ID or Version ID

To copy a dataset ID to the clipboard, go to its individual page and click the copy button next to the dataset's ID.

To copy a dataset version ID to the clipboard, click the copy button next to the Selected Version search field.

Update Datasets

Update a Dataset Version

After making some changes to your dataset — such as adding or removing inputs, or adding or removing annotations — you may want to update your dataset version to reflect the changes.

To update a dataset version, go to the individual page of the dataset and select the Refresh Metrics option that drops down after clicking the ellipsis in the upper-right corner of the page.

Finally, click the Update status button.

The updated inputs and annotations in your dataset will be displayed in the Overview tab.

You can also choose the dataset version you'd like to use from the Selected Version drop-down list.

Update Cover Image

To update a dataset's cover image, click the Change cover image button. A window will appear that allows you to upload an image for the dataset.

Merge Datasets

You can merge datasets by transferring inputs and their annotations from a source dataset to a destination dataset. Note that this process does not remove the inputs from the source dataset; they remain intact while being duplicated to the destination.

Start by selecting the Inputs option in your app's collapsible left sidebar. You'll be redirected to the Inputs-Manager page, where the inputs in your app are displayed.

Next, navigate to the Datasets section and select the dataset from which you want to transfer inputs. Once selected, all the available inputs in the dataset will be displayed on the page.

To choose the inputs for transfer, hover over each of them and click the small empty box in the upper-left corner to select them.

multi-select feature
  • Mouse click: Selects a single item or input.
  • Shift + mouse click: Selects a range of inputs between the first and last clicked item.

Next, click the Dataset... button that appears at the bottom section of the page.

The small window that pops up allows you to add or remove inputs from the selected datasets.

Select the Add option, which lets you add inputs to the destination dataset (the option is selected by default). Then, select the destination dataset from the Select Datasets search field.

tip

If you select the Apply to all search results button, all the inputs that are visually similar to the one(s) you initially selected will also be added. This lets you merge datasets quickly and easily.

If you want to create a new destination dataset:

  • Click the plus sign (+) next to the Select Datasets search field.
  • Type the new dataset name in the search field. The new name you've typed will appear underneath the search field.
  • Click the Add new dataset button to create the dataset. The new dataset will be successfully added to your app and selected as a destination.

Finally, click the Add Inputs button at the bottom of the pop-up window to complete adding the selected inputs to the destination dataset.

Alternatively, you can remove inputs from a dataset by selecting the Remove option, selecting the desired dataset, and clicking the Remove Inputs button.

After merging the datasets, remember to update the dataset version of your destination dataset to ensure the latest version reflects the newly added inputs and annotations.

Delete Datasets

To delete a dataset, go to the individual page of the dataset and select the Delete Dataset option that drops down after clicking the ellipsis in the upper-right corner of the page.

caution

Please proceed with extreme caution, as deleted datasets cannot be recovered.

Via the API

List Datasets

You can list the datasets in your app.

curl --location --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets?page=1&per_page=100" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
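
The same request can also be assembled in Python using only the standard library. The sketch below mirrors the cURL call above; the helper name `build_list_datasets_request` and its parameters are illustrative assumptions, not part of any Clarifai SDK. It only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.clarifai.com/v2"


def build_list_datasets_request(user_id, app_id, pat, page=1, per_page=100):
    """Build (but do not send) the GET request that lists an app's datasets."""
    query = urlencode({"page": page, "per_page": per_page})
    url = f"{API_BASE}/users/{user_id}/apps/{app_id}/datasets?{query}"
    headers = {
        "Authorization": f"Key {pat}",
        "Content-Type": "application/json",
    }
    return Request(url, headers=headers, method="GET")


request = build_list_datasets_request(
    "YOUR_USER_ID_HERE", "YOUR_APP_ID_HERE", "YOUR_PAT_HERE"
)
print(request.full_url)
# To send it, pass the request to urllib.request.urlopen and read the response.
```

Keeping construction separate from sending makes it easy to inspect the URL and headers before any network call is made.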

List Dataset Versions

You can list all the versions associated with your dataset to view its update history and changes over time.

curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions?page=1&per_page=100" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
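
The `page` and `per_page` query parameters suggest that long listings are returned one page at a time. The sketch below shows one way to walk through every page. It assumes that pages are numbered from 1 and that a page holding fewer than `per_page` items is the last one, which is an assumption about the API's behavior rather than something stated here. `fetch_page` is a hypothetical callable that you would implement to perform a single request and return that page's items as a list.

```python
def iter_all_items(fetch_page, per_page=100):
    """Yield items from successive pages until a short or empty page is returned.

    `fetch_page(page, per_page)` is a hypothetical callable that performs one
    request and returns that page's items as a list.
    """
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:
            break
        page += 1


# Stand-in for a real API call: five fake version IDs served two per page.
fake_versions = [f"version-{i}" for i in range(5)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return fake_versions[start:start + per_page]


all_versions = list(iter_all_items(fake_fetch, per_page=2))
print(all_versions)
```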

Get a Dataset

You can retrieve the details of a specific dataset by providing its ID.

curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"

Get a Dataset Version

You can retrieve the details of a specific dataset version by providing its version ID.

curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions/YOUR_DATASET_VERSION_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"

Get a Dataset Input

You can retrieve an input in a dataset by specifying its ID.

curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/inputs/YOUR_INPUT_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"

List Dataset Inputs

You can list the inputs in a dataset by providing the dataset ID.

from clarifai.client.input import Inputs

# Replace your "user_id", "app_id", "pat", and "dataset_id"
input_obj = Inputs(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", pat="YOUR_PAT_HERE")

inputs_generator = input_obj.list_inputs(dataset_id="YOUR_DATASET_ID_HERE")

inputs = list(inputs_generator)

print(inputs)

Export Dataset

You can download your datasets in a compressed ZIP file format for easy storage, sharing, or offline access.

info

The clarifai-data-protobuf.zip file can be downloaded from the export dataset section on the platform UI.

from clarifai.client.dataset import Dataset
import os

os.environ["CLARIFAI_PAT"] = "YOUR_PAT"

# The "clarifai-data-protobuf.zip" file can be downloaded from the dataset section in the portal.
Dataset().export(save_path='path to output.zip file', local_archive_path='path to clarifai-data-protobuf.zip file')
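
After an export, it can be useful to confirm that the output file is a readable ZIP archive before sharing or storing it. The following is a minimal sketch using Python's standard `zipfile` module. The helper name `summarize_archive` is hypothetical, and the demo archive is a stand-in created on the fly, so its contents do not reflect the real layout of a Clarifai export.

```python
import os
import tempfile
import zipfile


def summarize_archive(archive_path):
    """Return (name, size_in_bytes) for every file stored in a ZIP archive."""
    if not zipfile.is_zipfile(archive_path):
        raise ValueError(f"{archive_path} is not a valid ZIP archive")
    with zipfile.ZipFile(archive_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Demo with a stand-in archive; the member name below is made up and does not
# reflect the real layout of an exported dataset.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "output.zip")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("example.txt", "hello")
    summary = summarize_archive(demo_path)

print(summary)
```

To check a real export, call `summarize_archive` with the same path you passed as `save_path`.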

SDH Enabled Inputs Download

You can download inputs that have been enhanced or optimized using Secure Data Hosting (SDH) technology.

This feature uses SDH to make downloading inputs faster and more efficient.

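from clarifai.client.input import Inputs

# Replace 'user_id' and 'test_app' with your own user ID and app ID.
# This example assumes your PAT is available in the CLARIFAI_PAT environment variable.
input_obj = Inputs(user_id='user_id', app_id='test_app')

# List the inputs to download (here, a single image input)
input_generator = input_obj.list_inputs(page_no=1, per_page=1, input_type='image')
inputs_list = list(input_generator)

# Download the inputs and save the first one to disk
input_bytes = input_obj.download_inputs(inputs_list)
with open('demo.jpg', 'wb') as f:
    f.write(input_bytes[0])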
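
The example above saves only the first downloaded input. If you list more than one input, you may want to save every result. The sketch below assumes, as the example above implies, that the download returns a list of `bytes` objects. The helper name `save_downloads` and the file-naming scheme are hypothetical, and the demo uses stand-in bytes instead of a real download.

```python
import os
import tempfile


def save_downloads(byte_blobs, output_dir, prefix="input", extension=".jpg"):
    """Write each bytes object to its own numbered file and return the paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for index, blob in enumerate(byte_blobs):
        path = os.path.join(output_dir, f"{prefix}_{index}{extension}")
        with open(path, "wb") as f:
            f.write(blob)
        paths.append(path)
    return paths


# Demo with stand-in bytes instead of a real download result.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_downloads([b"first", b"second"], tmp_dir)
    saved_names = [os.path.basename(p) for p in saved]
    saved_sizes = [os.path.getsize(p) for p in saved]

print(saved_names, saved_sizes)
```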

Patch a Dataset

You can apply patch operations to a dataset — merging, removing, or overwriting data. While all these actions support overwriting by default, they have specific behaviors when handling lists of objects.

  • The merge action replaces a key:value pair with key:new_value, or appends to an existing list. For dictionaries, it merges entries that share the same id field.
  • The remove action is only used to delete the dataset's cover image on the platform UI.
  • The overwrite action completely replaces an existing object with a new one.
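
To make the difference between merge and overwrite concrete, here is a small local model of the two actions written in plain Python. It is one interpretation of the rules listed above, not the platform's implementation: scalar values are replaced, lists are appended to under merge and replaced under overwrite, and dictionaries inside a list are combined when their `id` values match. The function names and the `items` field are made up for illustration, and the remove action is left out because it applies only to the cover image.

```python
import copy


def merge_lists(existing, incoming):
    """Append incoming items; dicts sharing an "id" with an existing dict are merged."""
    result = copy.deepcopy(existing)
    for item in incoming:
        match = None
        if isinstance(item, dict) and "id" in item:
            match = next(
                (e for e in result if isinstance(e, dict) and e.get("id") == item["id"]),
                None,
            )
        if match is not None:
            match.update(copy.deepcopy(item))
        else:
            result.append(copy.deepcopy(item))
    return result


def apply_patch(existing, patch, action):
    """Return a new dict that models the 'merge' or 'overwrite' action locally."""
    if action not in ("merge", "overwrite"):
        raise ValueError(f"unsupported action: {action}")
    result = copy.deepcopy(existing)
    for key, value in patch.items():
        both_lists = isinstance(result.get(key), list) and isinstance(value, list)
        if action == "merge" and both_lists:
            result[key] = merge_lists(result[key], value)
        else:
            # Scalars under merge, and every field under overwrite, are replaced.
            result[key] = copy.deepcopy(value)
    return result


# "items" is a made-up field used only to show the list behavior.
record = {"description": "old", "items": [{"id": "a", "score": 1}]}

merged = apply_patch(
    record,
    {"description": "new", "items": [{"id": "a", "score": 2}, {"id": "b", "score": 3}]},
    "merge",
)
overwritten = apply_patch(record, {"items": [{"id": "c", "score": 9}]}, "overwrite")

print(merged)
print(overwritten)
```

In this model, merge keeps the entry with id `a`, updates its score, and appends the entry with id `b`, while overwrite discards the original list and keeps only the new one.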

Below is an example of patching a dataset to update its description, notes, and image URL.

from clarifai.client.app import App

app = App(app_id="YOUR_APP_ID_HERE", user_id="YOUR_USER_ID_HERE", pat="YOUR_PAT_HERE")

# Update the dataset by merging the new description and notes
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='merge', description='Demo testing', notes="Hi Guys! This note is for Demo")

# Update the dataset's image URL with a new one
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='merge', image_url='https://samples.clarifai.com/metro-north.jpg')

# Remove the dataset's image by specifying the 'remove' action
app.patch_dataset(dataset_id='YOUR_DATASET_ID_HERE', action='remove', image_url='https://samples.clarifai.com/metro-north.jpg')

Below is an example of updating a dataset's description and metadata.

curl --location --request PATCH "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
    "datasets": [
        {
            "id": "YOUR_DATASET_ID_HERE",
            "description": "This is the new foo dataset",
            "metadata": {
                "foo": "bar"
            }
        }
    ],
    "action": "overwrite"
}'
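
For reference, the same PATCH request can be built in Python with the standard library. This is a sketch that mirrors the cURL call above; the helper name `build_patch_dataset_request` and its parameters are hypothetical and not part of the Clarifai SDK. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request

API_BASE = "https://api.clarifai.com/v2"


def build_patch_dataset_request(user_id, app_id, pat, dataset_id,
                                description, metadata, action="overwrite"):
    """Build (but do not send) a PATCH request that updates one dataset."""
    payload = {
        "datasets": [
            {"id": dataset_id, "description": description, "metadata": metadata}
        ],
        "action": action,
    }
    url = f"{API_BASE}/users/{user_id}/apps/{app_id}/datasets"
    headers = {
        "Authorization": f"Key {pat}",
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers=headers, method="PATCH")


patch_request = build_patch_dataset_request(
    "YOUR_USER_ID_HERE",
    "YOUR_APP_ID_HERE",
    "YOUR_PAT_HERE",
    "YOUR_DATASET_ID_HERE",
    description="This is the new foo dataset",
    metadata={"foo": "bar"},
)
print(patch_request.get_method(), patch_request.full_url)
# To send it, pass the request to urllib.request.urlopen and read the response.
```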

Patch With a Default Filter

Below is an example of updating a dataset with a default filter.

curl --location --request PATCH "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
    "datasets": [
        {
            "id": "YOUR_DATASET_ID_HERE",
            "description": "This is the new foo dataset",
            "metadata": {
                "foo": "bar"
            },
            "default_filter_id": "YOUR_DATASET_ID_FILTER_HERE"
        }
    ],
    "action": "overwrite"
}'

Patch Dataset Version

Below is an example of updating a dataset version's name.

curl --location -g --request PATCH "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
    "dataset_versions": [
        {
            "id": "YOUR_DATASET_VERSION_ID_HERE",
            "name": "dataset version updated name"
        }
    ],
    "action": "overwrite"
}'

Merge Datasets

Here’s an example of merging a dataset with the ID merge_dataset_id into another dataset with the ID dataset_id using the merge_dataset feature from the Dataset class.

Note that all inputs from the source dataset (merge_dataset_id) will be added to the target dataset (dataset_id).

from clarifai.client.dataset import Dataset

# Replace your "user_id", "app_id", "pat", and "dataset_id"
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="dataset_id", pat="YOUR_PAT_HERE")

dataset.merge_dataset(merge_dataset_id="merge_dataset_id")

Delete Dataset Inputs

You can delete the inputs in a dataset by specifying their IDs.

curl --location -g --request DELETE "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/inputs/" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json" \
--data-raw '{
    "input_ids": ["YOUR_INPUT_ID_HERE"]
}'

Delete Dataset Version

You can easily remove specific versions of your dataset.

caution

Be certain that you want to delete a particular dataset version as the operation cannot be undone.

from clarifai.client.dataset import Dataset

# Create the dataset object
dataset = Dataset(dataset_id='first_dataset', user_id='user_id', app_id='test_app')

# Delete the dataset version
dataset.delete_version(version_id='dataset_version')

Delete Dataset

You can easily remove a dataset by specifying its ID.

caution

Be certain that you want to delete a particular dataset as the operation cannot be undone.

from clarifai.client.app import App

app = App(app_id="test_app", user_id="user_id")
# Pass the ID of the dataset to delete to the delete_dataset function
app.delete_dataset(dataset_id="demo_dataset")