Datasets Management
Use vector search to sort, rank, and retrieve images
Via the UI
Get Datasets
List of Datasets
To get a list of datasets, go to the individual page of your application. Then, select the Datasets option in the collapsible left sidebar.
You'll be redirected to the Datasets manager page, where you can view the datasets already created in your application.
Export a Dataset
You can export the inputs in your dataset, alongside their annotations, to an external storage system.
To do so, start by clicking the icon at the extreme end of a dataset field to select the format you want to use for exporting your dataset. From the list that drops down, you can select any of the following data formats:
- Clarifai-Data-Protobuf (the default): This is a protocol buffer (protobuf) format used by Clarifai to structure data for machine learning tasks. Protocol buffers are a language-agnostic, platform-neutral mechanism for serializing structured data.
- Clarifai-Data-JSON: This is a JSON (JavaScript Object Notation) format used by Clarifai to structure data. JSON is a text-based, lightweight data-interchange format that's easy to read and write.
- COCO: This is the COCO (Common Objects in Context) format used by Clarifai to structure data. COCO is a large-scale, popular dataset used in machine learning and computer vision tasks.
After selecting your preferred export format, click the Generate button. Once the export file has been processed, the Generate button will become a Download button, which you can click to download your dataset.
The export feature works only after you have added inputs to the dataset and created and selected a dataset version. To learn how, see the guide on creating a dataset version.
Dataset ID or Version ID
To copy a dataset ID to the clipboard, go to its individual page and click the copy button next to the dataset's ID.
To copy a dataset version ID to the clipboard, click the copy button next to the Selected Version search field.
Update Datasets
Update a Dataset Version
After making some changes to your dataset — such as adding or removing inputs, or adding or removing annotations — you may want to update your dataset version to reflect the changes.
To update a dataset version, go to the individual page of the dataset and select the Refresh Metrics option that drops down after clicking the ellipsis in the upper-right corner of the page.
Finally, click the Update status button.
The updated inputs and annotations in your dataset will be displayed in the Overview tab.
You can also choose the dataset version you'd like to use from the Selected Version drop-down list.
Update Cover Image
To update a dataset's cover image, click the Change cover image button. A window will appear that allows you to upload an image for the dataset.
Merge Datasets
You can merge datasets by transferring inputs and their annotations from a source dataset to a destination dataset. Note that this process does not remove the inputs from the source dataset; they remain intact while being duplicated to the destination.
Start by selecting the Inputs option in your app's collapsible left sidebar. You'll be redirected to the Inputs manager page, where the inputs in your app are displayed.
Next, navigate to the Datasets section and select the dataset from which you want to transfer inputs. Once selected, all the available inputs in the dataset will be displayed on the page.
To choose the inputs for transfer, hover over each of them and click the small empty box in the upper-left corner to select them.
- Mouse click: Selects a single item or input.
- Shift + mouse click: Selects a range of inputs between the first and last clicked item.
Next, click the Dataset... button that appears at the bottom section of the page.
The small window that pops up allows you to add or remove inputs from the selected datasets.
Select the Add option, which lets you add inputs to the destination dataset (the option is selected by default). Then, select the destination dataset from the Select Datasets search field.
If you select the Apply to all search results button, all the inputs that are visually similar to the one(s) you've initially selected will also be added. This lets you merge datasets quickly and easily.
If you want to create a new destination dataset:
- Click the plus sign (+) next to the Select Datasets search field.
- Type the new dataset name in the search field. The new name you've typed will appear underneath the search field.
- Click the Add new dataset button to create the dataset. The new dataset is added to your app and selected as the destination.
Finally, click the Add Inputs button at the bottom of the pop-up window to complete adding the selected inputs to the destination dataset.
Alternatively, you can remove inputs from a dataset by selecting the Remove option, selecting the desired dataset, and clicking the Remove Inputs button.
After merging the datasets, remember to update the dataset version of your destination dataset to ensure the latest version reflects the newly added inputs and annotations.
Delete Datasets
To delete a dataset, go to the individual page of the dataset and select the Delete Dataset option that drops down after clicking the ellipsis in the upper-right corner of the page.
Please proceed with extreme caution, as deleted datasets cannot be recovered.
Via the API
Before using the Python SDK, Node.js SDK, or any of our gRPC clients, ensure they are properly installed on your machine. Refer to their respective installation guides for instructions on how to install and initialize them.
List Datasets
You can list the datasets in your app.
- cURL
curl --location --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets?page=1&per_page=100" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
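The same request can also be sketched in Python with the standard library alone. This is a minimal illustration rather than the Clarifai SDK: the helper name `build_list_datasets_request` is hypothetical, and the URL, query parameters, and headers are copied from the cURL command above. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.clarifai.com/v2"


def build_list_datasets_request(user_id, app_id, pat, page=1, per_page=100):
    """Build (but do not send) the GET request that lists an app's datasets.

    Hypothetical helper: it mirrors the cURL example and nothing more.
    """
    query = urlencode({"page": page, "per_page": per_page})
    url = f"{API_BASE}/users/{user_id}/apps/{app_id}/datasets?{query}"
    headers = {
        "Authorization": f"Key {pat}",
        "Content-Type": "application/json",
    }
    return Request(url, headers=headers, method="GET")


request = build_list_datasets_request(
    "YOUR_USER_ID_HERE", "YOUR_APP_ID_HERE", "YOUR_PAT_HERE"
)
print(request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and decode the response body as JSON.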
List Dataset Versions
You can list all the versions associated with your dataset to view its update history and changes over time.
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions?page=1&per_page=100" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
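The `page` and `per_page` query parameters mean that a long version history may span several requests. A rough sketch of walking every page follows. It assumes that the response body holds the versions in a list field named `dataset_versions` (an assumption, passed in as `key` so it is easy to change), and it takes the HTTP call as a caller-supplied function, so the helper `iter_pages` and the stand-in `fake_fetch` are both hypothetical.

```python
from typing import Callable, Dict, Iterator


def iter_pages(
    fetch_page: Callable[[int, int], Dict],
    key: str,
    per_page: int = 100,
) -> Iterator[Dict]:
    """Yield every item from a paginated listing.

    fetch_page(page, per_page) must return the decoded JSON body for
    that page. Iteration stops at the first page that is not full.
    """
    page = 1
    while True:
        items = fetch_page(page, per_page).get(key, [])
        yield from items
        if len(items) < per_page:
            return
        page += 1


def fake_fetch(page, per_page):
    """Stand-in for a real HTTP call: serves five made-up versions."""
    versions = [{"id": f"version-{n}"} for n in range(5)]
    start = (page - 1) * per_page
    return {"dataset_versions": versions[start:start + per_page]}


version_ids = [
    v["id"] for v in iter_pages(fake_fetch, "dataset_versions", per_page=2)
]
print(version_ids)
```

In real use, `fake_fetch` would be replaced by a function that issues the GET request shown above with the given `page` and `per_page` values.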
Get a Dataset
You can retrieve the details of a specific dataset by providing its ID.
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
Get a Dataset Version
You can retrieve the details of a specific dataset version by providing its version ID.
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/versions/YOUR_DATASET_VERSION_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
Get a Dataset Input
You can retrieve an input in a dataset by specifying its ID.
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/inputs/YOUR_INPUT_ID_HERE" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
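Each of the GET requests above returns a JSON body. Before reading the dataset, version, or input out of it, it can help to confirm that the call succeeded. The sketch below assumes that the body carries a top-level `status` object whose `code` is 10000 on success and whose `description` explains failures; treat both the field names and the code value as assumptions to check against the API reference. `ClarifaiRequestError` and `ensure_success` are hypothetical names, and the sample payloads are made up.

```python
SUCCESS_CODE = 10000  # assumed "Ok" status code; verify against the API reference


class ClarifaiRequestError(RuntimeError):
    """Hypothetical error raised when a response does not report success."""


def ensure_success(payload: dict) -> dict:
    """Return the payload unchanged if its status reports success, else raise."""
    status = payload.get("status", {})
    code = status.get("code")
    if code != SUCCESS_CODE:
        description = status.get("description", "no description")
        raise ClarifaiRequestError(f"{code}: {description}")
    return payload


# Made-up payloads, for illustration only.
ok_payload = {"status": {"code": 10000, "description": "Ok"}}
bad_payload = {"status": {"code": 11102, "description": "Invalid request"}}

checked = ensure_success(ok_payload)

try:
    ensure_success(bad_payload)
    error_message = None
except ClarifaiRequestError as exc:
    error_message = str(exc)

print(error_message)
```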
List Dataset Inputs
You can list the inputs in a dataset by providing the dataset ID.
- Python SDK
from clarifai.client.input import Inputs

# Replace "user_id", "app_id", "pat", and "dataset_id" with your own values
input_obj = Inputs(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", pat="YOUR_PAT_HERE")
inputs_generator = input_obj.list_inputs(dataset_id="YOUR_DATASET_ID_HERE")
inputs = list(inputs_generator)
print(inputs)
- cURL
curl --location -g --request GET "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/datasets/YOUR_DATASET_ID_HERE/inputs?page=1&per_page=100" \
--header "Authorization: Key YOUR_PAT_HERE" \
--header "Content-Type: application/json"
Export Dataset
You can download your datasets in a compressed ZIP file format for easy storage, sharing, or offline access.
The clarifai-data-protobuf.zip file can be downloaded from the export dataset section on the platform UI.
- Python SDK
import os

from clarifai.client.dataset import Dataset

# Set your PAT as an environment variable so the client can authenticate
os.environ["CLARIFAI_PAT"] = "YOUR_PAT"

# The "clarifai-data-protobuf.zip" file can be downloaded from the dataset section in the portal.
Dataset().export(save_path='path to output.zip file', local_archive_path='path to clarifai-data-protobuf.zip file')
SDH Enabled Inputs Download
You can download inputs that have been enhanced or optimized using Secure Data Hosting (SDH) technology.
This feature leverages the power of SDH to deliver a faster, more efficient download experience — offering performance and flexibility tailored to modern computing needs.
- Python SDK
from clarifai.client.input import Inputs

# No PAT is passed here; this assumes CLARIFAI_PAT is set in the environment
input_obj = Inputs(user_id='user_id', app_id='test_app')

# List the inputs (here, a single image input)
input_generator = input_obj.list_inputs(page_no=1, per_page=1, input_type='image')
inputs_list = list(input_generator)

# Download the inputs and write the first one to disk
input_bytes = input_obj.download_inputs(inputs_list)
with open('demo.jpg', 'wb') as f:
    f.write(input_bytes[0])
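The example above writes only the first downloaded input. When several inputs are downloaded at once, each one can be written to its own file. The sketch below assumes, as the example does, that the download result is a sequence of `bytes` objects. The helper `save_inputs`, its file-naming scheme, and the placeholder bytes are hypothetical and not part of the SDK.

```python
import tempfile
from pathlib import Path


def save_inputs(input_bytes, out_dir, prefix="input", suffix=".jpg"):
    """Write each bytes object to out_dir as <prefix>_<index><suffix>.

    Returns the list of paths written, in the same order as input_bytes.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, data in enumerate(input_bytes):
        target = out_path / f"{prefix}_{index}{suffix}"
        target.write_bytes(data)
        written.append(target)
    return written


# Placeholder bytes stand in for real downloaded inputs.
with tempfile.TemporaryDirectory() as tmp:
    paths = save_inputs([b"first", b"second"], tmp)
    names = [p.name for p in paths]
    sizes = [p.stat().st_size for p in paths]

print(names)
```

With real data, the first argument would be the `input_bytes` value from the example above, and `suffix` should match the input type being downloaded.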