
Image Annotation Loader

Load existing annotated image datasets and convert between supported formats


The Image Annotation Loader framework, part of the Data Utils library, enables you to load already annotated image datasets and upload them to the Clarifai platform.

The framework supports a wide range of industry-standard annotation formats, which removes the hurdle of format incompatibility. You can also use it to convert between annotation formats.

As a result, you can bring in existing labeled datasets regardless of the annotation tool that produced them, and upload them to the Clarifai platform for your use case.

Tip

Run the following command to clone the repository containing various examples for using the Data Utils library:

git clone https://github.com/Clarifai/examples.git

After cloning, navigate to the Data_Utils folder to follow along with this tutorial.

Prerequisites

Install Python SDK and Data Utils

Install the latest version of the clarifai Python SDK package. Also, install the Data Utils library.

pip install --upgrade clarifai
pip install clarifai-datautils

Install Extra Dependencies

The Image Annotation Loader requires additional libraries to function properly. To keep the core library lightweight, these optional dependencies are grouped under the annotations extra. (Python extras allow projects to specify additional dependencies for optional functionality.)

To install them, run the following command. The quotes keep shells such as zsh from interpreting the square brackets.

pip install "clarifai-datautils[annotations]"

The above command also installs Datumaro, a dataset management framework essential for the Image Annotation Loader. Note that Datumaro requires a Rust compiler to be installed on your machine for a smooth installation.
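
Before moving on, it can help to confirm that the optional dependencies are importable. The following is a minimal sketch that uses only the standard library. The helper name missing_modules is hypothetical, and the import names clarifai_datautils and datumaro are assumed to match the installed packages.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    missing = missing_modules(["clarifai_datautils", "datumaro"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All optional dependencies are importable.")
```

If datumaro is reported as missing, rerunning the extras installation after installing a Rust compiler is a reasonable first step.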

Get a PAT

You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can generate one on your Personal Settings page, in the Security section.

Then, set it as an environment variable in your script.

import os

os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key
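
Hard-coding a token in a script makes it easy to leak. As an alternative, you could export CLARIFAI_PAT in your shell and have the script fail early when it is missing. The sketch below illustrates that idea; require_pat is a hypothetical helper and not part of the SDK.

```python
import os


def require_pat(env=None):
    """Return the PAT from the environment, or raise if it is unset or still a placeholder."""
    env = os.environ if env is None else env
    pat = env.get("CLARIFAI_PAT", "").strip()
    if not pat or pat == "YOUR_PAT_HERE":
        raise RuntimeError("Set the CLARIFAI_PAT environment variable before running.")
    return pat
```

Calling require_pat() at the top of a script surfaces a missing token before any dataset work starts.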

Create a Dataset

Create a dataset on the Clarifai platform to use for uploading your annotated image datasets.

from clarifai.client.app import App

app = App(app_id="YOUR_APP_ID_HERE", user_id="YOUR_USER_ID_HERE", pat="YOUR_PAT_HERE")
# Provide the dataset ID as the dataset_id parameter of create_dataset
dataset = app.create_dataset(dataset_id="annotations_dataset")

Utility Features

Supported Formats

You can retrieve and display all the annotation formats that the Image Annotation Loader framework supports.

from clarifai_datautils.image import ImageAnnotations

formats = ImageAnnotations.list_formats()

print("Supported formats:", formats)

Note that:

  • The ImageAnnotations class is imported from the clarifai_datautils.image package. This class provides utilities for working with annotated image datasets.

Output Example
Supported formats: ['coco_segmentation', 'voc_detection', 'yolo', 'cifar', 'coco_detection', 'cvat', 'imagenet', 'kitti', 'label_me', 'mnist', 'open_images', 'vgg_face2', 'lfw', 'cityscapes', 'ade20k2017', 'clarifai']

The following table lists the supported annotation formats and the task each one applies to.

| Annotation Type | Format | Task |
| --- | --- | --- |
| ImageNet | imagenet | classification |
| CIFAR-10 | cifar | classification |
| MNIST | mnist | classification |
| VGGFace2 | vgg_face2 | classification |
| LFW | lfw | classification |
| PASCAL VOC | voc_detection | detection |
| YOLO | yolo | detection |
| COCO | coco_detection | detection |
| CVAT | cvat | detection |
| KITTI | kitti | detection |
| LabelMe | label_me | detection |
| Open Images | open_images | detection |
| Clarifai | clarifai | detection |
| COCO (segmentation) | coco_segmentation | segmentation |
| Cityscapes | cityscapes | segmentation |
| ADE | ade20k2017 | segmentation |
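
If you need this mapping in code, for example to pick only the formats that apply to a given task, it can be transcribed into a dictionary. This is a sketch built from the table above rather than something the library provides; FORMAT_TASKS and formats_for_task are hypothetical names.

```python
# Format-to-task mapping transcribed from the table above.
FORMAT_TASKS = {
    "imagenet": "classification",
    "cifar": "classification",
    "mnist": "classification",
    "vgg_face2": "classification",
    "lfw": "classification",
    "voc_detection": "detection",
    "yolo": "detection",
    "coco_detection": "detection",
    "cvat": "detection",
    "kitti": "detection",
    "label_me": "detection",
    "open_images": "detection",
    "clarifai": "detection",
    "coco_segmentation": "segmentation",
    "cityscapes": "segmentation",
    "ade20k2017": "segmentation",
}


def formats_for_task(task):
    """Return the format names listed for a task, sorted alphabetically."""
    return sorted(name for name, t in FORMAT_TASKS.items() if t == task)


print(formats_for_task("segmentation"))
```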

Format Detection

You can identify the annotation format that a dataset uses.

from clarifai_datautils.image import ImageAnnotations

# Defining dataset path
LOCAL_FOLDER_PATH = "./assets/annotation_formats/cifar-10"

# Detecting the format (named detected_format to avoid shadowing the built-in format function)
detected_format = ImageAnnotations.detect_format(LOCAL_FOLDER_PATH)

print(f"Detected format: {detected_format}")

Note that:

  • The LOCAL_FOLDER_PATH parameter specifies the local directory path where the annotated dataset is stored.

Output Example
Detected format: cifar
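
When several datasets sit side by side in one folder, as they do under ./assets/annotation_formats in the examples, the same call can be applied to each subfolder. The sketch below takes the detector as an argument so it does not depend on any particular library; detect_formats is a hypothetical helper, and how detect_format behaves on an unrecognized folder is not documented here, so any exception is simply recorded as None.

```python
from pathlib import Path


def detect_formats(root, detect):
    """Map each immediate subfolder of root to the format reported by detect.

    detect is any callable that takes a folder path string and returns a
    format name. Folders for which the callable raises are mapped to None.
    """
    results = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        try:
            results[folder.name] = detect(str(folder))
        except Exception:
            results[folder.name] = None
    return results
```

With the library installed, a call such as detect_formats("./assets/annotation_formats", ImageAnnotations.detect_format) would return one entry per dataset folder.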

Dataset Information

You can inspect the details of a dataset before uploading it to the Clarifai platform.

from clarifai_datautils.image import ImageAnnotations

# Defining path and annotation format
LOCAL_FOLDER_PATH = "./assets/annotation_formats/imagenet"
ANNOTATION_FORMAT = "imagenet"

# Load dataset from the specified local folder
imagenet_dataset = ImageAnnotations.import_from(path=LOCAL_FOLDER_PATH, format=ANNOTATION_FORMAT)

# Get info about the dataset
info = imagenet_dataset.get_info()
print(info)
# Or, print the dataset details
#print(imagenet_dataset)

# Print individual fields from the info dictionary
print(f"Dataset size: {info['size']}")
print(f"Annotation count: {info['annotations_count']}")
print(f"Categories: {info['categories']}")

Note that:

  • The import_from method of the ImageAnnotations class is used to load the dataset from the specified local folder.

  • The format parameter specifies the format of the annotations. It accepts any of the format names listed under Supported Formats.

Output Example
{'size': 19, 'source_path': './assets/annotation_formats/imagenet', 'annotated_items_count': 19, 'annotations_count': 19, 'sub_folders': ["default: # of items=19, # of annotated items=19, # of annotations=19, annotation types=['1']\n"], 'categories': ["1: ['bullfrog', 'goldfish', 'kingsnake', 'llama', 'tench']\n"]}
Dataset size: 19
Annotation count: 19
Categories: ["1: ['bullfrog', 'goldfish', 'kingsnake', 'llama', 'tench']\n"]
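
In the output above, each categories entry is a string rather than a list. If you need the label names themselves, one option is to parse those strings. This is a sketch that assumes every entry looks like the one in the example output (a prefix, a colon and a space, then a Python-style list); parse_categories is a hypothetical helper.

```python
import ast


def parse_categories(entries):
    """Flatten entries shaped like "1: ['a', 'b']" into one list of label names."""
    labels = []
    for entry in entries:
        # Keep only the text after the first ": " and parse it as a list literal.
        _, _, payload = entry.partition(": ")
        labels.extend(ast.literal_eval(payload.strip()))
    return labels


example = ["1: ['bullfrog', 'goldfish', 'kingsnake', 'llama', 'tench']\n"]
print(parse_categories(example))
```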

Uploading to Clarifai

To upload a pre-labeled dataset from your local environment to the Clarifai platform, initialize a Dataset object from the Python SDK that identifies where the dataset will be uploaded.

Then, call the upload_dataset() method on the Dataset object. This method takes a dataloader as an argument, which iterates over the dataset and yields data in a format compatible with the Clarifai platform.

from clarifai_datautils.image import ImageAnnotations
from clarifai.client.dataset import Dataset
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key

# Defining path and annotation format
LOCAL_FOLDER_PATH = "./assets/annotation_formats/imagenet"
ANNOTATION_FORMAT = "imagenet"

# Load dataset from the specified local folder
imagenet_dataset = ImageAnnotations.import_from(path=LOCAL_FOLDER_PATH, format=ANNOTATION_FORMAT)

# Use the Python SDK library to upload
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="YOUR_DATASET_ID_HERE")
# Or, initialize with a dataset URL; example: https://clarifai.com/john/my-app/datasets/annotations_dataset
#dataset = Dataset("DATASET_URL_HERE")

# Upload dataset using the dataloader
dataset.upload_dataset(dataloader=imagenet_dataset.dataloader)
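
Before starting an upload, you may want to check the dataset summary for obvious problems. The sketch below works on the dictionary returned by get_info() and relies only on the size and annotated_items_count keys that appear in the Dataset Information output example. The helper upload_blockers is hypothetical, and whether unannotated items should stop an upload is a judgment call for your project.

```python
def upload_blockers(info):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if info["size"] == 0:
        problems.append("dataset is empty")
    missing = info["size"] - info["annotated_items_count"]
    if missing > 0:
        problems.append(f"{missing} item(s) have no annotations")
    return problems


# Values taken from the imagenet example output
print(upload_blockers({"size": 19, "annotated_items_count": 19}))
```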

Uploading From Kaggle to Clarifai

You can download a dataset from Kaggle and upload it to the Clarifai platform. To begin, install the opendatasets Python package, which enables direct dataset downloads from Kaggle.

pip install -q opendatasets

Next, download the dataset from Kaggle. For example, here is how you could download the dogs-vs-wolves dataset.

import opendatasets as od

# When prompted, enter your Kaggle username and API key
od.download("https://www.kaggle.com/datasets/harishvutukuri/dogs-vs-wolves")

Then, you can upload it to Clarifai.

from clarifai_datautils.image import ImageAnnotations
from clarifai.client.dataset import Dataset
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key

# Defining path and annotation format
LOCAL_FOLDER_PATH = "./dogs-vs-wolves/data/"
ANNOTATION_FORMAT = "imagenet"

# Load dataset from the specified local folder
kaggle_imagenet_dataset = ImageAnnotations.import_from(path=LOCAL_FOLDER_PATH, format=ANNOTATION_FORMAT)

# Use the Python SDK library to upload
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="YOUR_DATASET_ID_HERE")
# Or, initialize with a dataset URL; example: https://clarifai.com/john/my-app/datasets/annotations_dataset
#dataset = Dataset("DATASET_URL_HERE")

# Upload dataset using the dataloader
dataset.upload_dataset(dataloader=kaggle_imagenet_dataset.dataloader)
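
A downloaded dataset does not always match the layout a format expects. The sketch below assumes the imagenet format means one subfolder per class with the images inside it; that layout is an assumption here, so verify it against your download. The helper count_images_per_class and the list of file suffixes are hypothetical.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust for your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root):
    """Count image files in each immediate subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Running count_images_per_class("./dogs-vs-wolves/data/") before the import gives a quick view of the class folders and how many images each one holds.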

Convert Between Supported Formats

You can convert datasets between various annotation formats in your local development environment. For example, you can convert a dataset from COCO format to VOC format.

from clarifai_datautils.image import ImageAnnotations

# Defining import details
IMPORT_LOCAL_FOLDER_PATH = "./assets/annotation_formats/coco"
IMPORT_ANNOTATION_FORMAT = "coco_detection"

coco_dataset = ImageAnnotations.import_from(path=IMPORT_LOCAL_FOLDER_PATH, format=IMPORT_ANNOTATION_FORMAT)

# Defining export details
EXPORT_LOCAL_FOLDER_PATH = "./assets/annotation_formats/coco2voc"
EXPORT_ANNOTATION_FORMAT = "voc_detection"

coco_dataset_export = coco_dataset.export_to(EXPORT_LOCAL_FOLDER_PATH, EXPORT_ANNOTATION_FORMAT)

# Pass save_images=True to also export the images
#coco_dataset_export = coco_dataset.export_to(EXPORT_LOCAL_FOLDER_PATH, EXPORT_ANNOTATION_FORMAT, save_images=True)
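
This page does not say whether export_to overwrites files that already exist in the target folder, so a cautious option is to export into a fresh folder each time. The sketch below builds a folder name in the same style as the coco2voc path used above and refuses to reuse a folder that already has content. prepare_export_dir is a hypothetical helper.

```python
from pathlib import Path


def prepare_export_dir(base, source_name, target_name):
    """Create and return base/<source>2<target>, refusing to reuse a non-empty folder."""
    export_dir = Path(base) / f"{source_name}2{target_name}"
    if export_dir.exists() and any(export_dir.iterdir()):
        raise FileExistsError(f"{export_dir} already contains files")
    export_dir.mkdir(parents=True, exist_ok=True)
    return export_dir
```

The returned path can be converted with str() and passed as the export folder.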

Export a Clarifai Dataset to Another Format

You can export a dataset version from the Clarifai platform and convert it into various formats. This process involves two simple steps.

First, use the Clarifai SDK to export the dataset from the platform. The dataset is downloaded as a ZIP file to the local path you specify. If the directory does not already exist, it is created automatically.

from clarifai.client.dataset import Dataset
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key

# Initialize Dataset object for Clarifai
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="YOUR_DATASET_ID_HERE", dataset_version_id="YOUR_DATASET_VERSION_HERE")

# Specify the path where the exported dataset will be saved
# Optionally, you can also specify how the exported data will be split. Common splits include train, val, and test
dataset.export(save_path="clarifai_export.zip", split="train")

Next, extract the contents of the ZIP file to a folder. Then, pass the folder path to ImageAnnotations and convert the dataset into your desired format.
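
The extraction step can be done with Python's standard zipfile module. This is a minimal sketch; extract_export is a hypothetical helper, and the folder names inside the archive depend on the export, so inspect the returned entries before choosing the import path.

```python
import zipfile
from pathlib import Path


def extract_export(zip_path, dest_dir):
    """Extract the archive into dest_dir and return the sorted top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.iterdir())
```

For example, extract_export("clarifai_export.zip", "./content") would unpack the file exported in the previous step.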

# Assumes the ZIP file has already been extracted to the folder below

from clarifai_datautils.image import ImageAnnotations

# Defining import details
IMPORT_LOCAL_FOLDER_PATH = "./content/train"
IMPORT_ANNOTATION_FORMAT = "clarifai"

clarifai_dataset = ImageAnnotations.import_from(path=IMPORT_LOCAL_FOLDER_PATH, format=IMPORT_ANNOTATION_FORMAT)

# Defining export details
EXPORT_LOCAL_FOLDER_PATH = "./content"
EXPORT_ANNOTATION_FORMAT = "coco_detection"

clarifai_dataset_export = clarifai_dataset.export_to(EXPORT_LOCAL_FOLDER_PATH, EXPORT_ANNOTATION_FORMAT)

# Pass save_images=True to also export the images
#clarifai_dataset_export = clarifai_dataset.export_to(EXPORT_LOCAL_FOLDER_PATH, EXPORT_ANNOTATION_FORMAT, save_images=True)