Image Annotation Loader
Load existing annotated image datasets and convert between supported formats
The Image Annotation Loader framework, part of the Data Utils library, enables you to load already annotated image datasets and upload them to the Clarifai platform.
This framework eliminates the hurdle of format incompatibility by supporting a wide range of industry-standard annotation formats; you can also use it to convert between different annotation formats.
This allows seamless integration of existing labeled datasets, regardless of the initial annotation tool used. It also facilitates a smooth upload process, enabling you to leverage the Clarifai platform's infrastructure for various use cases.
To follow along with this tutorial, clone the repository containing various examples for using the Data Utils library by running git clone https://github.com/Clarifai/examples.git and then navigate to the Data_Utils folder.
Prerequisites
Install Python SDK and Data Utils
Install the latest version of the clarifai Python SDK package. Also, install the Data Utils library.
- Bash
pip install --upgrade clarifai
pip install clarifai-datautils
Install Extra Dependencies
The Image Annotation Loader requires additional libraries to function properly. To keep the core library lightweight, these optional dependencies are grouped under the annotations extra. (Python extras allow projects to specify additional dependencies for optional functionality.)
To install them, run:
- Bash
pip install "clarifai-datautils[annotations]"
The above command also installs Datumaro, a dataset management framework essential for the Image Annotation Loader. Note that Datumaro requires a Rust compiler to be installed on your machine for a smooth installation.
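Before running the loader, it can help to confirm that the optional dependencies are importable. The following is a minimal sketch that uses only the standard library. The helper name missing_modules is hypothetical, and datumaro is assumed to be the import name of the Datumaro package installed above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for the packages installed above.
missing = missing_modules(["clarifai_datautils", "datumaro"])
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
else:
    print("All annotation dependencies are importable.")
```

If anything is reported as missing, re-running the install command for the annotations extra is the first thing to try.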
Get a PAT
You need a PAT (Personal Access Token) key to authenticate your connection to the Clarifai platform. You can generate it in your Personal Settings page by navigating to the Security section.
Then, set it as an environment variable in your script.
- Python
import os
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key
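Hard-coding a token is convenient in a tutorial, but you may prefer to export CLARIFAI_PAT in your shell and have the script fail early when it is missing. A minimal sketch, assuming the CLARIFAI_PAT variable name used above; the helper require_pat is hypothetical and not part of the Clarifai SDK.

```python
import os


def require_pat(env=None):
    """Return the PAT from the environment, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    pat = env.get("CLARIFAI_PAT", "").strip()
    # Treat the tutorial placeholder as "not set" as well.
    if not pat or pat == "YOUR_PAT_HERE":
        raise RuntimeError("CLARIFAI_PAT is not set; export it before running this script.")
    return pat


# Example with an explicit mapping, so the sketch runs without touching real credentials.
print(require_pat({"CLARIFAI_PAT": "example-token"}))
```

In a real script, calling require_pat() with no argument reads from os.environ.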
Create a Dataset
Create a dataset on the Clarifai platform to use for uploading your annotated image datasets.
- Python
from clarifai.client.app import App
app = App(app_id="YOUR_APP_ID_HERE", user_id="YOUR_USER_ID_HERE", pat="YOUR_PAT_HERE")
# Pass the dataset ID to the create_dataset method
dataset = app.create_dataset(dataset_id="annotations_dataset")
Utility Features
Supported Formats
You can retrieve and display all the annotation formats that the Image Annotation Loader framework supports.
- Python
from clarifai_datautils.image import ImageAnnotations
formats = ImageAnnotations.list_formats()
print("Supported formats:", formats)
Note that:
- The ImageAnnotations class is imported from the clarifai_datautils.image package. This class provides utilities for working with annotated image datasets.
Output Example
Supported formats: ['coco_segmentation', 'voc_detection', 'yolo', 'cifar', 'coco_detection', 'cvat', 'imagenet', 'kitti', 'label_me', 'mnist', 'open_images', 'vgg_face2', 'lfw', 'cityscapes', 'ade20k2017', 'clarifai']
Here is a table that illustrates the annotation formats that the framework supports.
Annotation Type | Format | Task |
---|---|---|
ImageNet | imagenet | classification |
CIFAR-10 | cifar | classification |
MNIST | mnist | classification |
VGGFace2 | vgg_face2 | classification |
LFW | lfw | classification |
PASCAL VOC | voc_detection | detection |
YOLO | yolo | detection |
COCO | coco_detection | detection |
CVAT | cvat | detection |
KITTI | kitti | detection |
LabelMe | label_me | detection |
Open Images | open_images | detection |
Clarifai | clarifai | detection |
COCO (segmentation) | coco_segmentation | segmentation |
Cityscapes | cityscapes | segmentation |
ADE20K | ade20k2017 | segmentation |
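The table above can also be expressed as a lookup, for example to pick a task type from a format name. This is a sketch built only from the table rows; FORMAT_TASKS and formats_for_task are hypothetical names and not part of the Data Utils library.

```python
# Format name -> task, transcribed from the table above.
FORMAT_TASKS = {
    "imagenet": "classification",
    "cifar": "classification",
    "mnist": "classification",
    "vgg_face2": "classification",
    "lfw": "classification",
    "voc_detection": "detection",
    "yolo": "detection",
    "coco_detection": "detection",
    "cvat": "detection",
    "kitti": "detection",
    "label_me": "detection",
    "open_images": "detection",
    "clarifai": "detection",
    "coco_segmentation": "segmentation",
    "cityscapes": "segmentation",
    "ade20k2017": "segmentation",
}


def formats_for_task(task):
    """Return the format names listed for a task, sorted alphabetically."""
    return sorted(name for name, t in FORMAT_TASKS.items() if t == task)


print(formats_for_task("segmentation"))
```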
Format Detection
You can identify the annotation format that a dataset uses.
- Python
from clarifai_datautils.image import ImageAnnotations
# Defining dataset path
LOCAL_FOLDER_PATH = "./assets/annotation_formats/cifar-10"
# Detecting the format
detected_format = ImageAnnotations.detect_format(LOCAL_FOLDER_PATH)
print(f"Detected format: {detected_format}")
Note that:
- The LOCAL_FOLDER_PATH variable specifies the local directory path where the annotated dataset is stored.
Output Example
Detected format: cifar
Dataset Information
You can get the details of a dataset you want to upload to the Clarifai platform.
- Python
from clarifai_datautils.image import ImageAnnotations
# Defining path and annotation format
LOCAL_FOLDER_PATH = "./assets/annotation_formats/imagenet"
ANNOTATION_FORMAT = "imagenet"
# Load dataset from the specified local folder
imagenet_dataset = ImageAnnotations.import_from(path=LOCAL_FOLDER_PATH, format=ANNOTATION_FORMAT)
# Get info about the dataset
info = imagenet_dataset.get_info()
print(info)
# Or, print the dataset details
#print(imagenet_dataset)
# Get detailed dataset information
print(f"Dataset size: {info['size']}")
print(f"Annotation count: {info['annotations_count']}")
print(f"Categories: {info['categories']}")
Note that:
- The import_from method of the ImageAnnotations class is used to load the dataset from the specified local folder.
- The format parameter specifies the format of the annotations. You can specify any supported annotation format.
Output Example
{'size': 19, 'source_path': './assets/annotation_formats/imagenet', 'annotated_items_count': 19, 'annotations_count': 19, 'sub_folders': ["default: # of items=19, # of annotated items=19, # of annotations=19, annotation types=['1']\n"], 'categories': ["1: ['bullfrog', 'goldfish', 'kingsnake', 'llama', 'tench']\n"]}
Dataset size: 19
Annotation count: 19
Categories: ["1: ['bullfrog', 'goldfish', 'kingsnake', 'llama', 'tench']\n"]
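In the output above, each categories entry is a string rather than a list. If you need the label names as Python values, one option is to parse that string. A minimal sketch, assuming every entry follows the "<id>: [<labels>]" layout seen in this sample output; parse_categories is a hypothetical helper.

```python
import ast


def parse_categories(entries):
    """Map each category-group id to its list of labels."""
    parsed = {}
    for entry in entries:
        # Split "1: ['a', 'b']\n" into the id and the list literal.
        group_id, _, labels = entry.partition(": ")
        parsed[group_id.strip()] = ast.literal_eval(labels.strip())
    return parsed


# Sample value copied from the output above.
categories = ["1: ['bullfrog', 'goldfish', 'kingsnake', 'llama', 'tench']\n"]
print(parse_categories(categories))
```

ast.literal_eval only evaluates Python literals, so it is a safer choice here than eval.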
Uploading to Clarifai
To upload a pre-labeled dataset from your local environment to the Clarifai platform, initialize a Dataset object from the Python SDK and specify where the dataset will be uploaded.
Then, call the upload_dataset() method on the Dataset object. This method takes a dataloader as an argument, which iterates over the dataset and yields data in a format compatible with the Clarifai platform.
- Python
from clarifai_datautils.image import ImageAnnotations
from clarifai.client.dataset import Dataset
import os
# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key
# Defining path and annotation format
LOCAL_FOLDER_PATH = "./assets/annotation_formats/imagenet"
ANNOTATION_FORMAT = "imagenet"
# Load dataset from the specified local folder
imagenet_dataset = ImageAnnotations.import_from(path=LOCAL_FOLDER_PATH, format=ANNOTATION_FORMAT)
# Use the Python SDK library to upload
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="YOUR_DATASET_ID_HERE")
# Or, initialize with a dataset URL; example: https://clarifai.com/john/my-app/datasets/annotations_dataset
#dataset = Dataset("DATASET_URL_HERE")
# Upload dataset using the dataloader
dataset.upload_dataset(dataloader=imagenet_dataset.dataloader)
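The commented-out alternative above takes a dataset URL. Judging from the example URL, its path carries the user ID, app ID, and dataset ID. The sketch below splits such a URL with the standard library; the layout is inferred from that single example, and parse_dataset_url is a hypothetical helper, not an SDK function.

```python
from urllib.parse import urlparse


def parse_dataset_url(url):
    """Split a URL shaped like https://clarifai.com/<user>/<app>/datasets/<dataset>."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 4 or parts[2] != "datasets":
        raise ValueError(f"Unexpected dataset URL layout: {url}")
    user_id, app_id, _, dataset_id = parts
    return {"user_id": user_id, "app_id": app_id, "dataset_id": dataset_id}


print(parse_dataset_url("https://clarifai.com/john/my-app/datasets/annotations_dataset"))
```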
Uploading From Kaggle to Clarifai
You can download a dataset from Kaggle and upload it to the Clarifai platform. To begin, install the opendatasets Python package, which enables direct dataset downloads from Kaggle.
- Bash
pip install -q opendatasets
Next, download the dataset from Kaggle. For example, here is how you could download the dogs-vs-wolves dataset.
- Python
import opendatasets as od
# When prompted, enter your Kaggle username and key
od.download("https://www.kaggle.com/datasets/harishvutukuri/dogs-vs-wolves")
Then, you can upload it to Clarifai.
- Python
from clarifai_datautils.image import ImageAnnotations
from clarifai.client.dataset import Dataset
import os
# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key
# Defining path and annotation format
LOCAL_FOLDER_PATH = "./dogs-vs-wolves/data/"
ANNOTATION_FORMAT = "imagenet"
# Load dataset from the specified local folder
kaggle_imagenet_dataset = ImageAnnotations.import_from(path=LOCAL_FOLDER_PATH, format=ANNOTATION_FORMAT)
# Use the Python SDK library to upload
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="YOUR_DATASET_ID_HERE")
# Or, initialize with a dataset URL; example: https://clarifai.com/john/my-app/datasets/annotations_dataset
#dataset = Dataset("DATASET_URL_HERE")
# Upload dataset using the dataloader
dataset.upload_dataset(dataloader=kaggle_imagenet_dataset.dataloader)
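The example above loads the Kaggle download with the imagenet format. Assuming that format expects one subfolder per class with the images inside it (an assumption about the layout, not something stated above), a quick sanity check before importing could look like the sketch below. count_images_per_class is a hypothetical helper.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Return {class_folder_name: image_count} for each direct subfolder of root."""
    counts = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            counts[sub.name] = sum(
                1 for f in sub.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


# Path taken from the example above; adjust it to where the download landed.
data_dir = Path("./dogs-vs-wolves/data/")
if data_dir.is_dir():
    print(count_images_per_class(data_dir))
```

An empty result or a class with zero images suggests the path points at the wrong level of the downloaded folder.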
Convert Between Supported Formats
You can convert datasets between various annotation formats in your local development environment. For example, you can convert a dataset from COCO format to VOC format.
- Python
from clarifai_datautils.image import ImageAnnotations
# Defining import details
IMPORT_LOCAL_FOLDER_PATH = "./assets/annotation_formats/coco"
IMPORT_ANNOTATION_FORMAT = "coco_detection"
coco_dataset = ImageAnnotations.import_from(path=IMPORT_LOCAL_FOLDER_PATH, format=IMPORT_ANNOTATION_FORMAT)
# Defining export details
EXPORT_LOCAL_FOLDER_PATH = "./assets/annotation_formats/coco2voc"
EXPORT_ANNOTATION_FORMAT = "voc_detection"
coco_dataset_export = coco_dataset.export_to(EXPORT_LOCAL_FOLDER_PATH, EXPORT_ANNOTATION_FORMAT)
# Pass save_images=True to also save the images alongside the annotations
#coco_dataset_export = coco_dataset.export_to(EXPORT_LOCAL_FOLDER_PATH, EXPORT_ANNOTATION_FORMAT, save_images=True)
Export a Clarifai Dataset to Another Format
You can export a dataset version from the Clarifai platform and convert it into various formats. This process involves two simple steps.
First, use the Clarifai SDK to export the dataset from the platform. The dataset will be downloaded as a ZIP file to your specified local directory. If the directory does not already exist, it will be automatically created for you.
- Python
from clarifai.client.dataset import Dataset
import os
# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key
# Initialize Dataset object for Clarifai
dataset = Dataset(user_id="YOUR_USER_ID_HERE", app_id="YOUR_APP_ID_HERE", dataset_id="YOUR_DATASET_ID_HERE", dataset_version_id="YOUR_DATASET_VERSION_HERE")
# Specify the path where the exported dataset will be saved
# Optionally, you can also specify how the exported data will be split. Common splits include train, val, and test
dataset.export(save_path="clarifai_export.zip", split="train")
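The exported archive has to be unpacked before it can be converted. A minimal sketch using the standard library's zipfile module; the archive name matches the save_path used above, while the target folder name and the helper extract_export are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def extract_export(zip_path, target_dir):
    """Extract a ZIP archive into target_dir and return the names of its top-level entries."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())


# Archive name from the export call above; "clarifai_export" is an assumed folder name.
if Path("clarifai_export.zip").is_file():
    print(extract_export("clarifai_export.zip", "clarifai_export"))
```

The returned names show which folders the archive contains, which helps when choosing the path to import from.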
Next, extract the contents of the ZIP file to a folder. Then, pass the folder path to ImageAnnotations and convert the dataset into your desired format.
- Python
# Extract the zip file and pass the folder to ImageAnnotations
from clarifai_datautils.image import ImageAnnotations
# Defining import details
IMPORT_LOCAL_FOLDER_PATH = "./content/train"
IMPORT_ANNOTATION_FORMAT = "clarifai"
clarifai_dataset = ImageAnnotations.import_from(path=IMPORT_LOCAL_FOLDER_PATH, format=IMPORT_ANNOTATION_FORMAT)
# Defining export details
EXPORT_LOCAL_FOLDER_PATH = "./content"
EXPORT_ANNOTATION_FORMAT = "coco_detection"
coco_dataset_export = clarifai_dataset.export_to(EXPORT_LOCAL_FOLDER_PATH, EXPORT_ANNOTATION_FORMAT)
# Pass save_images=True to also save the images alongside the annotations
#coco_dataset_export = clarifai_dataset.export_to(EXPORT_LOCAL_FOLDER_PATH, EXPORT_ANNOTATION_FORMAT, save_images=True)