Data Utils

Learn how to use the Clarifai Data Utils library


Data Utils is an open-source Clarifai library that provides a powerful suite of multimedia data utilities to simplify and accelerate your data management and processing activities.

It addresses common data preparation challenges, making your workflow more efficient. The library also integrates with the Clarifai Python SDK, so you can build AI-driven solutions for diverse use cases.

With Data Utils, you can extract, transform, and load unstructured data (such as images, videos, and text) into the Clarifai platform. Once the data is uploaded, you can use it within the platform for tasks such as training a custom image classification model to identify objects in images.

The library offers two key features:

  • Image Annotation Loader — This framework enables you to load various annotated image datasets and upload them directly to the Clarifai platform. It also supports converting between different annotation formats, ensuring compatibility and flexibility for your projects.

  • Data Ingestion Pipelines — This framework provides robust pipelines to pre-process images and text documents (such as PDFs, HTML, and Word docs), transform and chunk the content, and seamlessly ingest them into the Clarifai platform for further processing and analysis.
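To make the "transform and chunk" step of an ingestion pipeline more concrete, here is a minimal sketch of fixed-size text chunking with overlap in plain Python. It does not use the Data Utils API: the function name `chunk_text` and the parameters `max_chars` and `overlap` are hypothetical, chosen only for illustration, and the library's own pipelines may chunk content differently.

```python
# Illustrative only: plain-Python chunking, not the Data Utils API.
# chunk_text, max_chars, and overlap are hypothetical names.

def chunk_text(text: str, max_chars: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Each chunk after the first starts `overlap` characters before the
    end of the previous chunk, so context is shared across boundaries.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    step = max_chars - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once the current window has reached the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", max_chars=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Overlapping chunks are a common choice when the chunks are later embedded or searched, because a sentence that straddles a boundary still appears whole in at least one chunk. Splitting on characters is the simplest option; real pipelines often split on sentences, paragraphs, or tokens instead.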