Your Data

Clarifai supports the most popular image, video, and text formats for your input data.


Upload your inputs into the Clarifai platform for data labeling, training new models, search, or predictions. You can upload images, video, and text from URLs or from a local directory.

Inputs and outputs guide

Example:

When choosing one of Clarifai's pre-built models, you might see something like this from our person-vehicle model:

| Input Type | Output Type |
| --- | --- |
| image | regions[...].data.concepts, regions[...].region_info.bounding_box |

The tables below explain what each of these input and output data types means.

Table of uploadable data types:

| Data Type | Meaning |
| --- | --- |
| text | This is freeform plain text, including delimited text files such as CSV and TSV. It can be uploaded as raw text or specified with a URI. |
| image | This is an image in an accepted format, which currently includes JPG, PNG, TIFF, BMP, and WEBP. It can be uploaded as base64 bytes or specified with a URI. |
| video | This is video in an accepted format, which currently includes AVI, MP4, WMV, MOV, GIF, and 3GPP. It can be uploaded as base64 bytes or specified with a URI. |

All of these data formats are read in as raw bytes that have been base64-encoded.
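
As a rough illustration of the two upload routes described above (base64 bytes or a URI), the sketch below builds a JSON payload for one image of each kind. The payload shape (`inputs`, `data`, `image`, `base64`, `url`) and the helper names are assumptions made for this example, not a confirmed request schema; check the API reference before relying on them. Nothing is sent over the network.

```python
import base64
import json


def image_input_from_bytes(raw: bytes) -> dict:
    """Base64-encode raw image bytes and wrap them in an input record.

    The nesting used here is an assumed, illustrative payload shape.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return {"data": {"image": {"base64": encoded}}}


def image_input_from_url(url: str) -> dict:
    """Reference a hosted image by URI instead of sending its bytes."""
    return {"data": {"image": {"url": url}}}


# Stand-in bytes: only the 8-byte PNG signature, not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"

payload = {
    "inputs": [
        image_input_from_bytes(png_signature),
        image_input_from_url("https://example.com/cat.jpg"),  # placeholder URL
    ]
}

print(json.dumps(payload, indent=2))
```

For a local file, the bytes would typically come from `open(path, "rb").read()` before being passed to the first helper.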

Table of single data types passed between models:

| Data Type | Meaning |
| --- | --- |
| embeddings | Vector representations of data passed from model to model. These are not uploadable by users. |
| clusters | These are IDs that identify clusters. These are primarily used for image search. |
| concepts | The list of concepts used in a model. For the general model, these would be the top 20 concepts classified with the highest confidence. |
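
To make the "top 20 concepts by confidence" idea concrete, here is a minimal sketch that ranks a list of concepts and keeps the highest-scoring ones. The field names `name` and `value` (the confidence score) and the sample scores are assumptions for illustration only.

```python
def top_concepts(concepts, k=20):
    """Return the k concepts with the highest confidence, best first.

    Each concept is assumed to be a dict with a 'name' and a
    confidence score stored under 'value'.
    """
    return sorted(concepts, key=lambda c: c["value"], reverse=True)[:k]


# Made-up scores, purely for demonstration.
sample = [
    {"name": "tree", "value": 0.71},
    {"name": "person", "value": 0.98},
    {"name": "car", "value": 0.86},
]

best_two = top_concepts(sample, k=2)
print([c["name"] for c in best_two])
```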

Table of regions[...] data types:

The notation of [...] means that the variable is a list of things, so regions[...] represents a list of regions of data. This could be parts of an image, text, video, or audio:

| Data Type | Meaning |
| --- | --- |
| regions[...].region_info.point | This is a list of points that specify regions of an image. |
| regions[...].region_info.bounding_box | This is a list of regions, each containing the four corners of a bounding box in a specific region of an image. Each corner coordinate is normalized to [0,1]. |
| regions[...].region_info.mask | The mask is an overlay of the entire image, with the pixels belonging to a specific concept set to a certain color. |
| regions[...].data.text | This is a list of regions and their associated text. This could be OCR data for an image, or subtext within a larger text for NLP. |
| regions[...].data.embeddings | This is a list of regions and their associated vector representations. |
| regions[...].data.concepts | This is a list of regions and their associated high-confidence concepts. |
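
Because bounding-box coordinates are normalized to [0,1], they have to be scaled by the image dimensions before they can be drawn or cropped. The sketch below shows that conversion. The key names `top_row`, `left_col`, `bottom_row`, and `right_col` are assumed for illustration; the source only states that the four corners are normalized.

```python
def to_pixel_box(box, width, height):
    """Scale a normalized bounding box to integer pixel coordinates.

    Row values scale with the image height, column values with the width.
    The key names are assumptions made for this example.
    """
    return {
        "left": round(box["left_col"] * width),
        "top": round(box["top_row"] * height),
        "right": round(box["right_col"] * width),
        "bottom": round(box["bottom_row"] * height),
    }


normalized = {"top_row": 0.25, "left_col": 0.5, "bottom_row": 0.75, "right_col": 1.0}
pixels = to_pixel_box(normalized, width=640, height=480)
print(pixels)
```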

Table of frames[...] data types:

The [...] notation means that the variable is a list, so frames[...] represents a list of frames of video or audio. It follows that frames[...].data.regions[...] represents a 2D matrix indexed first by frame and then by region within that frame.
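
One way to read the [...] notation is as "map over this list". The sketch below resolves such a path against a nested dictionary, so a path with two [...] segments yields a list of lists (frames by regions). The `resolve` helper and the sample response are hypothetical and exist only to illustrate the notation.

```python
def resolve(obj, path):
    """Resolve a dotted path in which each '[...]' segment maps over a list."""
    # Swap '[...]' for '[]' first so its dots are not treated as separators.
    tokens = path.replace("[...]", "[]").split(".")
    return _walk(obj, tokens)


def _walk(obj, tokens):
    if not tokens:
        return obj
    head, rest = tokens[0], tokens[1:]
    if head.endswith("[]"):
        # List segment: apply the remaining path to every element.
        return [_walk(item, rest) for item in obj[head[:-2]]]
    return _walk(obj[head], rest)


# Hypothetical response: two frames, with two regions and one region.
response = {
    "frames": [
        {"data": {"regions": [{"track_id": "a"}, {"track_id": "b"}]}},
        {"data": {"regions": [{"track_id": "a"}]}},
    ]
}

ids = resolve(response, "frames[...].data.regions[...].track_id")
print(ids)
```

Note that the rows can have different lengths, since each frame may contain a different number of regions.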

| Data Type | Meaning |
| --- | --- |
| frames[...].data.regions[...].region_info.bounding_box | These are the four corners of a bounding box in a specific region of a specific frame of video. Each corner coordinate is normalized to [0,1]. |
| frames[...].data.regions[...].data.concepts | This is the matrix of frames and regions containing the concepts used in a model. For the general model, these would be the top 20 concepts classified with the highest confidence in a specific region of a specific frame of video. |
| frames[...].data.regions[...].track_id | This is the matrix of frames and regions containing the tracking IDs used to track objects across frames of a video. |
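
Since a track ID ties together regions that belong to the same object in different frames, a common follow-up step is to regroup the frame-by-region matrix by track. The sketch below does that for a small hand-written sample. The nesting follows the paths in the table above, while the bounding-box key names and all values are illustrative assumptions.

```python
from collections import defaultdict


def group_by_track(frames):
    """Map each track_id to a list of (frame_index, bounding_box) pairs."""
    tracks = defaultdict(list)
    for frame_index, frame in enumerate(frames):
        for region in frame["data"]["regions"]:
            box = region["region_info"]["bounding_box"]
            tracks[region["track_id"]].append((frame_index, box))
    return dict(tracks)


def make_region(track_id, left_col):
    """Build a sample region; the box key names are assumed for illustration."""
    box = {"top_row": 0.1, "left_col": left_col, "bottom_row": 0.5, "right_col": left_col + 0.2}
    return {"track_id": track_id, "region_info": {"bounding_box": box}}


sample_frames = [
    {"data": {"regions": [make_region("car-1", 0.1), make_region("person-1", 0.6)]}},
    {"data": {"regions": [make_region("car-1", 0.2)]}},
]

tracks = group_by_track(sample_frames)
for track_id, hits in tracks.items():
    print(track_id, [frame_index for frame_index, _ in hits])
```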