
Image as Input

Learn how to perform inference with images as input using the Clarifai Python SDK.


Clarifai's Python SDK lets you integrate image recognition into your applications. The SDK works with several model types that take images as input for different tasks. Whether you're building applications for content moderation, object detection, or image classification, the SDK gives you a solid foundation for turning images into actionable information.

Visual Classifier

With Clarifai's Visual Classifier models, you can categorize images through the Predict API. You submit input images to a classification model of your choice and get back predicted concepts, each with a confidence score. You can supply image data as URLs, file paths, or bytes.

Note: You can send up to 128 images in one API call. Each image must be smaller than 20 MB.
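If you have more inputs than fit in one call, or you want to catch oversized files early, you can split your inputs into batches and check local file sizes before sending anything. The following is a minimal sketch that relies only on the limits in the note above. The helper names are hypothetical, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption. How you submit each batch depends on the SDK method you use, and that part is not shown here.

```python
import os
from typing import Iterable, List

# Limits taken from the note above; re-check them against the current docs.
MAX_INPUTS_PER_CALL = 128
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


def chunk_inputs(items: List[str], size: int = MAX_INPUTS_PER_CALL) -> List[List[str]]:
    """Split a list of image URLs or file paths into batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def oversized_files(paths: Iterable[str], limit: int = MAX_IMAGE_BYTES) -> List[str]:
    """Return the local files that are not smaller than the per-image limit."""
    return [p for p in paths if os.path.getsize(p) >= limit]
```

For example, 300 image URLs produce three batches of 128, 128, and 44 inputs.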

```python
from clarifai.client.model import Model

# Your PAT (Personal Access Token) can be found in the Account's Security section.
# Specify the correct user_id/app_id pairing,
# since you're making inferences outside your app's scope.
# USER_ID = "clarifai"
# APP_ID = "main"

# You can set the model using a model URL or a model ID.
# Change these to whatever model you want to use.
# e.g.: MODEL_ID = "general-image-recognition"
# You can also set a particular model version by specifying the version ID.
# e.g.: MODEL_VERSION_ID = "aa7f35c01e0642fda5cf400f543e7c40"
# Model objects can be initialized either from a model URL or from the
# respective user_id, app_id and model_id, e.g.:
# model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)

model_url = "https://clarifai.com/clarifai/main/models/general-image-recognition"
image_url = "https://samples.clarifai.com/metro-north.jpg"

# The Predict API can generate predictions for data provided as a URL, a file path, or bytes.

# Example for prediction through bytes:
# model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_bytes(input_bytes, input_type="image")

# Example for prediction through a file path:
# model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_filepath(filepath, input_type="image")

model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    image_url, input_type="image"
)

# Get the output
print(model_prediction.outputs[0].data)
```
Output

```text
concepts {
  id: "ai_HLmqFqBf"
  name: "train"
  value: 0.999604881
  app_id: "main"
}
concepts {
  id: "ai_fvlBqXZR"
  name: "railway"
  value: 0.999297619
  app_id: "main"
}
concepts {
  id: "ai_SHNDcmJ3"
  name: "subway system"
  value: 0.99825567
  app_id: "main"
}
concepts {
  id: "ai_6kTjGfF6"
  name: "station"
  value: 0.998010933
  app_id: "main"
}
concepts {
  id: "ai_RRXLczch"
  name: "locomotive"
  value: 0.997254908
  app_id: "main"
}
concepts {
  id: "ai_Xxjc3MhT"
  name: "transportation system"
  value: 0.996976852
  app_id: "main"
}
concepts {
  id: "ai_VRmbGVWh"
  name: "travel"
  value: 0.988967717
  app_id: "main"
}
concepts {
  id: "ai_jlb9q33b"
  name: "commuter"
  value: 0.98089534
  app_id: "main"
}
concepts {
  id: "ai_2gkfMDsM"
  name: "platform"
  value: 0.980635285
  app_id: "main"
}
concepts {
  id: "ai_n9vjC1jB"
  name: "light"
  value: 0.974186838
  app_id: "main"
}
concepts {
  id: "ai_sQQj52KZ"
  name: "train station"
  value: 0.96878773
  app_id: "main"
}
concepts {
  id: "ai_l4WckcJN"
  name: "blur"
  value: 0.967302203
  app_id: "main"
}
concepts {
  id: "ai_WBQfVV0p"
  name: "city"
  value: 0.96151042
  app_id: "main"
}
concepts {
  id: "ai_TZ3C79C6"
  name: "road"
  value: 0.961382031
  app_id: "main"
}
concepts {
  id: "ai_CpFBRWzD"
  name: "urban"
  value: 0.960375667
  app_id: "main"
}
concepts {
  id: "ai_tr0MBp64"
  name: "traffic"
  value: 0.959969819
  app_id: "main"
}
concepts {
  id: "ai_GjVpxXrs"
  name: "street"
  value: 0.947492182
  app_id: "main"
}
concepts {
  id: "ai_mcSHVRfS"
  name: "public"
  value: 0.934322
  app_id: "main"
}
concepts {
  id: "ai_J6d1kV8t"
  name: "tramway"
  value: 0.931958199
  app_id: "main"
}
concepts {
  id: "ai_6lhccv44"
  name: "business"
  value: 0.929547548
  app_id: "main"
}
```
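The classifier returns many concepts, each with a confidence score, so in practice you often keep only the strongest ones. The sketch below filters concepts by a threshold and ranks what remains. It accepts any objects with `.name` and `.value` attributes, such as `model_prediction.outputs[0].data.concepts`, and uses `SimpleNamespace` stand-ins so it runs without an API call. The helper name `top_concepts` and the default threshold of 0.95 are illustrative assumptions.

```python
from types import SimpleNamespace
from typing import List, Tuple


def top_concepts(concepts, threshold: float = 0.95, k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k (name, value) pairs with value >= threshold, most confident first.

    `concepts` is any iterable of objects exposing `.name` and `.value`.
    """
    kept = [(c.name, c.value) for c in concepts if c.value >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Stand-in objects mirroring three concepts from the output above.
sample = [
    SimpleNamespace(name="train", value=0.999604881),
    SimpleNamespace(name="travel", value=0.988967717),
    SimpleNamespace(name="business", value=0.929547548),
]
print(top_concepts(sample, threshold=0.95))
```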

Visual Detector - Image

For a richer understanding of image content, use the Predict API with Clarifai's Visual Detector models. Image classification assigns labels to the image as a whole. A visual detector instead locates multiple objects or regions within the image and returns a bounding box and associated concepts for each one. As with classification, you can supply input images as URLs, file paths, or bytes.

```python
from clarifai.client.model import Model

# Your PAT (Personal Access Token) can be found in the Account's Security section.
# Specify the correct user_id/app_id pairing,
# since you're making inferences outside your app's scope.
# USER_ID = "clarifai"
# APP_ID = "main"

# You can set the model using a model URL or a model ID.
# Change these to whatever model you want to use.
# e.g.: MODEL_ID = "general-image-detection"
# You can also set a particular model version by specifying the version ID.
# e.g.: MODEL_VERSION_ID = "1580bb1932594c93b7e2e04456af7c6f"
# Model objects can be initialized either from a model URL or from the
# respective user_id, app_id and model_id, e.g.:
# model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)

DETECTION_IMAGE_URL = "https://s3.amazonaws.com/samples.clarifai.com/people_walking2.jpeg"
model_url = "https://clarifai.com/clarifai/main/models/general-image-detection"
detector_model = Model(
    url=model_url,
    pat="YOUR_PAT",
)

# The Predict API can generate predictions for data provided as a URL, a file path, or bytes.

# Example for prediction through bytes:
# prediction_response = detector_model.predict_by_bytes(input_bytes, input_type="image")

# Example for prediction through a file path:
# prediction_response = detector_model.predict_by_filepath(filepath, input_type="image")

prediction_response = detector_model.predict_by_url(
    DETECTION_IMAGE_URL, input_type="image"
)

# Since there is one input, there is one output
regions = prediction_response.outputs[0].data.regions

for region in regions:
    # Access and round the bounding box values
    top_row = round(region.region_info.bounding_box.top_row, 3)
    left_col = round(region.region_info.bounding_box.left_col, 3)
    bottom_row = round(region.region_info.bounding_box.bottom_row, 3)
    right_col = round(region.region_info.bounding_box.right_col, 3)

    for concept in region.data.concepts:
        # Access and round the concept value
        name = concept.name
        value = round(concept.value, 4)

        print(f"{name}: {value} BBox: {top_row}, {left_col}, {bottom_row}, {right_col}")
```
Output

```text
Footwear: 0.9618 BBox: 0.879, 0.305, 0.925, 0.327
Footwear: 0.9593 BBox: 0.882, 0.284, 0.922, 0.305
Footwear: 0.9571 BBox: 0.874, 0.401, 0.923, 0.418
Footwear: 0.9546 BBox: 0.87, 0.712, 0.916, 0.732
Footwear: 0.9518 BBox: 0.882, 0.605, 0.918, 0.623
Footwear: 0.95 BBox: 0.847, 0.587, 0.907, 0.604
Footwear: 0.9349 BBox: 0.878, 0.475, 0.917, 0.492
Tree: 0.9145 BBox: 0.009, 0.019, 0.451, 0.542
Footwear: 0.9127 BBox: 0.858, 0.393, 0.909, 0.407
Footwear: 0.8969 BBox: 0.812, 0.433, 0.844, 0.445
Footwear: 0.8747 BBox: 0.852, 0.49, 0.912, 0.506
Jeans: 0.8699 BBox: 0.511, 0.255, 0.917, 0.336
Footwear: 0.8203 BBox: 0.808, 0.453, 0.833, 0.465
Footwear: 0.8186 BBox: 0.8, 0.378, 0.834, 0.391
Jeans: 0.7921 BBox: 0.715, 0.273, 0.895, 0.326
Tree: 0.7851 BBox: 0.0, 0.512, 0.635, 0.998
Woman: 0.7693 BBox: 0.466, 0.36, 0.915, 0.449
Jeans: 0.7614 BBox: 0.567, 0.567, 0.901, 0.647
Footwear: 0.7287 BBox: 0.847, 0.494, 0.884, 0.51
Tree: 0.7216 BBox: 0.002, 0.005, 0.474, 0.14
Jeans: 0.7098 BBox: 0.493, 0.447, 0.914, 0.528
Footwear: 0.6929 BBox: 0.808, 0.424, 0.839, 0.437
Jeans: 0.6734 BBox: 0.728, 0.464, 0.887, 0.515
Woman: 0.6141 BBox: 0.464, 0.674, 0.922, 0.782
Human leg: 0.6032 BBox: 0.681, 0.577, 0.897, 0.634
...
Footwear: 0.3527 BBox: 0.844, 0.5, 0.875, 0.515
Footwear: 0.3395 BBox: 0.863, 0.396, 0.914, 0.413
Human hair: 0.3358 BBox: 0.443, 0.586, 0.505, 0.622
Tree: 0.3306 BBox: 0.6, 0.759, 0.805, 0.929
```
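The bounding box values printed above all lie between 0 and 1. This suggests that `top_row` and `bottom_row` are fractions of the image height and that `left_col` and `right_col` are fractions of the image width. If that reading is correct, you can convert the values to pixel coordinates for drawing or cropping. The helper `bbox_to_pixels` is hypothetical, and the 1000 x 800 image size is made up for the example.

```python
from typing import Tuple


def bbox_to_pixels(top_row: float, left_col: float, bottom_row: float, right_col: float,
                   width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a relative bounding box to (x_min, y_min, x_max, y_max) in pixels.

    Assumes rows are fractions of the image height and columns are fractions
    of the image width, both in the range [0, 1].
    """
    x_min = round(left_col * width)
    y_min = round(top_row * height)
    x_max = round(right_col * width)
    y_max = round(bottom_row * height)
    return x_min, y_min, x_max, y_max


# First "Tree" box from the output above, on a hypothetical 1000 x 800 image.
print(bbox_to_pixels(0.009, 0.019, 0.451, 0.542, width=1000, height=800))
```

In a live response, use the actual image dimensions and read the four values from `region.region_info.bounding_box`.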

Visual Detector - Video

When you provide a video as input, the Predict API returns predictions for frames sampled from the video. The request is configurable. For example, you can set how many frames are processed per second, which gives you control over analysis speed. Choose the model best suited to your visual detection task.

Note: Videos uploaded through a URL must be at most 10 minutes long or 100 MB in size.

```python
from clarifai.client.model import Model

# Your PAT (Personal Access Token) can be found in the portal under Authentication.
# Specify the correct user_id/app_id pairing,
# since you're making inferences outside your app's scope.
USER_ID = "clarifai"
APP_ID = "main"
# Change these to whatever model and video URL you want to use
MODEL_ID = "general-image-recognition"
# You can also set a particular model version by specifying the version ID.
# e.g.: MODEL_VERSION_ID = "aa7f35c01e0642fda5cf400f543e7c40"

VIDEO_URL = "https://samples.clarifai.com/beer.mp4"
# Change this to configure the FPS rate (if it's not configured, it defaults to 1 FPS).
# The value must be between 100 and 60000.
# FPS = 1000 / sample_ms
SAMPLE_MS = 2000

# Model objects can be initialized either from a model URL, e.g.:
# model = Model("https://clarifai.com/clarifai/main/models/general-image-recognition")
# or from the respective user_id, app_id and model_id, as done here.
model = Model(user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID, pat="YOUR_PAT")
output_config = {"sample_ms": SAMPLE_MS}  # Run inference every 2 seconds
model_prediction = model.predict_by_url(
    VIDEO_URL, input_type="video", output_config=output_config
)

# The Predict API can generate predictions for data provided as a file path, a URL, or bytes.

# Example for prediction through a file path:
# model_prediction = model.predict_by_filepath(video_file_path, input_type="video", output_config=output_config)

# Example for prediction through bytes:
# model_prediction = model.predict_by_bytes(input_video_bytes, input_type="video", output_config=output_config)

# Print the frame info and the first concept name in each frame
for frame in model_prediction.outputs[0].data.frames:
    print(f"Frame Info: {frame.frame_info} Concept: {frame.data.concepts[0].name}\n")
```
Output

```text
Frame Info: time: 1000
Concept: beer

Frame Info: index: 1
time: 3000
Concept: beer

Frame Info: index: 2
time: 5000
Concept: beer

Frame Info: index: 3
time: 7000
Concept: beer

Frame Info: index: 4
time: 9000
Concept: beer
```
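The comments in the video example relate frame rate to sampling interval through FPS = 1000 / sample_ms and restrict `sample_ms` to the range 100 to 60000. A small helper can derive `sample_ms` from a target frame rate and reject out-of-range values before any request is sent. The function name `sample_ms_for_fps` is hypothetical. Only the resulting `output_config` dictionary mirrors the example above.

```python
def sample_ms_for_fps(fps: float) -> int:
    """Convert a desired frames-per-second rate to a sample_ms value.

    Uses FPS = 1000 / sample_ms and checks the result against the
    100-60000 range given in the example's comments.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    sample_ms = round(1000 / fps)
    if not 100 <= sample_ms <= 60000:
        raise ValueError(f"sample_ms={sample_ms} is outside the allowed range 100-60000")
    return sample_ms


output_config = {"sample_ms": sample_ms_for_fps(0.5)}  # one frame every 2 seconds
print(output_config)
```

With `sample_ms=2000`, the frames in the output above are spaced 2000 ms apart (1000, 3000, 5000, ...).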

Visual Segmenter

The Predict API can generate segmentation masks when you provide an image as input to a segmentation model. The model divides the image into distinct regions and associates each region with a concept. In the example below, each concept's value is the fraction of the image that the concept covers.

```python
from clarifai.client.model import Model

# Your PAT (Personal Access Token) can be found in the Account's Security section.
# Specify the correct user_id/app_id pairing,
# since you're making inferences outside your app's scope.
# USER_ID = "clarifai"
# APP_ID = "main"

# You can set the model using a model URL or a model ID.
# Change these to whatever model you want to use.
# e.g.: MODEL_ID = "image-general-segmentation"
# You can also set a particular model version by specifying the version ID.
# e.g.: MODEL_VERSION_ID = "1581820110264581908ce024b12b4bfb"
# Model objects can be initialized either from a model URL or from the
# respective user_id, app_id and model_id, e.g.:
# model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)

SEGMENT_IMAGE_URL = "https://s3.amazonaws.com/samples.clarifai.com/people_walking2.jpeg"

# The Predict API can generate predictions for data provided as a URL, a file path, or bytes.

# Example for prediction through bytes:
# prediction_response = segmentor_model.predict_by_bytes(input_bytes, input_type="image")

# Example for prediction through a file path:
# prediction_response = segmentor_model.predict_by_filepath(filepath, input_type="image")

model_url = "https://clarifai.com/clarifai/main/models/image-general-segmentation"
segmentor_model = Model(
    url=model_url,
    pat="YOUR_PAT",
)

prediction_response = segmentor_model.predict_by_url(
    SEGMENT_IMAGE_URL, input_type="image"
)

regions = prediction_response.outputs[0].data.regions

for region in regions:
    for concept in region.data.concepts:
        # Access and round the fraction of the image covered by the concept
        name = concept.name
        value = round(concept.value, 4)
        print(f"{name}: {value}")
```
Output

```text
tree: 0.4965
person: 0.151
house: 0.0872
pavement: 0.0694
bush: 0.0588
road: 0.0519
sky-other: 0.0401
grass: 0.0296
building-other: 0.0096
unlabeled: 0.0035
roof: 0.0017
teddy bear: 0.0006
```
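You can turn the coverage values above into a short report by converting each fraction to a percentage and dropping concepts that cover only a sliver of the image. This sketch works on a plain dictionary. With a live response, and assuming each concept appears in only one region, you could build that dictionary with `{c.name: c.value for region in regions for c in region.data.concepts}`. The 1% cutoff and the helper name `coverage_report` are illustrative choices, not SDK features.

```python
from typing import Dict, List, Tuple


def coverage_report(coverage: Dict[str, float], min_fraction: float = 0.01) -> List[Tuple[str, float]]:
    """Turn {concept: fraction_of_image} into (concept, percent) pairs, largest first.

    Concepts covering less than `min_fraction` of the image are dropped.
    """
    kept = [(name, round(frac * 100, 2)) for name, frac in coverage.items() if frac >= min_fraction]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# A few values copied from the output above.
coverage = {"tree": 0.4965, "person": 0.151, "roof": 0.0017, "teddy bear": 0.0006}
print(coverage_report(coverage))
```

Filtering by a minimum share like this also removes low-coverage labels such as "teddy bear" that are unlikely to be meaningful for this image.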

Image To Text

Generate descriptive captions for images with the Predict API. When you provide an image as input to an image-to-text model, the model returns a textual description of that image.

```python
from clarifai.client.model import Model

# Your PAT (Personal Access Token) can be found in the Account's Security section.
# Specify the correct user_id/app_id pairing,
# since you're making inferences outside your app's scope.
# USER_ID = "salesforce"
# APP_ID = "blip"

# You can set the model using a model URL or a model ID.
# Change these to whatever model you want to use.
# e.g.: MODEL_ID = "general-english-image-caption-blip"
# You can also set a particular model version by specifying the version ID.
# e.g.: MODEL_VERSION_ID = "cdb690f13e62470ea6723642044f95e4"
# Model objects can be initialized either from a model URL or from the
# respective user_id, app_id and model_id, e.g.:
# model = Model(user_id="salesforce", app_id="blip", model_id=MODEL_ID)

model_url = (
    "https://clarifai.com/salesforce/blip/models/general-english-image-caption-blip"
)
image_url = "https://s3.amazonaws.com/samples.clarifai.com/featured-models/image-captioning-statue-of-liberty.jpeg"

# The Predict API also accepts data through a URL, a file path, or bytes.
# Example for prediction through a file path:
# model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_filepath(filepath, input_type="image")

# Example for prediction through bytes:
# model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_bytes(image_bytes, input_type="image")

model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    image_url, input_type="image"
)

# Get the output
print(model_prediction.outputs[0].data.text.raw)
```
Output

```text
a photograph of a statue of liberty in front of a blue sky
```

Image To Image

Increase the resolution of your images by sending them to an upscaling model through the Predict API. The example below also passes a `width` inference parameter to the model.

```python
from clarifai.client.model import Model

# Your PAT (Personal Access Token) can be found in the Account's Security section.
# Specify the correct user_id/app_id pairing,
# since you're making inferences outside your app's scope.
# USER_ID = "stability-ai"
# APP_ID = "Upscale"

# You can set the model using a model URL or a model ID.
# Change these to whatever model you want to use.
# e.g.: MODEL_ID = "stabilityai-upscale"
# You can also set a particular model version by specifying the version ID.
# e.g.: MODEL_VERSION_ID = "model_version"
# Model objects can be initialized either from a model URL or from the
# respective user_id, app_id and model_id, e.g.:
# model = Model(user_id="stability-ai", app_id="Upscale", model_id=MODEL_ID)

inference_params = dict(width=1024)

# The Predict API can generate predictions for data provided as a URL, a file path, or bytes.

# Example for prediction through bytes:
# model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_bytes(image_bytes, input_type="image")

# Example for prediction through a file path:
# model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_filepath(image_filepath, input_type="image")

model_url = "https://clarifai.com/stability-ai/Upscale/models/stabilityai-upscale"

image_url = "https://s3.amazonaws.com/samples.clarifai.com/featured-models/image-captioning-statue-of-liberty.jpeg"
model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    image_url, input_type="image", inference_params=inference_params
)

# Get the output: the upscaled image bytes and the image metadata
output_base64 = model_prediction.outputs[0].data.image.base64
image_info = model_prediction.outputs[0].data.image.image_info

# Save the upscaled image to disk
with open("image.png", "wb") as f:
    f.write(output_base64)
```
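The snippet above always saves the result as `image.png`. If you are not sure which format the model returns, a small helper can inspect the leading "magic" bytes before choosing a file extension. This is a minimal sketch. Like the snippet above, it assumes that `data.image.base64` already holds the decoded image bytes. The helper names `guess_image_extension` and `save_image` are hypothetical and not part of the Clarifai SDK, and only PNG, JPEG, GIF, and WebP signatures are recognized.

```python
def guess_image_extension(data: bytes) -> str:
    """Guess a file extension from the leading signature bytes of an encoded image.

    Only PNG, JPEG, GIF, and WebP are checked; anything else is reported as "bin".
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "bin"


def save_image(data: bytes, stem: str = "image") -> str:
    """Write the bytes to <stem>.<ext> and return the file name that was written."""
    filename = f"{stem}.{guess_image_extension(data)}"
    with open(filename, "wb") as f:
        f.write(data)
    return filename
```

For example, `save_image(output_base64, stem="upscaled")` would write `upscaled.png` if the model returned PNG data.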

Visual Embedder

Use the Predict API with an embedding model to obtain image embeddings. Image embeddings are vector representations that capture the semantic content of an image. This makes them useful for applications such as similarity search and recommendation systems.

```python
from clarifai.client.model import Model

# Your PAT (Personal Access Token) can be found in the Account's Security section.
# Specify the correct user_id/app_id pairing,
# since you're making inferences outside your app's scope.
# USER_ID = "clarifai"
# APP_ID = "main"

# You can set the model using a model URL or a model ID.
# Change these to whatever model you want to use.
# e.g.: MODEL_ID = "image-embedder-clip"
# You can also set a particular model version by specifying the version ID.
# e.g.: MODEL_VERSION_ID = "model_version"
# Model objects can be initialized either from a model URL or from the
# respective user_id, app_id and model_id, e.g.:
# model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)

image_url = "https://s3.amazonaws.com/samples.clarifai.com/featured-models/general-elephants.jpg"

# The Predict API can generate predictions for data provided as a URL, a file path, or bytes.

# Example for prediction through bytes:
# model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_bytes(input_bytes, input_type="image")

# Example for prediction through a file path:
# model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_filepath(image_filepath, input_type="image")

model_url = "https://clarifai.com/clarifai/main/models/image-embedder-clip"

# Model Predict
model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    image_url, input_type="image"
)

# The embedding vector and its number of dimensions
embeddings = model_prediction.outputs[0].data.embeddings[0].vector
num_dimensions = model_prediction.outputs[0].data.embeddings[0].num_dimensions

# Print the first 10 values of the embedding
print(embeddings[:10])
```
Output

```text
[-0.016209319233894348,
 -0.03517452999949455,
 0.0031261674594134092,
 0.03941042721271515,
 0.01166260801255703,
 -0.02489173412322998,
 0.04667072370648384,
 0.006998186931014061,
 0.05729646235704422,
 0.0077746850438416]
```
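For similarity search, a common way to compare two embedding vectors is cosine similarity. The sketch below is plain Python. To use it with real results, pass `list(model_prediction.outputs[0].data.embeddings[0].vector)` for each image you want to compare. Cosine similarity is one standard choice and is not necessarily the metric Clarifai uses internally. The vectors in the example are toy values, not real CLIP embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors: orthogonal directions give 0.0, parallel directions give ~1.0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

To rank candidate images against a query image, compute this score against each candidate's embedding and sort the results in descending order.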