Image as Input
Learn how to perform inference with image as input using Clarifai Python SDK
Clarifai's Python SDK empowers you to seamlessly integrate advanced image recognition functionalities into your applications, harnessing the potential of artificial intelligence. The SDK supports different model types that take images as input for various tasks. Whether you're building applications for content moderation, object detection, or image classification, our SDK offers a robust foundation for turning images into actionable information.
Visual Classifier
Harnessing the power of Clarifai's Visual Classifier models, you can seamlessly categorize images through the intuitive Predict API for images. This capability enables you to submit input images to a classification model of your choice, providing a straightforward mechanism for obtaining accurate and meaningful predictions. You have the option to supply image data through either URLs or files, enhancing the adaptability of the platform for diverse image classification scenarios.
You can send up to 128 images in one API call. The file size of each image input must be less than 20 MB.
- Python
from clarifai.client.model import Model
# Your PAT (Personal Access Token) can be found in the Account's Security section
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
#USER_ID = "clarifai"
#APP_ID = "main"
# You can set the model using model URL or model ID.
# Change these to whatever model you want to use
# eg : MODEL_ID = "general-image-recognition"
# You can also set a particular model version by specifying the version ID
# eg: MODEL_VERSION_ID = "aa7f35c01e0642fda5cf400f543e7c40"
# Model class objects can be initialised by providing the model URL, or by specifying the respective user_id, app_id, and model_id
# eg : model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)
model_url = "https://clarifai.com/clarifai/main/models/general-image-recognition"
image_url = "https://samples.clarifai.com/metro-north.jpg"
# The predict API gives flexibility to generate predictions for data provided through URL, filepath, and bytes formats.
# Example for prediction through Bytes:
# model_prediction = model.predict_by_bytes(input_bytes, input_type="image")
# Example for prediction through Filepath:
# model_prediction = Model(model_url).predict_by_filepath(filepath, input_type="image")
model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    image_url, input_type="image"
)
# Get the output
print(model_prediction.outputs[0].data)
Output
concepts {
id: "ai_HLmqFqBf"
name: "train"
value: 0.999604881
app_id: "main"
}
concepts {
id: "ai_fvlBqXZR"
name: "railway"
value: 0.999297619
app_id: "main"
}
concepts {
id: "ai_SHNDcmJ3"
name: "subway system"
value: 0.99825567
app_id: "main"
}
concepts {
id: "ai_6kTjGfF6"
name: "station"
value: 0.998010933
app_id: "main"
}
concepts {
id: "ai_RRXLczch"
name: "locomotive"
value: 0.997254908
app_id: "main"
}
concepts {
id: "ai_Xxjc3MhT"
name: "transportation system"
value: 0.996976852
app_id: "main"
}
concepts {
id: "ai_VRmbGVWh"
name: "travel"
value: 0.988967717
app_id: "main"
}
concepts {
id: "ai_jlb9q33b"
name: "commuter"
value: 0.98089534
app_id: "main"
}
concepts {
id: "ai_2gkfMDsM"
name: "platform"
value: 0.980635285
app_id: "main"
}
concepts {
id: "ai_n9vjC1jB"
name: "light"
value: 0.974186838
app_id: "main"
}
concepts {
id: "ai_sQQj52KZ"
name: "train station"
value: 0.96878773
app_id: "main"
}
concepts {
id: "ai_l4WckcJN"
name: "blur"
value: 0.967302203
app_id: "main"
}
concepts {
id: "ai_WBQfVV0p"
name: "city"
value: 0.96151042
app_id: "main"
}
concepts {
id: "ai_TZ3C79C6"
name: "road"
value: 0.961382031
app_id: "main"
}
concepts {
id: "ai_CpFBRWzD"
name: "urban"
value: 0.960375667
app_id: "main"
}
concepts {
id: "ai_tr0MBp64"
name: "traffic"
value: 0.959969819
app_id: "main"
}
concepts {
id: "ai_GjVpxXrs"
name: "street"
value: 0.947492182
app_id: "main"
}
concepts {
id: "ai_mcSHVRfS"
name: "public"
value: 0.934322
app_id: "main"
}
concepts {
id: "ai_J6d1kV8t"
name: "tramway"
value: 0.931958199
app_id: "main"
}
concepts {
id: "ai_6lhccv44"
name: "business"
value: 0.929547548
app_id: "main"
}
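Each `concepts` entry in the response pairs a label `name` with a confidence `value` between 0 and 1. If you only want high-confidence labels, a small helper can filter the response. The sketch below works on anything exposing `.name` and `.value`; the 0.95 threshold is an arbitrary choice, and the namedtuple merely stands in for the protobuf concept objects:

```python
from collections import namedtuple

# Filter predicted concepts down to those at or above a confidence threshold.
# `concepts` is any iterable of objects exposing `.name` and `.value`, such as
# model_prediction.outputs[0].data.concepts from the example above.
def filter_concepts(concepts, threshold=0.95):
    return [(c.name, c.value) for c in concepts if c.value >= threshold]

# Stand-in for the protobuf concept objects, using values from the sample output
Concept = namedtuple("Concept", ["name", "value"])
sample = [
    Concept("train", 0.999604881),
    Concept("railway", 0.999297619),
    Concept("business", 0.929547548),
]
print(filter_concepts(sample))  # [('train', 0.999604881), ('railway', 0.999297619)]
```

The same helper can be reused across the classifier and detector examples, since both return concepts with the same two fields.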
Visual Detector - Image
Dive into a richer understanding of image content with Clarifai's Predict API for Visual Detector models. Unlike image classification, where a single label is assigned to the entire image, Visual Detector goes beyond, detecting and outlining multiple objects or regions within an image, associating them with specific classes or labels. Similar to image classification, the Predict API for visual detection accommodates input images through URLs or files.
- Python
from clarifai.client.model import Model
# Your PAT (Personal Access Token) can be found in the Account's Security section
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
#USER_ID = "clarifai"
#APP_ID = "main"
# You can set the model using model URL or model ID.
# Change these to whatever model you want to use
# eg : MODEL_ID = 'general-image-detection'
# You can also set a particular model version by specifying the version ID
# eg: MODEL_VERSION_ID = '1580bb1932594c93b7e2e04456af7c6f'
# Model class objects can be initialised by providing the model URL, or by specifying the respective user_id, app_id, and model_id
# eg : model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)
DETECTION_IMAGE_URL = "https://s3.amazonaws.com/samples.clarifai.com/people_walking2.jpeg"
model_url = "https://clarifai.com/clarifai/main/models/general-image-detection"
detector_model = Model(
    url=model_url,
    pat="YOUR_PAT",
)
# The predict API gives flexibility to generate predictions for data provided through URL, filepath, and bytes formats.
# Example for prediction through Bytes:
# model_prediction = model.predict_by_bytes(input_bytes, input_type="image")
# Example for prediction through Filepath:
# model_prediction = Model(model_url).predict_by_filepath(filepath, input_type="image")
prediction_response = detector_model.predict_by_url(
    DETECTION_IMAGE_URL, input_type="image"
)
# Since we have one input, one output will exist here
regions = prediction_response.outputs[0].data.regions
for region in regions:
    # Accessing and rounding the bounding box values
    top_row = round(region.region_info.bounding_box.top_row, 3)
    left_col = round(region.region_info.bounding_box.left_col, 3)
    bottom_row = round(region.region_info.bounding_box.bottom_row, 3)
    right_col = round(region.region_info.bounding_box.right_col, 3)
    for concept in region.data.concepts:
        # Accessing and rounding the concept value
        name = concept.name
        value = round(concept.value, 4)
        print(f"{name}: {value} BBox: {top_row}, {left_col}, {bottom_row}, {right_col}")
Output
Footwear: 0.9618 BBox: 0.879, 0.305, 0.925, 0.327
Footwear: 0.9593 BBox: 0.882, 0.284, 0.922, 0.305
Footwear: 0.9571 BBox: 0.874, 0.401, 0.923, 0.418
Footwear: 0.9546 BBox: 0.87, 0.712, 0.916, 0.732
Footwear: 0.9518 BBox: 0.882, 0.605, 0.918, 0.623
Footwear: 0.95 BBox: 0.847, 0.587, 0.907, 0.604
Footwear: 0.9349 BBox: 0.878, 0.475, 0.917, 0.492
Tree: 0.9145 BBox: 0.009, 0.019, 0.451, 0.542
Footwear: 0.9127 BBox: 0.858, 0.393, 0.909, 0.407
Footwear: 0.8969 BBox: 0.812, 0.433, 0.844, 0.445
Footwear: 0.8747 BBox: 0.852, 0.49, 0.912, 0.506
Jeans: 0.8699 BBox: 0.511, 0.255, 0.917, 0.336
Footwear: 0.8203 BBox: 0.808, 0.453, 0.833, 0.465
Footwear: 0.8186 BBox: 0.8, 0.378, 0.834, 0.391
Jeans: 0.7921 BBox: 0.715, 0.273, 0.895, 0.326
Tree: 0.7851 BBox: 0.0, 0.512, 0.635, 0.998
Woman: 0.7693 BBox: 0.466, 0.36, 0.915, 0.449
Jeans: 0.7614 BBox: 0.567, 0.567, 0.901, 0.647
Footwear: 0.7287 BBox: 0.847, 0.494, 0.884, 0.51
Tree: 0.7216 BBox: 0.002, 0.005, 0.474, 0.14
Jeans: 0.7098 BBox: 0.493, 0.447, 0.914, 0.528
Footwear: 0.6929 BBox: 0.808, 0.424, 0.839, 0.437
Jeans: 0.6734 BBox: 0.728, 0.464, 0.887, 0.515
Woman: 0.6141 BBox: 0.464, 0.674, 0.922, 0.782
Human leg: 0.6032 BBox: 0.681, 0.577, 0.897, 0.634
...
Footwear: 0.3527 BBox: 0.844, 0.5, 0.875, 0.515
Footwear: 0.3395 BBox: 0.863, 0.396, 0.914, 0.413
Human hair: 0.3358 BBox: 0.443, 0.586, 0.505, 0.622
Tree: 0.3306 BBox: 0.6, 0.759, 0.805, 0.929
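The bounding-box values in the output are fractions of the image's height and width, not pixels. To draw or crop the boxes you will typically convert them using the dimensions of your input image. The sketch below assumes a hypothetical 1000x800 image; the real dimensions depend on your input:

```python
def bbox_to_pixels(bbox, image_width, image_height):
    """Convert Clarifai's normalized bounding box (fractions of the image
    size) into integer pixel coordinates (left, top, right, bottom)."""
    left = int(round(bbox["left_col"] * image_width))
    top = int(round(bbox["top_row"] * image_height))
    right = int(round(bbox["right_col"] * image_width))
    bottom = int(round(bbox["bottom_row"] * image_height))
    return left, top, right, bottom

# Example: the first "Footwear" box from the output above, on a
# hypothetical 1000x800 image.
box = {"top_row": 0.879, "left_col": 0.305, "bottom_row": 0.925, "right_col": 0.327}
print(bbox_to_pixels(box, 1000, 800))  # (305, 703, 327, 740)
```

The resulting `(left, top, right, bottom)` tuple matches the box format most drawing libraries expect.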
Visual Detector - Video
Enhance your capabilities with Clarifai's Predict API, which provides predictions for every frame when processing a video as input. The video Predict API is highly configurable, allowing users to fine-tune requests, including the number of frames processed per second for more control over analysis speed. Choose the most suitable model for your visual detection task.
Video input should be at most 10 minutes long or 100 MB in size when uploaded through a URL.
- Python
from clarifai.client.model import Model
# Your PAT (Personal Access Token) can be found in the Account's Security section
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
USER_ID = "clarifai"
APP_ID = "main"
# Change these to whatever model and video URL you want to use
MODEL_ID = "general-image-recognition"
# You can also set a particular model version by specifying the version ID
# eg: MODEL_VERSION_ID = 'aa7f35c01e0642fda5cf400f543e7c40'
VIDEO_URL = "https://samples.clarifai.com/beer.mp4"
# Change this to configure the FPS rate (If it's not configured, it defaults to 1 FPS)
# The value must range between 100 and 60000.
# FPS = 1000/sample_ms
SAMPLE_MS = 2000
# Model class objects can be initialised by providing the model URL, or by specifying the respective user_id, app_id, and model_id
# eg: model = Model("https://clarifai.com/clarifai/main/models/general-image-recognition")
model = Model(user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID, pat="YOUR_PAT")
output_config = {"sample_ms": SAMPLE_MS}  # Run inference every 2 seconds
model_prediction = model.predict_by_url(
    VIDEO_URL, input_type="video", output_config=output_config
)
# The predict API gives flexibility to generate predictions for data provided through filepath, URL and bytes format.
# Example for prediction through Filepath:
# model_prediction = model.predict_by_filepath(video_file_path, input_type="video", output_config=output_config)
# Example for prediction through Bytes:
# model_prediction = model.predict_by_bytes(input_video_bytes, input_type="video", output_config=output_config)
# Print the frame info and the first concept name in each frame
for frame in model_prediction.outputs[0].data.frames:
    print(f"Frame Info: {frame.frame_info} Concept: {frame.data.concepts[0].name}\n")
Output
Frame Info: time: 1000
Concept: beer
Frame Info: index: 1
time: 3000
Concept: beer
Frame Info: index: 2
time: 5000
Concept: beer
Frame Info: index: 3
time: 7000
Concept: beer
Frame Info: index: 4
time: 9000
Concept: beer
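Since FPS = 1000 / `sample_ms`, you can derive the `sample_ms` value from a desired frame rate rather than picking it by hand. A minimal sketch; the clamping mirrors the 100-60000 ms range noted above (i.e. at most 10 FPS, at least one frame per minute):

```python
def sample_ms_for_fps(fps):
    """Compute the sample_ms value for a desired frames-per-second rate,
    clamped to the 100-60000 ms range the API accepts."""
    sample_ms = int(round(1000 / fps))
    return max(100, min(60000, sample_ms))

print(sample_ms_for_fps(0.5))  # 2000 -> one frame every 2 seconds
print(sample_ms_for_fps(30))   # 100  -> clamped to the 10 FPS maximum
```

The first call reproduces the `SAMPLE_MS = 2000` setting used in the example above.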
Visual Segmenter
The Clarifai Predict API offers a powerful capability to generate segmentation masks by providing an image as input to a segmentation model. This functionality allows for the detailed analysis of images, where distinct regions are identified and associated with specific concepts.
- Python
from clarifai.client.model import Model
# Your PAT (Personal Access Token) can be found in the Account's Security section
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
#USER_ID = "clarifai"
#APP_ID = "main"
# You can set the model using model URL or model ID.
# Change these to whatever model you want to use
# eg : MODEL_ID = 'image-general-segmentation'
# You can also set a particular model version by specifying the version ID
# eg: MODEL_VERSION_ID = '1581820110264581908ce024b12b4bfb'
# Model class objects can be initialised by providing the model URL, or by specifying the respective user_id, app_id, and model_id
# eg : model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)
SEGMENT_IMAGE_URL = "https://s3.amazonaws.com/samples.clarifai.com/people_walking2.jpeg"
# The predict API gives flexibility to generate predictions for data provided through URL, filepath, and bytes formats.
# Example for prediction through Bytes:
# model_prediction = model.predict_by_bytes(input_bytes, input_type="image")
# Example for prediction through Filepath:
# model_prediction = Model(model_url).predict_by_filepath(filepath, input_type="image")
model_url = "https://clarifai.com/clarifai/main/models/image-general-segmentation"
segmentor_model = Model(
    url=model_url,
    pat="YOUR_PAT",
)
prediction_response = segmentor_model.predict_by_url(
    SEGMENT_IMAGE_URL, input_type="image"
)
regions = prediction_response.outputs[0].data.regions
for region in regions:
    for concept in region.data.concepts:
        # Accessing and rounding the concept's percentage of image covered
        name = concept.name
        value = round(concept.value, 4)
        print(f"{name}: {value}")
Output
tree: 0.4965
person: 0.151
house: 0.0872
pavement: 0.0694
bush: 0.0588
road: 0.0519
sky-other: 0.0401
grass: 0.0296
building-other: 0.0096
unlabeled: 0.0035
roof: 0.0017
teddy bear: 0.0006
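Each concept's value here is the fraction of the image covered by its mask, so ranking the values surfaces the dominant regions. A minimal sketch, using a plain dict populated with values taken from the sample output above:

```python
def top_coverage(coverage, k=3):
    """Return the k concepts covering the largest share of the image.
    `coverage` maps concept name -> fraction of the image covered."""
    return sorted(coverage.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Values taken from the sample segmentation output
coverage = {"tree": 0.4965, "person": 0.151, "house": 0.0872, "pavement": 0.0694}
print(top_coverage(coverage, k=2))  # [('tree', 0.4965), ('person', 0.151)]
```

In a real application you would build `coverage` from `region.data.concepts` while iterating over the response, as in the loop above.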
Image To Text
Enhance your application by producing descriptive captions for images using the Clarifai Predict API. By providing an image as input to a state-of-the-art image-to-text model, you can extract meaningful textual descriptions effortlessly.
- Python
from clarifai.client.model import Model
# Your PAT (Personal Access Token) can be found in the Account's Security section
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
#USER_ID = "salesforce"
#APP_ID = "blip"
# You can set the model using model URL or model ID.
# Change these to whatever model you want to use
# eg : MODEL_ID = "general-english-image-caption-blip"
# You can also set a particular model version by specifying the version ID
# eg: MODEL_VERSION_ID = "cdb690f13e62470ea6723642044f95e4"
# Model class objects can be initialised by providing the model URL, or by specifying the respective user_id, app_id, and model_id
# eg : model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)
model_url = (
    "https://clarifai.com/salesforce/blip/models/general-english-image-caption-blip"
)
image_url = "https://s3.amazonaws.com/samples.clarifai.com/featured-models/image-captioning-statue-of-liberty.jpeg"
# The Predict API also accepts data through URL, filepath, and bytes.
# Example for predict by filepath:
# model_prediction = Model(model_url).predict_by_filepath(filepath, input_type="image")
# Example for predict by bytes:
# model_prediction = Model(model_url).predict_by_bytes(image_bytes, input_type="image")
model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    image_url, input_type="image"
)
# Get the output
print(model_prediction.outputs[0].data.text.raw)
Output
a photograph of a statue of liberty in front of a blue sky
Image To Image
Elevate the resolution of your images using the Clarifai Predict API, specifically designed for image upscaling. This functionality allows you to enhance the quality of an image using an upscaling model.
- Python
from clarifai.client.model import Model
# Your PAT (Personal Access Token) can be found in the Account's Security section
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
#USER_ID = "stability-ai"
#APP_ID = "Upscale"
# You can set the model using model URL or model ID.
# Change these to whatever model you want to use
# eg : MODEL_ID = 'stabilityai-upscale'
# You can also set a particular model version by specifying the version ID
# eg: MODEL_VERSION_ID = 'model_version'
# Model class objects can be initialised by providing the model URL, or by specifying the respective user_id, app_id, and model_id
# eg : model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)
inference_params = dict(width=1024)
# The predict API gives the flexibility to generate predictions for data provided through URL, Filepath and bytes format.
# Example for prediction through Bytes:
# model_prediction = model.predict_by_bytes(image_bytes, input_type="image")
# Example for prediction through Filepath:
# model_prediction = Model(model_url).predict_by_filepath(image_filepath, input_type="image")
model_url = "https://clarifai.com/stability-ai/Upscale/models/stabilityai-upscale"
image_url = "https://s3.amazonaws.com/samples.clarifai.com/featured-models/image-captioning-statue-of-liberty.jpeg"
model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    image_url, input_type="image", inference_params=inference_params
)
# Get the output
output_base64 = model_prediction.outputs[0].data.image.base64
image_info = model_prediction.outputs[0].data.image.image_info
with open("image.png", "wb") as f:
    f.write(output_base64)
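The example above writes the returned field directly to a PNG file as raw bytes. As a quick sanity check before (or after) saving, you can verify that the buffer starts with the PNG file signature. The sketch below uses stand-in bytes rather than a live response:

```python
def looks_like_png(data: bytes) -> bool:
    """Check the 8-byte PNG signature at the start of an image buffer."""
    return data[:8] == b"\x89PNG\r\n\x1a\n"

# Stand-in for the model's output bytes; in practice you would pass
# model_prediction.outputs[0].data.image.base64 here.
sample_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
print(looks_like_png(sample_bytes))  # True
```

A check like this catches the common mistake of writing an error payload or an empty buffer to disk as if it were an image.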
Visual Embedder
The Predict API empowers you to leverage image embeddings through an embedding model. Image embeddings are vector representations that encapsulate the semantic content of an image, offering a powerful tool for various applications such as similarity search, recommendation systems, and more.
- Python
from clarifai.client.model import Model
# Your PAT (Personal Access Token) can be found in the Account's Security section
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
#USER_ID = "clarifai"
#APP_ID = "main"
# You can set the model using model URL or model ID.
# Change these to whatever model you want to use
# eg : MODEL_ID = 'image-embedder-clip'
# You can also set a particular model version by specifying the version ID
# eg: MODEL_VERSION_ID = 'model_version'
# Model class objects can be initialised by providing the model URL, or by specifying the respective user_id, app_id, and model_id
# eg : model = Model(user_id="clarifai", app_id="main", model_id=MODEL_ID)
image_url = "https://s3.amazonaws.com/samples.clarifai.com/featured-models/general-elephants.jpg"
# The predict API gives the flexibility to generate predictions for data provided through URL, Filepath and bytes format.
# Example for prediction through Bytes:
# model_prediction = model.predict_by_bytes(input_bytes, input_type="image")
# Example for prediction through Filepath:
# model_prediction = Model(model_url).predict_by_filepath(image_filepath, input_type="image")
model_url = "https://clarifai.com/clarifai/main/models/image-embedder-clip"
# Model Predict
model_prediction = Model(url=model_url, pat="YOUR_PAT").predict_by_url(
    image_url, input_type="image"
)
embeddings = model_prediction.outputs[0].data.embeddings[0].vector
num_dimensions = model_prediction.outputs[0].data.embeddings[0].num_dimensions
print(embeddings[:10])
Output
[-0.016209319233894348,
-0.03517452999949455,
0.0031261674594134092,
0.03941042721271515,
0.01166260801255703,
-0.02489173412322998,
0.04667072370648384,
0.006998186931014061,
0.05729646235704422,
0.0077746850438416]
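A common use of these vectors is similarity search: two images with similar content produce embeddings with a high cosine similarity. A minimal pure-Python sketch; the toy vectors are illustrative stand-ins, not real CLIP embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [1.0, 0.0, 2.0]
v2 = [1.0, 0.0, 2.0]
v3 = [0.0, 1.0, 0.0]
print(cosine_similarity(v1, v2))  # 1.0 (identical direction)
print(cosine_similarity(v1, v3))  # 0.0 (orthogonal)
```

In practice you would compute this over the `vector` fields of two prediction responses, typically with NumPy for speed at scale.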