Images

Make predictions on image inputs


To get predictions for a given image input, you need to supply the image along with the specific model from which you wish to receive predictions. You can supply the image via a publicly accessible URL or by directly sending bytes.

You can send up to 128 images in one API call. The file size of each image input should be less than 20MB.

Specify your choice of model for prediction using the MODEL_ID parameter.

tip

When you take an image with a digital device (such as a smartphone camera), meta-information such as the orientation value for how the camera was held is stored in the image's Exif data. When you later view the image on your computer, the photo viewer respects that orientation value and automatically rotates the image to present it the way it was shot. This lets you see a correctly oriented image no matter how the camera was held.

So, before making predictions on an image taken with a digital device, you should strip the Exif data from it. Since the Clarifai platform does not account for Exif data, removing it ensures your predictions are made on the image in its intended rotation.
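
If you need a quick way to do this, below is a minimal sketch using the Pillow library (an assumption; any Exif-aware image tool works). It first applies the orientation tag to the pixels, so the stripped file still looks the way your photo viewer displayed it, then saves a copy without the Exif block.

from PIL import Image, ImageOps

# Hypothetical file names; substitute your own paths
img = Image.open("photo.jpg")
# Bake the Exif orientation into the pixel data
img = ImageOps.exif_transpose(img)
# Saving without passing exif= writes the copy without Exif metadata
img.save("photo_no_exif.jpg")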

Visual Classification

Input: Image

Output: Concepts

Visual classification, also known as image classification, is the process of categorizing images into predefined classes based on their visual content. Machine learning models are employed to recognize patterns within images and assign them to the appropriate class.

After training, these models can classify new, unseen images by analyzing their visual content and assigning them to predefined categories based on what they've learned during training.

Predict via URL

Below is an example of how you would send image URLs and receive predictions from Clarifai's general-image-recognition model.

info

The initialization code used in the following examples is outlined in detail on the client installation page.

##################################################################################################
# In this section, we set the user authentication, user and app ID, model details, and the URL
# of the image we want as an input. Change these strings to run your own example.
#################################################################################################

# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = 'YOUR_PAT_HERE'
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
USER_ID = 'clarifai'
APP_ID = 'main'
# Change these to whatever model and image URL you want to use
MODEL_ID = 'general-image-recognition'
MODEL_VERSION_ID = 'aa7f35c01e0642fda5cf400f543e7c40'
IMAGE_URL = 'https://samples.clarifai.com/metro-north.jpg'

############################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
############################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (('authorization', 'Key ' + PAT),)

userDataObject = resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID)

post_model_outputs_response = stub.PostModelOutputs(
    service_pb2.PostModelOutputsRequest(
        user_app_id=userDataObject,  # The userDataObject is created in the overview and is required when using a PAT
        model_id=MODEL_ID,
        version_id=MODEL_VERSION_ID,  # This is optional. Defaults to the latest model version
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(
                    image=resources_pb2.Image(
                        url=IMAGE_URL
                    )
                )
            )
        ]
    ),
    metadata=metadata
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
    print(post_model_outputs_response.status)
    raise Exception("Post model outputs failed, status: " + post_model_outputs_response.status.description)

# Since we have one input, one output will exist here
output = post_model_outputs_response.outputs[0]

print("Predicted concepts:")
for concept in output.data.concepts:
    print("%s %.2f" % (concept.name, concept.value))

# Uncomment this line to print the full Response JSON
#print(output)
Text Output Example
Predicted concepts:
train 1.00
railway 1.00
subway system 1.00
station 1.00
locomotive 1.00
transportation system 1.00
travel 0.99
commuter 0.98
platform 0.98
light 0.97
train station 0.97
blur 0.97
city 0.96
road 0.96
urban 0.96
traffic 0.96
street 0.95
public 0.93
tramway 0.93
business 0.93
JSON Output Example
id: "ca9767e3dab44da2b7fa811ce6e382f0"
status {
code: SUCCESS
description: "Ok"
}
created_at {
seconds: 1701796564
nanos: 495388804
}
model {
id: "general-image-recognition"
name: "Image Recognition"
created_at {
seconds: 1457543499
nanos: 608845000
}
modified_at {
seconds: 1694180313
nanos: 148401000
}
app_id: "main"
model_version {
id: "aa7f35c01e0642fda5cf400f543e7c40"
created_at {
seconds: 1520370624
nanos: 454834000
}
status {
code: MODEL_TRAINED
description: "Model is trained and ready"
}
visibility {
gettable: PUBLIC
}
app_id: "main"
user_id: "clarifai"
metadata {
}
}
user_id: "clarifai"
model_type_id: "visual-classifier"
visibility {
gettable: PUBLIC
}
workflow_recommended {
}
}
input {
id: "855b331a54554660adb83d56088da511"
data {
image {
url: "https://samples.clarifai.com/metro-north.jpg"
}
}
}
data {
concepts {
id: "ai_HLmqFqBf"
name: "train"
value: 0.999605358
app_id: "main"
}
concepts {
id: "ai_fvlBqXZR"
name: "railway"
value: 0.999298692
app_id: "main"
}
concepts {
id: "ai_SHNDcmJ3"
name: "subway system"
value: 0.998257935
app_id: "main"
}
concepts {
id: "ai_6kTjGfF6"
name: "station"
value: 0.998012304
app_id: "main"
}
concepts {
id: "ai_RRXLczch"
name: "locomotive"
value: 0.99726069
app_id: "main"
}
concepts {
id: "ai_Xxjc3MhT"
name: "transportation system"
value: 0.996979594
app_id: "main"
}
concepts {
id: "ai_VRmbGVWh"
name: "travel"
value: 0.988970637
app_id: "main"
}
concepts {
id: "ai_jlb9q33b"
name: "commuter"
value: 0.980911732
app_id: "main"
}
concepts {
id: "ai_2gkfMDsM"
name: "platform"
value: 0.980664492
app_id: "main"
}
concepts {
id: "ai_n9vjC1jB"
name: "light"
value: 0.974198043
app_id: "main"
}
concepts {
id: "ai_sQQj52KZ"
name: "train station"
value: 0.968836844
app_id: "main"
}
concepts {
id: "ai_l4WckcJN"
name: "blur"
value: 0.967306197
app_id: "main"
}
concepts {
id: "ai_WBQfVV0p"
name: "city"
value: 0.961521745
app_id: "main"
}
concepts {
id: "ai_TZ3C79C6"
name: "road"
value: 0.961392581
app_id: "main"
}
concepts {
id: "ai_CpFBRWzD"
name: "urban"
value: 0.960395515
app_id: "main"
}
concepts {
id: "ai_tr0MBp64"
name: "traffic"
value: 0.959996164
app_id: "main"
}
concepts {
id: "ai_GjVpxXrs"
name: "street"
value: 0.947550297
app_id: "main"
}
concepts {
id: "ai_mcSHVRfS"
name: "public"
value: 0.934354544
app_id: "main"
}
concepts {
id: "ai_J6d1kV8t"
name: "tramway"
value: 0.932101309
app_id: "main"
}
concepts {
id: "ai_6lhccv44"
name: "business"
value: 0.929465771
app_id: "main"
}
}

Predict via Bytes

Below is an example of how you would send the bytes of an image and receive predictions from Clarifai's general-image-recognition model.

######################################################################################################
# In this section, we set the user authentication, user and app ID, model details, and the location
# of the image we want as an input. Change these strings to run your own example.
#####################################################################################################

# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = 'YOUR_PAT_HERE'
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
USER_ID = 'clarifai'
APP_ID = 'main'
# Change these to whatever model and image input you want to use
MODEL_ID = 'general-image-recognition'
MODEL_VERSION_ID = 'aa7f35c01e0642fda5cf400f543e7c40'
IMAGE_FILE_LOCATION = 'YOUR_IMAGE_FILE_LOCATION_HERE'

############################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
############################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (('authorization', 'Key ' + PAT),)

userDataObject = resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID)

with open(IMAGE_FILE_LOCATION, "rb") as f:
    file_bytes = f.read()

post_model_outputs_response = stub.PostModelOutputs(
    service_pb2.PostModelOutputsRequest(
        user_app_id=userDataObject,  # The userDataObject is created in the overview and is required when using a PAT
        model_id=MODEL_ID,
        version_id=MODEL_VERSION_ID,  # This is optional. Defaults to the latest model version
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(
                    image=resources_pb2.Image(
                        base64=file_bytes  # Despite the field name, pass raw bytes; the gRPC client handles encoding
                    )
                )
            )
        ]
    ),
    metadata=metadata
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
    print(post_model_outputs_response.status)
    raise Exception("Post model outputs failed, status: " + post_model_outputs_response.status.description)

# Since we have one input, one output will exist here
output = post_model_outputs_response.outputs[0]

print("Predicted concepts:")
for concept in output.data.concepts:
    print("%s %.2f" % (concept.name, concept.value))

# Uncomment this line to print the full Response JSON
#print(output)

Predict Multiple Inputs

To predict on multiple inputs at once and avoid making numerous API calls, include several input objects in the inputs array of a single request, up to the 128-input limit noted above. The example below uses cURL; the same pattern applies in any supported programming language, and a Python sketch follows it.

curl -X POST "https://api.clarifai.com/v2/users/clarifai/apps/main/models/general-image-recognition/versions/aa7f35c01e0642fda5cf400f543e7c40/outputs" \
  -H "Authorization: Key YOUR_PAT_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      {
        "data": {
          "image": {
            "url": "https://samples.clarifai.com/metro-north.jpg"
          }
        }
      },
      {
        "data": {
          "image": {
            "url": "...any other valid image url..."
          }
        }
      }
    ]
  }'
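
In Python, the same batching is done by listing several Input objects in one PostModelOutputs request. Below is a sketch that reuses the stub, metadata, and ID variables from the examples above (an assumption: run that setup code first) and chunks an arbitrary list of image URLs into batches of up to 128, the per-call maximum noted earlier.

IMAGE_URLS = [
    "https://samples.clarifai.com/metro-north.jpg",
    # ... any other valid image URLs
]

BATCH_SIZE = 128  # maximum number of inputs allowed per call

for start in range(0, len(IMAGE_URLS), BATCH_SIZE):
    batch = IMAGE_URLS[start:start + BATCH_SIZE]
    response = stub.PostModelOutputs(
        service_pb2.PostModelOutputsRequest(
            user_app_id=userDataObject,
            model_id=MODEL_ID,
            version_id=MODEL_VERSION_ID,
            inputs=[
                resources_pb2.Input(
                    data=resources_pb2.Data(image=resources_pb2.Image(url=url))
                )
                for url in batch
            ]
        ),
        metadata=metadata
    )
    if response.status.code != status_code_pb2.SUCCESS:
        raise Exception("Post model outputs failed, status: " + response.status.description)
    # Outputs are returned in the same order as the inputs in the batch
    for url, output in zip(batch, response.outputs):
        top = output.data.concepts[0] if output.data.concepts else None
        print(url, "->", top.name if top else "no concepts")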

Visual Detection

Input: Image

Output: regions[...].data.concepts, regions[...].region_info

Visual detection, also known as object detection, involves identifying and locating objects or specific regions of interest within images.

Unlike image classification, which assigns a single label or category to the entire image, visual detection provides more detailed information by detecting and outlining multiple objects or regions within the image, associating them with specific classes or labels.

Visual detection models are trained on labeled datasets with class labels and bounding box coordinates, enabling them to recognize object patterns and positions during inferencing.

Below is an example of how you would perform visual detection using Clarifai's general-image-detection model.

##################################################################################################
# In this section, we set the user authentication, user and app ID, model details, and the URL
# of the image we want as an input. Change these strings to run your own example.
#################################################################################################

# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = 'YOUR_PAT_HERE'
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
USER_ID = 'clarifai'
APP_ID = 'main'
# Change these to whatever model and image URL you want to use
MODEL_ID = 'general-image-detection'
MODEL_VERSION_ID = '1580bb1932594c93b7e2e04456af7c6f'
IMAGE_URL = 'https://samples.clarifai.com/metro-north.jpg'
# To use a local file, assign the location variable
# IMAGE_FILE_LOCATION = 'YOUR_IMAGE_FILE_LOCATION_HERE'

############################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
############################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (('authorization', 'Key ' + PAT),)

userDataObject = resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID)

# To use a local file, uncomment the following lines
# with open(IMAGE_FILE_LOCATION, "rb") as f:
#     file_bytes = f.read()

post_model_outputs_response = stub.PostModelOutputs(
    service_pb2.PostModelOutputsRequest(
        user_app_id=userDataObject,  # The userDataObject is created in the overview and is required when using a PAT
        model_id=MODEL_ID,
        version_id=MODEL_VERSION_ID,  # This is optional. Defaults to the latest model version
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(
                    image=resources_pb2.Image(
                        url=IMAGE_URL
                        # base64=file_bytes
                    )
                )
            )
        ]
    ),
    metadata=metadata
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
    print(post_model_outputs_response.status)
    raise Exception("Post model outputs failed, status: " + post_model_outputs_response.status.description)

regions = post_model_outputs_response.outputs[0].data.regions

for region in regions:
    # Accessing and rounding the bounding box values
    top_row = round(region.region_info.bounding_box.top_row, 3)
    left_col = round(region.region_info.bounding_box.left_col, 3)
    bottom_row = round(region.region_info.bounding_box.bottom_row, 3)
    right_col = round(region.region_info.bounding_box.right_col, 3)

    for concept in region.data.concepts:
        # Accessing and rounding the concept value
        name = concept.name
        value = round(concept.value, 4)

        print(f"{name}: {value} BBox: {top_row}, {left_col}, {bottom_row}, {right_col}")
Text Output Example
Building: 0.9396 BBox: 0.216, 0.002, 0.552, 0.25
Person: 0.8321 BBox: 0.497, 0.647, 0.669, 0.697
Tree: 0.6975 BBox: 0.392, 0.365, 0.507, 0.511
Building: 0.6604 BBox: 0.003, 0.305, 0.974, 0.999
Tree: 0.5274 BBox: 0.378, 0.932, 0.46, 0.998
Bench: 0.4542 BBox: 0.743, 0.822, 0.987, 0.999
Land vehicle: 0.4328 BBox: 0.512, 0.61, 0.573, 0.644
Person: 0.3903 BBox: 0.522, 0.039, 0.586, 0.058
Train: 0.3746 BBox: 0.471, 0.29, 0.543, 0.472
Waste container: 0.3714 BBox: 0.539, 0.738, 0.849, 0.893
Person: 0.3326 BBox: 0.532, 0.072, 0.578, 0.106
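
The bounding-box values above are relative coordinates: top_row and bottom_row are fractions of the image height, and left_col and right_col are fractions of the image width. Below is a small sketch of converting them to pixel coordinates, assuming Pillow and a local copy of the input image (the file name is hypothetical).

from PIL import Image

img = Image.open("metro-north.jpg")  # hypothetical local copy of the input image
width, height = img.size

for region in regions:  # `regions` comes from the detection example above
    box = region.region_info.bounding_box
    # Scale the fractional coordinates by the image dimensions
    left, top = int(box.left_col * width), int(box.top_row * height)
    right, bottom = int(box.right_col * width), int(box.bottom_row * height)
    name = region.data.concepts[0].name
    print(f"{name}: ({left}, {top}) -> ({right}, {bottom})")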

Visual Segmentation

Input: Image

Output: regions[...].region_info.mask, regions[...].data.concepts

Visual segmentation, or image segmentation, involves partitioning an image into distinct regions, each representing a meaningful object or component. Its purpose is to break down an image into meaningful parts, making it easier to analyze and understand.

This is achieved by assigning labels to individual pixels based on shared characteristics. Image segmentation is commonly used to locate objects and boundaries in images, resulting in a set of segments that cover the entire image or a set of extracted contours.

Below is an example of how you would perform visual segmentation using Clarifai's image-general-segmentation model.

##################################################################################################
# In this section, we set the user authentication, user and app ID, model details, and the URL
# of the image we want as an input. Change these strings to run your own example.
#################################################################################################

# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = 'YOUR_PAT_HERE'
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
USER_ID = 'clarifai'
APP_ID = 'main'
# Change these to whatever model and image URL you want to use
MODEL_ID = 'image-general-segmentation'
MODEL_VERSION_ID = '1581820110264581908ce024b12b4bfb'
IMAGE_URL = 'https://samples.clarifai.com/metro-north.jpg'
# To use a local file, assign the location variable
# IMAGE_FILE_LOCATION = 'YOUR_IMAGE_FILE_LOCATION_HERE'

############################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
############################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (('authorization', 'Key ' + PAT),)

# To use a local file, uncomment the following lines
# with open(IMAGE_FILE_LOCATION, "rb") as f:
#     file_bytes = f.read()

userDataObject = resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID)

post_model_outputs_response = stub.PostModelOutputs(
    service_pb2.PostModelOutputsRequest(
        user_app_id=userDataObject,  # The userDataObject is created in the overview and is required when using a PAT
        model_id=MODEL_ID,
        version_id=MODEL_VERSION_ID,  # This is optional. Defaults to the latest model version
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(
                    image=resources_pb2.Image(
                        url=IMAGE_URL
                        # base64=file_bytes
                    )
                )
            )
        ]
    ),
    metadata=metadata
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
    print(post_model_outputs_response.status)
    raise Exception("Post model outputs failed, status: " + post_model_outputs_response.status.description)

regions = post_model_outputs_response.outputs[0].data.regions

for region in regions:
    for concept in region.data.concepts:
        # Accessing and rounding the concept's percentage of image covered
        name = concept.name
        value = round(concept.value, 4)
        print(f"{name}: {value}")
Text Output Example
sky-other: 0.2198
railroad: 0.1943
platform: 0.1773
ceiling-other: 0.1658
building-other: 0.1185
train: 0.0939
tree: 0.0098
person: 0.008
unlabeled: 0.0077
wall-concrete: 0.0047
fence: 0.0001
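
Besides the concept list, each region exposes its per-pixel mask under region.region_info.mask. Below is a hedged sketch of loading a region's mask for further processing; it assumes the mask is returned as inline image bytes in mask.image.base64 (raw bytes, as with image inputs) and that Pillow is available.

import io
from PIL import Image

for region in regions:  # `regions` comes from the segmentation example above
    mask_bytes = region.region_info.mask.image.base64
    if not mask_bytes:
        continue  # skip regions that carry no inline mask bytes
    mask = Image.open(io.BytesIO(mask_bytes))  # single-channel mask image
    name = region.data.concepts[0].name
    print(f"{name}: mask size {mask.size}, mode {mask.mode}")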

Image-to-Text

Input: Image

Output: Text

Image-to-text generation, also known as image captioning, refers to the process of generating textual descriptions or captions for images.

It involves using a model to analyze the content of an image and then generate a coherent and relevant textual description that describes what is happening in the image—similar to how humans would describe it.

Below is an example of how you would perform image-to-text generation using the general-english-image-caption-blip model.

##################################################################################################
# In this section, we set the user authentication, user and app ID, model details, and the URL
# of the image we want as an input. Change these strings to run your own example.
#################################################################################################

# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = "YOUR_PAT_HERE"
# Specify the correct user_id/app_id pairings
# Since you're making inferences outside your app's scope
USER_ID = "salesforce"
APP_ID = "blip"
# Change these to whatever model and image URL you want to use
MODEL_ID = "general-english-image-caption-blip"
MODEL_VERSION_ID = "cdb690f13e62470ea6723642044f95e4"
IMAGE_URL = "https://samples.clarifai.com/metro-north.jpg"
# To use a local file, assign the location variable
# IMAGE_FILE_LOCATION = "YOUR_IMAGE_FILE_LOCATION_HERE"

############################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
############################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (("authorization", "Key " + PAT),)

# To use a local file, uncomment the following lines
# with open(IMAGE_FILE_LOCATION, "rb") as f:
#     file_bytes = f.read()

userDataObject = resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID)

post_model_outputs_response = stub.PostModelOutputs(
    service_pb2.PostModelOutputsRequest(
        user_app_id=userDataObject,  # The userDataObject is created in the overview and is required when using a PAT
        model_id=MODEL_ID,
        version_id=MODEL_VERSION_ID,  # This is optional. Defaults to the latest model version
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(
                    image=resources_pb2.Image(
                        url=IMAGE_URL
                        # base64=file_bytes
                    )
                )
            )
        ],
    ),
    metadata=metadata,
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
    print(post_model_outputs_response.status)
    raise Exception(
        "Post model outputs failed, status: "
        + post_model_outputs_response.status.description
    )

# Since we have one input, one output will exist here
output = post_model_outputs_response.outputs[0]

print("Image caption:")
print(output.data.text.raw)

# Uncomment this line to print the full Response JSON
# print(output)
Text Output Example
Image caption:
a photograph of a train station with a train on the tracks
JSON Output Example
id: "d13769d2c8da461d9c806246ed925047"
status {
code: SUCCESS
description: "Ok"
}
created_at {
seconds: 1700936312
nanos: 267440491
}
model {
id: "general-english-image-caption-blip"
name: "Image Captioner"
created_at {
seconds: 1650312092
nanos: 67938000
}
modified_at {
seconds: 1660226508
nanos: 93093000
}
app_id: "blip"
model_version {
id: "cdb690f13e62470ea6723642044f95e4"
created_at {
seconds: 1681249232
nanos: 444463000
}
status {
code: MODEL_TRAINED
description: "Model is trained and ready"
}
visibility {
gettable: PUBLIC
}
app_id: "blip"
user_id: "salesforce"
metadata {
}
}
user_id: "salesforce"
model_type_id: "image-to-text"
visibility {
gettable: PUBLIC
}
workflow_recommended {
}
}
input {
id: "361e341bdc2541339fcf8b5c7c9e1452"
data {
image {
url: "https://samples.clarifai.com/metro-north.jpg"
}
}
}
data {
text {
raw: "a photograph of a train station with a train on the tracks"
text_info {
encoding: "UnknownTextEnc"
}
}
}