
Inference via API

Generate predictions using your deployed models


Our new inference technique provides an efficient, scalable, and streamlined way to perform predictions with models.

Built with a Python-first, user-centric design, this flexible approach simplifies the process of working with models — enabling users to focus more on building and iterating, and less on navigating API mechanics.

Prerequisites

Install Clarifai Packages

  • Install the latest version of the Clarifai Python SDK package:

    pip install --upgrade clarifai

  • Install the latest version of the Clarifai Node.js SDK package:

    npm install clarifai-nodejs

Get a PAT Key

You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can generate a PAT on your personal settings page by navigating to the Security section.

You can then set the PAT as an environment variable using CLARIFAI_PAT:

 export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE 
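
If you prefer not to export the variable in your shell, you can also set it programmatically at the top of your Python script before initializing any Clarifai client, as the examples below do. This is a minimal sketch; replace the placeholder with your actual token.

import os

# Set the PAT for the current process (placeholder value shown)
os.environ["CLARIFAI_PAT"] = "YOUR_PERSONAL_ACCESS_TOKEN_HERE"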
tip

On Windows, the Clarifai Python SDK expects a HOME environment variable, which isn’t set by default. To ensure compatibility with file paths used by the SDK, set HOME to the value of your USERPROFILE. You can set it in your Command Prompt this way: set HOME=%USERPROFILE%.

Structure of Prediction Methods

Before making a prediction with a model, it’s important to understand how its prediction methods are structured.

You can learn more about the structure of model prediction methods here.

Initializing the Model Client
from clarifai.client import Model

# Initialize with explicit IDs
model = Model(user_id="model_user_id", app_id="model_app_id", model_id="model_id")

# Or initialize with model URL
model = Model(url="https://clarifai.com/model_user_id/model_app_id/models/model_id")

Get Available Methods

You can list all the prediction methods a model implements that are available for performing inference.

import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="https://clarifai.com/openai/chat-completion/models/o4-mini")

model_methods = model.available_methods()

print(model_methods)
Example Output
dict_keys(['predict', 'generate', 'chat'])

Get Method Signature

You can retrieve the method signature of a specified model's method to identify all its arguments and their type annotations, which are essential for performing model inference.

A method signature defines the method's name, its input parameters (with types and default values), and the return type, helping you understand how to properly call the method.

import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="https://clarifai.com/openai/chat-completion/models/o4-mini")

method_name = "predict" # Or, "generate", "chat", etc

method_signature = model.method_signature(method_name= method_name)

print(method_signature)
Example Output
def predict(prompt: str, image: data_types.Image, images: Any, chat_history: Any, max_tokens: float = 512.0, temperature: float = 1.0, top_p: float = 0.8, reasoning_effort: str = '"low"') -> str:
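
If you want to inspect every available method at once, you can combine available_methods() with method_signature(). Here is a short sketch that reuses the same model URL as above:

import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="https://clarifai.com/openai/chat-completion/models/o4-mini")

# Print the signature of each method the model exposes
for name in model.available_methods():
    print(model.method_signature(method_name=name))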

Generate Example Code

You can generate a sample code snippet to better understand how to perform inference using a model.

import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="https://clarifai.com/openai/chat-completion/models/o4-mini")

model_script = model.generate_client_script()

print(model_script)
Example Output
# Clarifai Model Client Script
# Set the environment variables `CLARIFAI_DEPLOYMENT_ID` and `CLARIFAI_PAT` to run this script.
# Example usage:
from clarifai.runners.utils import data_types
import os
from clarifai.client import Model

model = Model("www.clarifai.com/openai/chat-completion/o4-mini",
deployment_id = os.environ['CLARIFAI_DEPLOYMENT_ID'], # Only needed for dedicated deployed models
)

# Example model prediction from different model methods:

response = model.predict(prompt='What is the future of AI?', image=Image(url='https://samples.clarifai.com/metro-north.jpg'), images=None, chat_history=None, max_tokens=512.0, temperature=1.0, top_p=0.8, reasoning_effort='"low"')
print(response)

response = model.generate(prompt='What is the future of AI?', image=Image(url='https://samples.clarifai.com/metro-north.jpg'), images=None, chat_history=None, max_tokens=512.0, temperature=0.7, top_p=0.8, reasoning_effort='"low"')
for res in response:
print(res)

response = model.chat(messages=None, max_tokens=750.0, temperature=0.7, top_p=0.8, reasoning_effort='"low"')
for res in response:
print(res)
Set up a deployment

To use Clarifai’s Compute Orchestration capabilities, ensure your model is deployed, as described previously. Then, configure the deployment_id parameter — alternatively, you can specify compute_cluster_id and nodepool_id. If none of these are set, the prediction will default to the Clarifai Shared deployment type.

model = Model(
    url="MODEL_URL_HERE",
    deployment_id="DEPLOYMENT_ID_HERE",
    # Or, set cluster and nodepool
    # compute_cluster_id="COMPUTE_CLUSTER_ID_HERE",
    # nodepool_id="NODEPOOL_ID_HERE"
)

Unary-Unary Predict Call

This is the simplest form of prediction: a single input is sent to the model, and a single response is returned. It’s ideal for quick, non-streaming tasks, such as classifying an image or analyzing a short piece of text.

NOTE: Streaming means that the response is streamed back token by token, rather than waiting for the entire completion to be generated before returning. This is useful for building interactive applications where you want to display the response as it's being generated.

Text Inputs

Here is an example of a model signature configured on the server side for handling text inputs:

@ModelClass.method
def predict(self, prompt: str = "") -> str:

Here’s how you can make a corresponding unary-unary predict call from the client side:

import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(
    url="https://clarifai.com/openai/chat-completion/models/o4-mini",
    # deployment_id="DEPLOYMENT_ID_HERE"
)

response = model.predict("What is photosynthesis?")
# Or
# response = model.predict(prompt="What is photosynthesis?")

print(response)
Example Output
Photosynthesis is the process by which certain organisms—primarily plants, algae, and some bacteria—convert light energy (usually from the sun) into chemical energy stored in sugars. In essence, these organisms capture carbon dioxide (CO₂) from the air and water (H₂O) from the soil, then use sunlight to drive a series of reactions that produce oxygen (O₂) as a by-product and synthesize glucose (C₆H₁₂O₆) or related carbohydrates.

Key points:

1. Light absorption
• Chlorophyll and other pigments in chloroplasts (in plants and algae) absorb photons, elevating electrons to higher energy states.

2. Light-dependent reactions (in thylakoid membranes)
• Convert light energy into chemical energy in the form of ATP and NADPH.
• Split water molecules, releasing O₂.

3. Calvin cycle (light-independent reactions, in the stroma)
• Use ATP and NADPH to fix CO₂ into organic molecules.
• Produce glyceraldehyde-3-phosphate (G3P), which can be converted into glucose and other carbs.

Overall simplified equation:
6 CO₂ + 6 H₂O + light energy → C₆H₁₂O₆ + 6 O₂

Importance:
• Generates the oxygen we breathe.
• Forms the base of most food chains by producing organic matter.
• Plays a critical role in the global carbon cycle and helps mitigate CO₂ in the atmosphere.

Image Inputs

Here is an example of a model signature configured on the server side for handling image inputs:

@ModelClass.method
def predict(self, image: Image) -> str:

Here’s how you can make a corresponding unary-unary predict call from the client side:

import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Image

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(
    url="https://clarifai.com/openai/chat-completion/models/o4-mini",
    # deployment_id="DEPLOYMENT_ID_HERE"
)

response = model.predict(
    prompt="Describe the image",
    image=Image(url="https://samples.clarifai.com/cat1.jpeg")
)

print(response)

"""
# --- Predict using an image uploaded from a local machine ---

# 1. Specify the path to your local image file
local_image_path = "path/to/your/image.jpg" # Replace with the actual path to your image

# 2. Read the image file into bytes
with open(local_image_path, "rb") as f:
    image_bytes = f.read()

response = model.predict(
    prompt="Describe the image",
    # Provide Image as bytes
    image=Image(bytes=image_bytes)
)

print(response)

# You can also convert a Pillow (PIL) Image object into a Clarifai Image data type
# image=Image.from_pil(pil_image)

"""
Example Output
The image shows a young ginger tabby cat lying on its side against what looks like a rough, earth-toned wall. Its coat is a warm orange with classic darker orange stripe markings. The cat’s front paw is tucked in, and its head rests on the surface below, with its large amber eyes gazing directly toward the viewer. The lighting is soft, highlighting the cat’s whiskers, ear fur, and the texture of its velvety coat. Overall, the scene feels calm and slightly curious, as if the cat has paused mid-nap to watch something interesting.
tip

Click here to explore how to make predictions with other data types.

Unary-Stream Predict Call

This call sends a single input to the model but returns a stream of responses. This is especially useful for tasks that produce multiple outputs from one input, such as generating text completions or progressive predictions from a prompt.

Text Inputs

Here is an example of a model signature configured on the server side for handling text inputs:

@ModelClass.method
def generate(self, prompt: str) -> Iterator[str]:

Here’s how you can make a corresponding unary-stream predict call from the client side:

import os

from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(
    url="https://clarifai.com/openai/chat-completion/models/o4-mini",
    # deployment_id="DEPLOYMENT_ID_HERE"
)

response_stream = model.generate(
    prompt="Explain quantum computing in simple terms"
)

for text_chunk in response_stream:
    print(text_chunk, end="", flush=True)

"""
# --- Load prompt text from URL ---

import requests

prompt_from_url = requests.get("https://samples.clarifai.com/featured-models/redpajama-economic.txt")
prompt_text = prompt_from_url.text.strip()

response_stream = model.generate(
    prompt=prompt_text
)

for text_chunk in response_stream:
    print(text_chunk, end="", flush=True)

"""

Image Inputs

Here is an example of a model signature configured on the server side for handling image inputs:

@ModelClass.method
def generate(self, image: Image) -> Iterator[str]:

Here’s how you can make a corresponding unary-stream predict call from the client side:

import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Image

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(
    url="https://clarifai.com/openai/chat-completion/models/o4-mini",
    # deployment_id="DEPLOYMENT_ID_HERE"
)

response_stream = model.generate(
    prompt="Describe the image",
    image=Image(url="https://samples.clarifai.com/cat1.jpeg")
)

for text_chunk in response_stream:
    print(text_chunk, end="", flush=True)


"""
# --- Predict using an image uploaded from a local machine ---

# 1. Specify the path to your local image file
local_image_path = "path/to/your/image.jpg" # Replace with the actual path to your image

# 2. Read the image file into bytes
with open(local_image_path, "rb") as f:
    image_bytes = f.read()

response_stream = model.generate(
    prompt="Describe the image",
    # Provide Image as bytes
    image=Image(bytes=image_bytes)
)

for text_chunk in response_stream:
    print(text_chunk, end="", flush=True)

# You can also convert a Pillow (PIL) Image object into a Clarifai Image data type
# image=Image.from_pil(pil_image)

"""

Stream-Stream Predict Call

This call enables bidirectional streaming of both inputs and outputs, making it ideal for real-time applications and processing large datasets.

In this setup, multiple inputs can be continuously streamed to the model, while predictions are returned in real time. It’s especially useful for use cases like live video analysis or streaming sensor data.

Text Inputs

Here is an example of a model signature configured on the server side for handling text inputs:

@ModelClass.method
def stream(self, input_iterator: Iterator[str]) -> Iterator[str]:

Here’s how you can make a corresponding stream-stream predict call from the client side:

import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Text

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(
    url="MODEL_URL_HERE",
    # deployment_id="YOUR_DEPLOYMENT_ID_HERE"
)

# Create a list of input Texts to simulate a stream
input_texts = iter([
    Text(text="First input."),
    Text(text="Second input."),
    Text(text="Third input.")
])

# Call the stream method and process outputs
response_iterator = model.stream(input_texts)

# Print streamed results
print("Streaming output:\n")
for response in response_iterator:
    print(response.text)

Audio Inputs

Here is an example of a model signature configured on the server side for handling audio inputs:

@ModelClass.method
def transcribe_audio(self, audio: Iterator[Audio]) -> Iterator[Text]:

Here’s how you can make a corresponding stream-stream predict call from the client side:

import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Audio

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(
    url="MODEL_URL_HERE",
    # deployment_id="DEPLOYMENT_ID_HERE"
)

# Client-side streaming: the method expects an iterator of Audio inputs
response_stream = model.transcribe_audio(
    audio=iter([Audio(bytes=b'')])
    # Or, provide audio as a URL
    # audio=iter([Audio(url="https://example.com/audio.mp3")])
)

for text_chunk in response_stream:
    print(text_chunk.text, end="", flush=True)
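
In practice, you would typically stream real audio rather than an empty byte string. Below is a minimal sketch, continuing from the example above, that reads a local file in fixed-size chunks and sends each chunk as an Audio input; the file path and chunk size are illustrative assumptions, not requirements of the API.

def audio_chunks(path, chunk_size=1024 * 1024):
    """Yield Audio objects read from a local file in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield Audio(bytes=chunk)

response_stream = model.transcribe_audio(audio=audio_chunks("path/to/your/audio.mp3"))

for text_chunk in response_stream:
    print(text_chunk.text, end="", flush=True)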

Dynamic Batch Prediction Handling

Clarifai’s model framework seamlessly supports both single and batch predictions through a unified interface. It dynamically adapts to the input format, so no code changes are needed.

The system automatically detects the type of input provided:

  • If you pass a single input, it’s treated as a singleton batch;

  • If you pass multiple inputs as a list, they are handled as a parallel batch.

This means you can pass either a single input or a list of inputs, and the system will automatically process them appropriately — making your code cleaner and more flexible.
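
To make the contrast concrete, here is a minimal sketch using the predict_image method described in the next subsection, assuming the model client has already been initialized as shown earlier. Passing one image returns a single result; passing a list returns a list of results in the same order.

from clarifai.runners.utils.data_types import Image

# Single input: treated as a singleton batch, returns one result
single_result = model.predict_image(
    image=Image(url="https://samples.clarifai.com/cat1.jpeg")
)

# Multiple inputs: treated as a parallel batch, returns a list of results
batch_results = model.predict_image([
    {"image": Image(url="https://samples.clarifai.com/cat1.jpeg")},
    {"image": Image(url="https://samples.clarifai.com/cat2.jpeg")},
])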

Image Inputs

Here is an example of a model signature configured on the server side for handling image inputs:

@ModelClass.method
def predict_image(self, image: Image) -> Dict[str, float]:

Here’s how you can perform batch predictions with image inputs from the client side:

import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Image

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(
    url="MODEL_URL_HERE",
    # deployment_id="DEPLOYMENT_ID_HERE"
)

# Batch processing (automatically handled)
batch_results = model.predict_image([
    {"image": Image(url="https://samples.clarifai.com/cat1.jpeg")},
    {"image": Image(url="https://samples.clarifai.com/cat2.jpeg")},
])

for i, pred in enumerate(batch_results):
    print(f"Image {i+1} cat confidence: {pred['cat']:.2%}")

Text Inputs

Here is an example of a model signature configured on the server side for handling text inputs:

class TextClassifier(ModelClass):
    @ModelClass.method
    def predict(self, text: Text) -> float:
        """Single text classification (automatically batched)"""
        return self.model(text.text)

Here’s how you can perform batch predictions with text inputs from the client side:

import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Text

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(
    url="MODEL_URL_HERE",
    # deployment_id="DEPLOYMENT_ID_HERE"
)

# Batch prediction
batch_results = model.predict([
    {"text": Text("Positive review")},
    {"text": Text("Positive review")},
    {"text": Text("Positive review")},
])
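
The batch call returns one result per input, in the same order. Here is a minimal sketch of reading the results, assuming each entry is the float score returned by the method signature above:

for i, score in enumerate(batch_results):
    print(f"Text {i+1} score: {score}")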

Multimodal Predictions

You can make predictions using models that support multimodal inputs, such as a combination of images and text.

Additionally, you can configure various inference parameters to customize your prediction requests to better suit your use case.

tip

Click here to learn more about how to make multimodal predictions, including how to use parameters like chat_history and messages.

Here is an example:

import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Image

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize model
model = Model(
    url="MODEL_URL_HERE",
    # deployment_id="DEPLOYMENT_ID_HERE"
)

# Perform prediction with prompt and image
result = model.predict(
    prompt="What is the future of AI?",
    image=Image(url="https://samples.clarifai.com/metro-north.jpg"),
    max_tokens=512,
    temperature=0.7,
    top_p=0.8
)

# Print the prediction result
print(result)

Tool Calling

Tool calling in LLMs is a capability that allows models to autonomously decide when and how to call external tools, functions, or APIs during a conversation — based on the user’s input and the context.

You can learn more about it here.

from clarifai.client import Model
import os

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="https://clarifai.com/anthropic/completion/models/claude-sonnet-4")

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country e.g. Bogotá, Colombia"
                    },
                    "units": {
                        "type": "string",
                        "description": "Temperature units, e.g. Celsius or Fahrenheit",
                        "enum": ["Celsius", "Fahrenheit"]
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
]

response = model.generate(
    prompt="What is the temperature in Tokyo in Celsius?",
    tools=tools,
    tool_choice='auto',
    max_tokens=1024,
    temperature=0.5,
)

# Iterate over the streamed response
for chunk in response:
    print(chunk)