Inference via the API
Perform predictions using your deployed models
Clarifai's inference API provides an efficient, scalable, and streamlined way to perform predictions with your models.
Prerequisites
Install Clarifai Package
Install the latest version of the clarifai Python SDK package.
- Bash
pip install --upgrade clarifai
Get a PAT Key
You need a PAT (Personal Access Token) key to authenticate your connection to the Clarifai platform. You can generate the PAT key in your personal settings page by navigating to the Security section.
You can then set the PAT as an environment variable using CLARIFAI_PAT.
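For example, you can set the variable programmatically before creating any client objects. As a minimal sketch, you can also pass the token directly when initializing a client; the pat keyword argument shown here is an assumption, so verify it against your SDK version.
import os
from clarifai.client import Model

# Option 1: set the environment variable before creating any client objects
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Option 2 (assumed; verify against your SDK version): pass the token directly
model = Model(url="MODEL_URL_HERE", pat="YOUR_PAT_HERE")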
On Windows, the Clarifai Python SDK expects a HOME environment variable, which isn’t set by default. To ensure compatibility with file paths used by the SDK, set HOME to the value of your USERPROFILE. You can set it in your Command Prompt this way: set HOME=%USERPROFILE%.
Structure of Prediction Methods
Before making a prediction with a model, it’s important to understand how its prediction methods are structured.
You can learn more about the structure of model prediction methods here.
Get Available Methods
You can list all the inference methods implemented in the model's configuration.
- Python SDK
import os
from clarifai.client import Model
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize with model URL
model = Model(url="MODEL_URL_HERE")
model_methods = model.available_methods()
print(model_methods)
Example Output
dict_keys(['predict', 'generate', 'chat'])
When instantiating a model, you can use either its explicit IDs or its URL.
# Initialize with explicit IDs
model = Model(user_id="model_user_id", app_id="model_app_id", model_id="model_id")
# Or initialize with model URL
model = Model(url="https://clarifai.com/model_user_id/model_app_id/models/model_id")
Get Method Signature
You can retrieve the method signature of a specified model's method to identify all its arguments and their type annotations, which are essential for performing model inference.
A method signature defines the method's name, its input parameters (with types and default values), and the return type, helping you understand how to properly call the method.
- Python SDK
import os
from clarifai.client import Model
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize with model URL
model = Model(url="MODEL_URL_HERE")
method_name = "predict" # Or, "generate", "chat", etc
method_signature = model.method_signature(method_name= method_name)
print(method_signature)
Example Output
def predict(prompt: str, image: data_types.Image, images: List[data_types.Image], chat_history: List[data_types.JSON], max_tokens: int = 512, temperature: float = 0.7, top_p: float = 0.8) -> str:
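To inspect every method at once, you can combine the two calls above into a short loop. This is a minimal sketch that uses only the available_methods and method_signature calls shown in this section:
import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="MODEL_URL_HERE")

# Print the signature of every method the model exposes
for name in model.available_methods():
    print(model.method_signature(method_name=name))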
Generate Example Code
You can generate a sample code snippet to better understand how to perform inference using a model.
- Python SDK
import os
from clarifai.client import Model
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize with model URL
model = Model(url="MODEL_URL_HERE")
model_script = model.generate_client_script()
print(model_script)
Example Output
# Clarifai Model Client Script
# Set the environment variables `CLARIFAI_DEPLOYMENT_ID` and `CLARIFAI_PAT` to run this script.
# Example usage:
from clarifai.runners.utils import data_types
import os
from clarifai.client import Model
model = Model("www.clarifai.com/luv_2261/test-upload/gemma-3-4b-it",
deployment_id = os.environ['CLARIFAI_DEPLOYMENT_ID'], # Only needed for dedicated deployed models
)
# Example model prediction from different model methods:
response = model.predict(prompt='What is the future of AI?', image=Image(url='https://samples.clarifai.com/metro-north.jpg'), images=[Image(url='https://samples.clarifai.com/metro-north.jpg')], chat_history=None, max_tokens=512, temperature=0.7, top_p=0.8)
print(response)
response = model.generate(prompt='What is the future of AI?', image=Image(url='https://samples.clarifai.com/metro-north.jpg'), images=[Image(url='https://samples.clarifai.com/metro-north.jpg')], chat_history=None, max_tokens=512, temperature=0.7, top_p=0.8)
for res in response:
    print(res)
response = model.chat(messages=None, tools=None, max_tokens=512, temperature=0.7, top_p=0.8)
for res in response:
    print(res)
To use Clarifai’s Compute Orchestration capabilities, ensure your model is deployed, as described previously. Then, configure the deployment_id parameter; alternatively, you can specify compute_cluster_id and nodepool_id. If none of these are set, the prediction will default to the Clarifai Shared deployment type.
model = Model(
url="MODEL_URL_HERE",
deployment_id="DEPLOYMENT_ID_HERE",
# Or, set cluster and nodepool
# compute_cluster_id = "COMPUTE_CLUSTER_ID_HERE",
# nodepool_id = "NODEPOOL_ID_HERE"
)
Unary-Unary Predict Call
This is the simplest form of prediction: a single input is sent to the model, and a single response is returned. It’s ideal for quick, non-streaming tasks, such as classifying an image or analyzing a short piece of text.
NOTE: Streaming refers to the continuous flow of data between a client and a model, rather than sending or receiving all the data at once.
Here is an example of a model signature configured on the server side for handling image inputs:
- Python
@ModelClass.method
def predict_image(self, image: Image) -> Dict[str, float]:
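For context, here is a minimal sketch of what such a server-side model class might look like. The ModelClass import path, the load_model hook, and the placeholder classifier logic are illustrative assumptions, not a complete implementation:
from typing import Dict

# Import path assumed from the Clarifai model-upload framework; adjust to your SDK version
from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Image

class ImageClassifier(ModelClass):
    def load_model(self):
        # Load your weights once when the runner starts (placeholder labels here)
        self.labels = ["cat", "dog"]

    @ModelClass.method
    def predict_image(self, image: Image) -> Dict[str, float]:
        # Run your classifier on `image` and return a label-to-confidence mapping;
        # the uniform scores below are placeholders
        return {label: 1.0 / len(self.labels) for label in self.labels}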
Here’s how you can make a corresponding unary-unary predict call from the client side:
- Python SDK
import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Image
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize with model URL
model = Model(
url="MODEL_URL_HERE",
deployment_id="DEPLOYMENT_ID_HERE"
)
result = model.predict_image(
image=Image(url="https://samples.clarifai.com/cat1.jpeg")
# Or, provide Image as bytes. You can also convert a Pillow (PIL) Image object into a Clarifai Image data type
# image=Image(bytes=b"")
# image=Image.from_pil(pil_image)
)
print(f"Cat confidence: {result['cat']:.2%}")
Click here to explore how to make predictions with other data types.
Unary-Stream Predict Call
This call sends a single input to the model but returns a stream of responses. This is especially useful for tasks that produce multiple outputs from one input, such as generating text completions or progressive predictions from a prompt.
Here is an example of a model signature configured on the server side for handling text inputs:
- Python
@ModelClass.method
def generate(self, prompt: Text) -> Iterator[Text]:
Here’s how you can make a corresponding unary-stream predict call from the client side:
- Python SDK
import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Text
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize with model URL
model = Model(
url="MODEL_URL_HERE",
deployment_id="DEPLOYMENT_ID_HERE"
)
response_stream = model.generate(
prompt=Text("Explain quantum computing in simple terms")
# Or, provide text as URL
# prompt=Text(url="https://example.com/text.txt")
)
for text_chunk in response_stream:
    print(text_chunk.text, end="", flush=True)
Stream-Stream Predict Call
This call enables bidirectional streaming of both inputs and outputs, making it ideal for real-time applications and processing large datasets.
In this setup, multiple inputs can be continuously streamed to the model, while predictions are returned in real time. It’s especially useful for use cases like live video analysis or streaming sensor data.
Here is an example of a model signature configured on the server side for handling audio inputs:
- Python
@ModelClass.method
def transcribe_audio(self, audio: Iterator[Audio]) -> Iterator[Text]:
Here’s how you can make a corresponding stream-stream predict call from the client side:
- Python SDK
import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Audio
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize with model URL
model = Model(
url="MODEL_URL_HERE",
deployment_id="DEPLOYMENT_ID_HERE"
)
# Client-side streaming: pass an iterator (e.g., a generator) of Audio chunks
def audio_chunks():
    # Replace with your real audio data as bytes
    yield Audio(bytes=b"")
    # Or, provide audio as a URL
    # yield Audio(url="https://example.com/audio.mp3")

response_stream = model.transcribe_audio(audio=audio_chunks())
for text_chunk in response_stream:
    print(text_chunk.text, end="", flush=True)
Dynamic Batch Prediction Handling
Clarifai’s model framework seamlessly supports both single and batch predictions through a unified interface. It dynamically adapts to the input format, so no code changes are needed.
The system automatically detects the type of input provided:
- If you pass a single input, it’s treated as a singleton batch;
- If you pass multiple inputs as a list, they are handled as a parallel batch.
This means you can pass either a single input or a list of inputs, and the system will automatically process them appropriately — making your code cleaner and more flexible.
Image Inputs
Here is an example of a model signature configured on the server side for handling image inputs:
- Python
@ModelClass.method
def predict_image(self, image: Image) -> Dict[str, float]:
Here’s how you can perform batch predictions with image inputs from the client side:
- Python SDK
import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Image
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize with model URL
model = Model(
url="MODEL_URL_HERE",
deployment_id="DEPLOYMENT_ID_HERE"
)
# Batch processing (automatically handled)
batch_results = model.predict_image([
{"image": Image(url="https://samples.clarifai.com/cat1.jpeg")},
{"image": Image(url="https://samples.clarifai.com/cat2.jpeg")},
])
for i, pred in enumerate(batch_results):
print(f"Image {i+1} cat confidence: {pred['cat']:.2%}")
Text Inputs
Here is an example of a model signature configured on the server side for handling text inputs:
- Python
class TextClassifier(ModelClass):
    @ModelClass.method
    def predict(self, text: Text) -> float:
        """Single text classification (automatically batched)"""
        return self.model(text.text)
Here’s how you can perform batch predictions with text inputs from the client side:
- Python SDK
import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Text
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize with model URL
model = Model(
url="MODEL_URL_HERE",
deployment_id="DEPLOYMENT_ID_HERE"
)
# Batch prediction (automatically handled)
batch_results = model.predict([
    {"text": Text("This product exceeded my expectations")},
    {"text": Text("The service was disappointing")},
    {"text": Text("An average experience overall")},
])

for i, score in enumerate(batch_results):
    print(f"Text {i+1} score: {score}")
Multimodal Predictions
You can make predictions using models that support multimodal inputs, such as a combination of images and text.
Additionally, you can configure various inference parameters to customize your prediction requests to better suit your use case.
Click here to learn more about how to make multimodal predictions, including how to use parameters like chat_history and messages.
Here is an example:
- Python SDK
import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Image
# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"
# Initialize model
model = Model(
url="MODEL_URL_HERE",
deployment_id="DEPLOYMENT_ID_HERE"
)
# Perform prediction with prompt and image
result = model.predict(
prompt="What is the future of AI?",
image=Image(url="https://samples.clarifai.com/metro-north.jpg"),
max_tokens=512,
temperature=0.7,
top_p=0.8
)
# Print the prediction result
print(result)