Model Inference
Perform predictions using your deployed models
Clarifai's Compute Orchestration provides efficient ways to make prediction calls for a variety of use cases. Once your model is deployed, you can use it to run inference seamlessly.
To ensure proper routing and execution, you must specify the deployment_id parameter. This parameter directs prediction requests to the appropriate cluster and nodepool. For example, you can route requests to a GCP cluster with one deployment ID, to an AWS cluster with another, and to an on-premises deployment with a third. This gives you full control over performance, costs, and security, allowing you to focus on building cutting-edge AI solutions while we handle the infrastructure complexity.
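For illustration, here is a minimal sketch of routing the same model to different infrastructure purely by switching deployment IDs; the IDs below are hypothetical placeholders for deployments you have already created:

from clarifai.client.model import Model

model = Model(
    url="https://clarifai.com/stepfun-ai/ocr/models/got-ocr-2_0",
    pat="YOUR_PAT_HERE"
)

image_url = "https://samples.clarifai.com/featured-models/model-ocr-scene-text-las-vegas-sign.png"

# Same model, same input — only the deployment ID decides where it runs
gcp_prediction = model.predict_by_url(
    image_url,
    input_type="image",
    deployment_id="my-gcp-deployment"  # hypothetical: nodepool in a GCP cluster
)
aws_prediction = model.predict_by_url(
    image_url,
    input_type="image",
    deployment_id="my-aws-deployment"  # hypothetical: nodepool in an AWS cluster
)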
Via the API
Unary-Unary Predict Call
This is the simplest type of prediction. In this method, a single input is sent to the model, and it returns a single response. This is ideal for tasks where a quick, non-streaming prediction is required, such as classifying an image.
It supports the following prediction methods:
- predict_by_url — Use a publicly accessible URL for the input.
- predict_by_bytes — Pass raw input data directly.
- predict_by_filepath — Provide the local file path for the input.
- Python
- CLI
from clarifai.client.model import Model
model_url = "https://clarifai.com/stepfun-ai/ocr/models/got-ocr-2_0"
# URL of the image to analyze
image_url = "https://samples.clarifai.com/featured-models/model-ocr-scene-text-las-vegas-sign.png"
# Initialize the model
model = Model(
    url=model_url,
    pat="YOUR_PAT_HERE"
)
# Make a prediction using the model with the specified compute cluster and nodepool
model_prediction = model.predict_by_url(
    image_url,
    input_type="image",
    deployment_id="test-deployment"
)
# Print the output
print(model_prediction.outputs[0].data.text.raw)
clarifai model predict --model_id got-ocr-2_0 --user_id stepfun-ai --app_id ocr --url "https://samples.clarifai.com/featured-models/ocr-woman-holding-sold-sign.jpg" --input_type image --deployment_id "test-deployment"
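The other two unary methods follow the same pattern. A minimal sketch, assuming a local copy of the image at a hypothetical path (./sign.png) and the same deployment ID as above:

from clarifai.client.model import Model

model = Model(
    url="https://clarifai.com/stepfun-ai/ocr/models/got-ocr-2_0",
    pat="YOUR_PAT_HERE"
)

# predict_by_filepath — point the model at a local file
filepath_prediction = model.predict_by_filepath(
    "./sign.png",  # hypothetical local path
    input_type="image",
    deployment_id="test-deployment"
)

# predict_by_bytes — pass the raw input data yourself
with open("./sign.png", "rb") as f:
    bytes_prediction = model.predict_by_bytes(
        f.read(),
        input_type="image",
        deployment_id="test-deployment"
    )

print(bytes_prediction.outputs[0].data.text.raw)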
Unary-Stream Predict Call
The Unary-Stream predict call processes a single input but returns a stream of responses. It is particularly useful for tasks where multiple outputs are generated from a single input, such as generating text completions from a prompt.
It supports the following prediction methods:
- generate_by_url — Provide a publicly accessible URL and handle the streamed responses iteratively.
- generate_by_bytes — Use raw input data.
- generate_by_filepath — Use a local file path for the input.
- Python
from clarifai.client.model import Model
model_url = "https://clarifai.com/meta/Llama-3/models/llama-3_2-3b-instruct"
# URL of the prompt text
text_url = "https://samples.clarifai.com/featured-models/falcon-instruction-guidance.txt"
# Initialize the model
model = Model(
    url=model_url,
    pat="YOUR_PAT_HERE"
)
# Perform unary-stream prediction with the specified compute cluster and nodepool
stream_response = model.generate_by_url(
    text_url,
    input_type="text",
    deployment_id="test-deployment"
)
# Handle the stream of responses
list_stream_response = [response for response in stream_response]
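Alternatively, you can consume responses as they arrive rather than collecting them into a list first (a stream can only be iterated once). A minimal sketch, assuming each streamed response mirrors the outputs structure of the unary example above:

# Instead of the list comprehension above, print each chunk as it arrives
for response in stream_response:
    print(response.outputs[0].data.text.raw)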
Stream-Stream Predict Call
The stream-stream predict call enables bidirectional streaming of both inputs and outputs, making it highly effective for processing large datasets or real-time applications.
In this setup, multiple inputs are continuously sent to the model, and the corresponding predictions are streamed back in real time. This is ideal for tasks like real-time video processing or live sensor data analysis.
It supports the following prediction methods:
- stream_by_url — Pass an iterator of publicly accessible URLs and receive a stream of predictions back.
- stream_by_bytes — Stream raw input data.
- stream_by_filepath — Stream inputs from local file paths.
- Python
from clarifai.client.model import Model
model_url = "https://clarifai.com/meta/Llama-3/models/llama-3_2-3b-instruct"
# URL of the prompt text
text_url = "https://samples.clarifai.com/featured-models/falcon-instruction-guidance.txt"
# Initialize the model
model = Model(
    url=model_url,
    pat="YOUR_PAT_HERE"
)
# Perform stream-stream prediction with the specified compute cluster and nodepool
stream_response = model.stream_by_url(
    iter([text_url]),
    input_type="text",
    deployment_id="test-deployment"
)
# Handle the stream of responses
list_stream_response = [response for response in stream_response]
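Because stream_by_url accepts any iterator, you can feed it multiple inputs. A minimal sketch that reuses the sample prompt twice for illustration, assuming the response structure mirrors the earlier examples:

from clarifai.client.model import Model

model = Model(
    url="https://clarifai.com/meta/Llama-3/models/llama-3_2-3b-instruct",
    pat="YOUR_PAT_HERE"
)

# Any iterator of URLs works; here the same sample prompt is sent twice
prompt_urls = iter([
    "https://samples.clarifai.com/featured-models/falcon-instruction-guidance.txt",
    "https://samples.clarifai.com/featured-models/falcon-instruction-guidance.txt",
])

stream_response = model.stream_by_url(
    prompt_urls,
    input_type="text",
    deployment_id="test-deployment"
)

# Predictions stream back as the inputs are processed
for response in stream_response:
    print(response.outputs[0].data.text.raw)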
Via the UI
Model Playground
To access your deployments, navigate to the model’s playground page and select the Deployments tab.
Here, you’ll find a Deployments & Usage table listing all deployments associated with the model, including details such as the cluster and nodepool. You can also sort the table alphabetically (A–Z or Z–A).
To select a deployment, click the Deployment button. A dropdown list will appear, showing your available deployments. Choose the one you want to use to direct traffic to a specific cluster and nodepool.
If no selection is made, the default Clarifai Shared deployment will be used.
Once you’ve selected a deployment ID, go to the Overview pane to use it for making prediction requests.
When inferencing using a deployed model, the request is routed to the nodepool within the cloud region specified in the cluster, and the model’s predictions are returned as output in real time.
Predictions Within Input-Viewer
The Input-Viewer is the page that displays the details of a single input in your app. If you click an input listed on the Inputs-Manager page, you'll be redirected to the viewer page for that input, where you can view and interact with it.
To make predictions on an input, switch to predict mode by toggling the Predict button located in the top-right corner of the page. Next, click the Choose a model or workflow button in the right-hand sidebar to select the model you want to use.
In the window that appears, choose your desired model and then select a deployment from the Deployment dropdown. If needed, you can also create a new deployment from this window.
Lastly, click the Predict button at the bottom of the sidebar. The model will process the input and return predictions in real time, allowing you to immediately view the results within the Input-Viewer screen.