LiteLLM
Run inferences on Clarifai models using LiteLLM
LiteLLM provides a universal interface that simplifies working with LLMs across multiple providers. It offers a single, consistent API for making inferences, allowing you to interact with a wide range of models using the same method, regardless of the underlying provider.
LiteLLM natively supports OpenAI-compatible APIs, making it easy to run inferences on Clarifai-hosted models with minimal setup.
Additional examples are available that show how to perform model predictions using various SDKs, such as the Clarifai SDK, the OpenAI client, and LiteLLM. They cover several model types and include both streaming and non-streaming modes, as well as tool calling.
Prerequisites
Install LiteLLM
Install the litellm package.
- Python
 pip install litellm 
Get a PAT Key
You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. To get one, navigate to Settings in the collapsible left sidebar, select Secrets, and either create a new token or copy an existing one.
You can then set the PAT as an environment variable using CLARIFAI_PAT:
- Unix-Like Systems
 export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE 
- Windows
 set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE 
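Because the examples in this guide read the token with os.environ["CLARIFAI_PAT"], a missing variable surfaces as a bare KeyError. The following is a minimal sketch of a check you could run first; the helper name get_clarifai_pat is hypothetical and is not part of LiteLLM or the Clarifai SDK.

```python
import os


def get_clarifai_pat(env=None):
    """Return the PAT stored in CLARIFAI_PAT, or raise a descriptive error.

    Hypothetical helper for illustration only; `env` defaults to os.environ
    and can be replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    pat = (env.get("CLARIFAI_PAT") or "").strip()
    if not pat:
        raise RuntimeError(
            "CLARIFAI_PAT is not set. Export it before running the examples."
        )
    return pat
```

You could then pass api_key=get_clarifai_pat() to the calls shown later in this guide instead of reading os.environ directly.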
Get a Clarifai Model
Go to the Clarifai Community platform and select the model you want to use for making predictions.
Note: When specifying a Clarifai model in LiteLLM, use the model path prefixed with openai/, followed by the full Clarifai model URL. For example: openai/https://clarifai.com/openai/chat-completion/models/o4-mini.
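The prefixing rule in the note can be sketched as a small function. The name to_litellm_model is hypothetical, and the check that the URL starts with https://clarifai.com/ is an assumption based on the example URL above, not a documented requirement.

```python
def to_litellm_model(clarifai_model_url: str) -> str:
    """Build the LiteLLM model string for a Clarifai model URL.

    Hypothetical helper: it only prepends the "openai/" prefix described
    in the note above.
    """
    url = clarifai_model_url.strip()
    # Assumption: Clarifai model URLs start with this host, as in the example.
    if not url.startswith("https://clarifai.com/"):
        raise ValueError(f"Expected a full Clarifai model URL, got: {url!r}")
    return "openai/" + url


model = to_litellm_model(
    "https://clarifai.com/openai/chat-completion/models/o4-mini"
)
print(model)
```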
Chat Completions
In LiteLLM, the completion() function is the primary method for interacting with language models that follow the OpenAI Chat API format. It supports both traditional completions and chat-based interactions by accepting a list of messages, similar to OpenAI's chat.completions.create().
- Python SDK
import os
import litellm
response = litellm.completion(
    model="openai/https://clarifai.com/anthropic/completion/models/claude-sonnet-4",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    # Message formatting is consistent with OpenAI's schema ({"role": ..., "content": ...}).
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going?"}
    ],
    # You can add OpenAI-compatible parameters here
    temperature=0.7,         # Optional: controls randomness
    max_tokens=100           # Optional: limits response length
)
# Print the assistant's reply
print(response['choices'][0]['message']['content'])
Example Output
Hey there! I'm doing well, thanks for asking! How are you doing today? Is there anything I can help you with or would you like to chat about something?
Streaming
When streaming is enabled by setting stream=True, the completion method returns an iterator that yields partial responses in real time as the model generates them, instead of a single complete dictionary.
- Python SDK
import os
import litellm
# Enable streaming
response_stream = litellm.completion(
    model="openai/https://clarifai.com/anthropic/completion/models/claude-sonnet-4",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going? Tell me a short story about a space-faring cat."}
    ],
    stream=True  # Enable streaming
)
# Print the streamed output in real time
for chunk in response_stream:
    content = chunk.get("choices", [{}])[0].get("delta", {}).get("content")
    if content:
        print(content, end="", flush=True)
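If you also need the complete reply once streaming finishes, you can accumulate the partial contents as they arrive. The sketch below uses the same dict-style access as the loop above; collect_stream_text is a hypothetical helper, and plain dictionaries stand in for real chunks so that it runs without a network call.

```python
def collect_stream_text(chunks):
    """Join the text fragments from an iterable of streamed chunks.

    Hypothetical helper: it reads choices[0].delta.content with .get(),
    mirroring the streaming loop above, and skips chunks with no text.
    """
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Stand-in chunks for illustration; a real stream comes from
# litellm.completion(..., stream=True).
fake_chunks = [
    {"choices": [{"delta": {"content": "Once upon "}}]},
    {"choices": [{"delta": {"content": None}}]},
    {"choices": [{"delta": {"content": "a time."}}]},
    {"choices": []},
]
full_text = collect_stream_text(fake_chunks)
print(full_text)
```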
Tool Calling
Clarifai models accessed via LiteLLM fully support tool calling: the model can request that your code run a function during a conversation and then use the returned result in its reply.
- Python SDK
import os
import litellm
# Define the tool (function) the model can call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
# Send the request to a Clarifai-hosted model using LiteLLM
response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)
# Output the tool call suggested by the model (if any)
print(response.choices[0].message.tool_calls)
Tool Calling Implementation Example
import os
import json
import litellm
# Step 1: Define the tool schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
# Step 2: Define a function that simulates tool execution
def get_weather(location: str) -> str:
    # In a real app, you'd query a weather API here
    return f"The current temperature in {location} is 22°C."
# Step 3: Make the initial request to trigger the tool
response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
    api_key=os.environ["CLARIFAI_PAT"],
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)
tool_calls = response.choices[0].message.tool_calls
# Step 4: Parse the tool call and run the function
if tool_calls:
    for tool_call in tool_calls:
        tool_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        
        if tool_name == "get_weather":
            result = get_weather(arguments["location"])
            
            # Step 5: Send the function result back to the model
            follow_up = litellm.completion(
                model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
                api_key=os.environ["CLARIFAI_PAT"], # Ensure CLARIFAI_PAT is set as an environment variable
                api_base="https://api.clarifai.com/v2/ext/openai/v1",
                messages=[
                    {"role": "user", "content": "What is the weather in Paris today?"},
                    {"role": "assistant", "tool_calls": [tool_call]},
                    {"role": "tool", "tool_call_id": tool_call.id, "content": result}
                ]
            )
            
            # Print the final assistant message
            print(follow_up.choices[0].message.content)
else:
    print("No tool was called.")
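The if tool_name == "get_weather" check above works for a single tool. With several tools, one possible approach is a lookup table that maps tool names to Python functions. The sketch below is illustrative only: TOOL_REGISTRY and run_tool_call are hypothetical names, not part of LiteLLM, and a SimpleNamespace object stands in for a real tool call so that the code runs without a network request.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    # Simulated result, as in the example above
    return f"The current temperature in {location} is 22°C."


# Hypothetical registry mapping tool names to local functions
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call, registry=TOOL_REGISTRY):
    """Run one tool call and return a "tool" role message for the follow-up."""
    name = tool_call.function.name
    func = registry.get(name)
    if func is None:
        content = f"Unknown tool: {name}"
    else:
        # The model sends arguments as a JSON string
        arguments = json.loads(tool_call.function.arguments or "{}")
        content = func(**arguments)
    return {"role": "tool", "tool_call_id": tool_call.id, "content": content}


# Stand-in for an entry of response.choices[0].message.tool_calls
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(
        name="get_weather",
        arguments='{"location": "Paris, France"}',
    ),
)
tool_message = run_tool_call(fake_call)
print(tool_message["content"])
```

The returned dictionary has the same shape as the "tool" message sent in Step 5, so it can be appended to the messages list of the follow-up request.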