LiteLLM
Run inferences on Clarifai models using LiteLLM
LiteLLM provides a universal interface that simplifies working with LLMs across multiple providers. It offers a single, consistent API for making inferences, allowing you to interact with a wide range of models using the same method, regardless of the underlying provider.
LiteLLM natively supports OpenAI-compatible APIs, making it easy to run inferences on Clarifai-hosted models with minimal setup.
Prerequisites
Install LiteLLM
Install the litellm package.
- Python
pip install litellm
Get a PAT Key
You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can generate one in your personal settings page by navigating to the Security section.
You can then set the PAT as an environment variable using CLARIFAI_PAT:
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
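If you want to confirm from Python that the token is available before running the examples below, a quick sanity check looks like this:

import os

# Fail fast if the Clarifai PAT has not been exported in the current shell
pat = os.environ.get("CLARIFAI_PAT")
if not pat:
    raise RuntimeError("CLARIFAI_PAT is not set. Export it before running the examples below.")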
Get a Clarifai Model
Go to the Clarifai Community platform and select the model you want to use for making predictions.
Note: When specifying a Clarifai model in LiteLLM, prefix the full Clarifai model URL with openai/. For example: openai/https://clarifai.com/openai/chat-completion/models/o4-mini.
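For illustration only, here is a small hypothetical helper that builds the LiteLLM model string from a Clarifai model URL (using the o4-mini URL from the note above):

# Hypothetical helper: prepend the "openai/" prefix to a full Clarifai model URL
def to_litellm_model(clarifai_model_url: str) -> str:
    return f"openai/{clarifai_model_url}"

model = to_litellm_model("https://clarifai.com/openai/chat-completion/models/o4-mini")
# -> "openai/https://clarifai.com/openai/chat-completion/models/o4-mini"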
Chat Completions
In LiteLLM, the completion() function is the primary method for interacting with language models that follow the OpenAI Chat API format. It supports both traditional completions and chat-based interactions by accepting a list of messages, similar to OpenAI's chat.completions.create().
- Python SDK
import os
import litellm
response = litellm.completion(
    model="openai/https://clarifai.com/anthropic/completion/models/claude-sonnet-4",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    # Message formatting is consistent with OpenAI's schema ({"role": ..., "content": ...})
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going?"}
    ],
    # You can add other OpenAI-compatible parameters here
    temperature=0.7,  # Optional: controls randomness
    max_tokens=100    # Optional: limits response length
)
# Print the assistant's reply
print(response['choices'][0]['message']['content'])
Example Output
Hey there! I'm doing well, thanks for asking! How are you doing today? Is there anything I can help you with or would you like to chat about something?
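Because the response follows the OpenAI schema, token usage is reported alongside the message. A quick sketch (field availability can vary by model):

# Inspect token usage for the completion above
# (fields follow the OpenAI schema; availability can vary by model)
usage = response.usage
if usage:
    print("Prompt tokens:", usage.prompt_tokens)
    print("Completion tokens:", usage.completion_tokens)
    print("Total tokens:", usage.total_tokens)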
Streaming
When streaming is enabled by setting stream=True, the completion method returns an iterator that yields partial responses in real time as the model generates them, instead of a single complete response object.
- Python SDK
import os
import litellm
# Enable streaming
response_stream = litellm.completion(
    model="openai/https://clarifai.com/anthropic/completion/models/claude-sonnet-4",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going? Tell me a short story about a space-faring cat."}
    ],
    stream=True  # Enable streaming
)
# Print the streamed output in real time
for chunk in response_stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
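If you also need the complete text once streaming finishes, you can accumulate the deltas as they arrive. A minimal sketch, assuming a fresh response_stream (a stream can only be iterated once):

# Accumulate streamed deltas into the full reply
full_text = ""
for chunk in response_stream:
    delta = chunk.choices[0].delta.content
    if delta:
        full_text += delta

print(full_text)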
Tool Calling
Clarifai models accessed via LiteLLM fully support tool calling, enabling advanced interactions such as function execution during a conversation.
- Python SDK
import os
import litellm
# Define the tool (function) the model can call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
# Send the request to a Clarifai-hosted model using LiteLLM
response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)
# Output the tool call suggested by the model (if any)
print(response.choices[0].message.tool_calls)
Tool Calling Implementation Example
import os
import json
import litellm
# Step 1: Define the tool schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
# Step 2: Define a function that simulates tool execution
def get_weather(location: str) -> str:
    # In a real app, you'd query a weather API here
    return f"The current temperature in {location} is 22°C."
# Step 3: Make the initial request to trigger the tool
response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
    api_key=os.environ["CLARIFAI_PAT"],
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)
tool_calls = response.choices[0].message.tool_calls
# Step 4: Parse the tool call and run the function
if tool_calls:
    for tool_call in tool_calls:
        tool_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)

        if tool_name == "get_weather":
            result = get_weather(arguments["location"])

            # Step 5: Send the function result back to the model
            follow_up = litellm.completion(
                model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
                api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
                api_base="https://api.clarifai.com/v2/ext/openai/v1",
                messages=[
                    {"role": "user", "content": "What is the weather in Paris today?"},
                    {"role": "assistant", "tool_calls": [tool_call]},
                    {"role": "tool", "tool_call_id": tool_call.id, "content": result}
                ]
            )

            # Print the final assistant message
            print(follow_up.choices[0].message.content)
else:
    print("No tool was called.")