LiteLLM
Run inferences on Clarifai models using LiteLLM
LiteLLM provides a universal interface that simplifies working with LLMs across multiple providers. It offers a single, consistent API for making inferences, allowing you to interact with a wide range of models using the same method, regardless of the underlying provider.
LiteLLM natively supports OpenAI-compatible APIs, making it easy to run inferences on Clarifai-hosted models with minimal setup.
Prerequisites
Install LiteLLM
Install the litellm package.
- Python
pip install litellm
Get a PAT Key
You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can generate one in your personal settings page by navigating to the Security section.
You can then set the PAT as an environment variable using CLARIFAI_PAT:
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
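If you want to confirm from Python that the token is available before running the examples below, a quick sanity check looks like this:

import os

# Fail fast if the Clarifai PAT has not been exported in the current shell
pat = os.environ.get("CLARIFAI_PAT")
if not pat:
    raise RuntimeError("CLARIFAI_PAT is not set. Export it before running the examples below.")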
Get a Clarifai Model
Go to the Clarifai Community platform and select the model you want to use for making predictions.
Note: When specifying a Clarifai model in LiteLLM, prefix the full Clarifai model URL with openai/. For example: openai/https://clarifai.com/openai/chat-completion/models/o4-mini.
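For illustration only, here is a small hypothetical helper that builds the LiteLLM model string from a Clarifai model URL (using the o4-mini URL from the note above):

# Hypothetical helper: prepend the "openai/" prefix to a full Clarifai model URL
def to_litellm_model(clarifai_model_url: str) -> str:
    return f"openai/{clarifai_model_url}"

model = to_litellm_model("https://clarifai.com/openai/chat-completion/models/o4-mini")
# -> "openai/https://clarifai.com/openai/chat-completion/models/o4-mini"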
Chat Completions
In LiteLLM, the completion() function is the primary method for interacting with language models that follow the OpenAI Chat API format. It supports both traditional completions and chat-based interactions by accepting a list of messages, similar to OpenAI's chat.completions.create().
- Python SDK
import os
import litellm
response = litellm.completion(
    model="openai/https://clarifai.com/anthropic/completion/models/claude-sonnet-4",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    # Message formatting is consistent with OpenAI's schema ({"role": ..., "content": ...})
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going?"}
    ],
    # You can add other OpenAI-compatible parameters here
    temperature=0.7,  # Optional: controls randomness
    max_tokens=100    # Optional: limits response length
)
# Print the assistant's reply
print(response['choices'][0]['message']['content'])
Example Output
Hey there! I'm doing well, thanks for asking! How are you doing today? Is there anything I can help you with or would you like to chat about something?
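Because the response follows the OpenAI schema, token usage is reported alongside the message. A quick sketch (field availability can vary by model):

# Inspect token usage for the completion above
# (fields follow the OpenAI schema; availability can vary by model)
usage = response.usage
if usage:
    print("Prompt tokens:", usage.prompt_tokens)
    print("Completion tokens:", usage.completion_tokens)
    print("Total tokens:", usage.total_tokens)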
Streaming
When streaming is enabled by setting stream=True, the completion method returns an iterator that yields partial responses in real time as the model generates them, instead of a single complete response object.
- Python SDK
import os
import litellm
# Enable streaming
response_stream = litellm.completion(
    model="openai/https://clarifai.com/anthropic/completion/models/claude-sonnet-4",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going? Tell me a short story about a space-faring cat."}
    ],
    stream=True  # Enable streaming
)
# Print the streamed output in real time
for chunk in response_stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
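If you also need the complete text once streaming finishes, you can accumulate the deltas as they arrive. A minimal sketch, assuming a fresh response_stream (a stream can only be iterated once):

# Accumulate streamed deltas into the full reply
full_text = ""
for chunk in response_stream:
    delta = chunk.choices[0].delta.content
    if delta:
        full_text += delta

print(full_text)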
Tool Calling
Clarifai models accessed via LiteLLM fully support tool calling, enabling advanced interactions such as function execution during a conversation.
- Python SDK
import os
import litellm
# Define the tool (function) the model can call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
# Send the request to a Clarifai-hosted model using LiteLLM
response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)
# Output the tool call suggested by the model (if any)
print(response.choices[0].message.tool_calls)
Tool Calling Implementation Example
import os
import json
import litellm
# Step 1: Define the tool schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
# Step 2: Define a function that simulates tool execution
def get_weather(location: str) -> str:
    # In a real app, you'd query a weather API here
    return f"The current temperature in {location} is 22°C."
# Step 3: Make the initial request to trigger the tool
response = litellm.completion(
    model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
    api_key=os.environ["CLARIFAI_PAT"],
    api_base="https://api.clarifai.com/v2/ext/openai/v1",
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)
tool_calls = response.choices[0].message.tool_calls
# Step 4: Parse the tool call and run the function
if tool_calls:
    for tool_call in tool_calls:
        tool_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)

        if tool_name == "get_weather":
            result = get_weather(arguments["location"])

            # Step 5: Send the function result back to the model
            follow_up = litellm.completion(
                model="openai/https://clarifai.com/openai/chat-completion/models/o4-mini",
                api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
                api_base="https://api.clarifai.com/v2/ext/openai/v1",
                messages=[
                    {"role": "user", "content": "What is the weather in Paris today?"},
                    {"role": "assistant", "tool_calls": [tool_call]},
                    {"role": "tool", "tool_call_id": tool_call.id, "content": result}
                ]
            )

            # Print the final assistant message
            print(follow_up.choices[0].message.content)
else:
    print("No tool was called.")