LiteLLM
Run inferences on Clarifai models using LiteLLM
LiteLLM provides a universal interface that simplifies working with LLMs across multiple providers. It offers a single, consistent API for making inferences, allowing you to interact with a wide range of models using the same method, regardless of the underlying provider.
LiteLLM natively supports OpenAI-compatible APIs, making it easy to run inferences on Clarifai-hosted models with minimal setup.
Click here for additional examples of performing model predictions with different SDKs, including the Clarifai SDK, OpenAI client, and LiteLLM. The examples cover multiple model types and include both streaming and non-streaming modes, as well as tool calling.
Prerequisites
Install LiteLLM
Install the litellm package.
- Python
pip install litellm
Get a PAT Key
You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can get one by navigating to Settings in the collapsible left sidebar, selecting Secrets, and creating a new token or copying an existing one.
You can then set the PAT as an environment variable using CLARIFAI_PAT:
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
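Before running the examples below, you can optionally verify from Python that the variable is visible to your process. This is a minimal sanity-check sketch; the variable name CLARIFAI_PAT matches what the examples below read:
import os

# Fail fast if the PAT is missing so later LiteLLM calls don't fail mid-request
pat = os.environ.get("CLARIFAI_PAT")
if not pat:
    raise RuntimeError("CLARIFAI_PAT is not set. Export it before running the examples below.")

print(f"CLARIFAI_PAT found ({len(pat)} characters)")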
Get a Clarifai Model
Go to the Clarifai Community platform and select the model you want to use for predictions. LiteLLM supports all models in the Clarifai community.
When using a Clarifai model with LiteLLM, reference it with the clarifai/ prefix in the following format: clarifai/<user_id>.<app_id>.<model_id>. For example: clarifai/openai.chat-completion.gpt-oss-20b.
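For illustration, the model string can also be assembled from its parts. The user_id, app_id, and model_id values below are placeholders taken from the example above:
# Placeholder IDs; replace with the values shown on the model's Community page
user_id = "openai"
app_id = "chat-completion"
model_id = "gpt-oss-20b"

# LiteLLM model string in the clarifai/<user_id>.<app_id>.<model_id> format
litellm_model = f"clarifai/{user_id}.{app_id}.{model_id}"
print(litellm_model)  # clarifai/openai.chat-completion.gpt-oss-20b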
Chat Completions
In LiteLLM, the completion() function is the primary method for interacting with language models that follow the OpenAI Chat API format. It supports both traditional completions and chat-based interactions by accepting a list of messages — similar to OpenAI’s chat.completions.create().
- Python SDK
import os
from litellm import completion
response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    # Message formatting follows OpenAI's schema: {"role": ..., "content": ...}
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going?"}
    ],
    # Optional OpenAI-compatible parameters
    temperature=0.7,  # Controls randomness
    max_tokens=100    # Limits response length
)
print(response['choices'][0]['message']['content'])
Example Output
Hey there! I'm doing great—thanks for asking. How about you? Anything fun or interesting on your mind today?
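The returned object follows the OpenAI chat completion schema, so you can also inspect metadata such as token usage and the finish reason. A minimal sketch using the response from the example above:
# Fields mirror OpenAI's chat completion response schema
print(response.model)                     # Model that served the request
print(response.usage.prompt_tokens)       # Tokens consumed by the prompt
print(response.usage.completion_tokens)   # Tokens in the generated reply
print(response.choices[0].finish_reason)  # Why generation stopped, e.g. "stop" or "length"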
Streaming
When streaming is enabled by setting stream=True, the completion method returns an iterator that yields partial responses in real time as the model generates them, instead of a single complete response object.
- Python SDK
import os
import litellm
for chunk in litellm.completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    # Message formatting follows OpenAI's schema: {"role": ..., "content": ...}
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Tell me a fun fact about space."}
    ],
    stream=True,  # Enable streaming responses
    # Optional OpenAI-compatible parameters
    temperature=0.7,  # Controls randomness
    max_tokens=100    # Limits response length
):
    # Print incremental text as it arrives (content may be None for role-only chunks)
    print(chunk.choices[0].delta.content or "", end="", flush=True)
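If you also need the complete reply after streaming finishes, you can accumulate the deltas as you print them. A minimal sketch of that pattern, using the same request as above:
import os
import litellm

full_text = ""
for chunk in litellm.completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],
    messages=[{"role": "user", "content": "Tell me a fun fact about space."}],
    stream=True,
):
    # delta.content may be None for role-only or final chunks
    piece = chunk.choices[0].delta.content or ""
    full_text += piece
    print(piece, end="", flush=True)

print(f"\n\nFull reply: {len(full_text)} characters")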
Tool Calling
Clarifai models accessed via LiteLLM fully support tool calling, enabling advanced interactions such as function execution during a conversation.
- Python SDK
import os
from litellm import completion
# Define tools the model can call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., 'San Francisco, CA'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location", "unit"]
            }
        }
    }
]
# Make the completion request via LiteLLM
response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
    tools=tools
)
# Print any tool calls suggested by the model
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print("Tool call suggested by the model:", tool_calls)
else:
    print("No tool call was made by the model.")
Tool Calling Implementation Example
import os
import json
from litellm import completion
# -----------------------------------------
# Step 1: Define the tool schema
# -----------------------------------------
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
# -----------------------------------------
# Step 2: Implement the tool logic
# -----------------------------------------
def get_weather(location: str) -> str:
    # In a real app, you'd call a weather API here
    return f"The current temperature in {location} is 22°C."
# -----------------------------------------
# Step 3: Request a model completion that may trigger a tool call
# -----------------------------------------
response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)
tool_calls = response.choices[0].message.tool_calls
# -----------------------------------------
# Step 4: Parse and execute the tool call
# -----------------------------------------
if tool_calls:
    for tool_call in tool_calls:
        tool_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)

        if tool_name == "get_weather":
            result = get_weather(arguments["location"])

            # -----------------------------------------
            # Step 5: Send tool result back to the model
            # -----------------------------------------
            follow_up = completion(
                model="clarifai/openai.chat-completion.gpt-oss-120b",
                api_key=os.environ["CLARIFAI_PAT"],
                messages=[
                    {"role": "user", "content": "What is the weather in Paris today?"},
                    {"role": "assistant", "tool_calls": [tool_call]},
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    }
                ]
            )

            # Print the assistant's final response
            print(follow_up.choices[0].message.content)
else:
    print("No tool was called.")
Usage with LiteLLM Proxy
Here’s how to call Clarifai models through the LiteLLM Proxy Server.
Install LiteLLM with Proxy Support
- Python
pip install 'litellm[proxy]'
Set Key
Set your Clarifai PAT as an environment variable, as illustrated above.
Start the Proxy
Create a config.yaml.
model_list:
  - model_name: clarifai-model
    litellm_params:
      model: clarifai/openai.chat-completion.gpt-oss-20b
      api_key: os.environ/CLARIFAI_PAT
Then, start the LiteLLM proxy:
litellm --config /path/to/config.yaml
The server will run at:
http://0.0.0.0:4000
Test the Proxy
- cURL
- Python (OpenAI)
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "clarifai-model",
    "messages": [
      {"role": "user", "content": "what llm are you"}
    ]
  }'
import openai
client = openai.OpenAI(
    api_key="anything",  # LiteLLM proxy accepts any key
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="clarifai-model",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ]
)
print(response)
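Streaming also works through the proxy with the same OpenAI client by passing stream=True. A minimal sketch, reusing the clarifai-model alias defined in config.yaml:
import openai

client = openai.OpenAI(
    api_key="anything",             # The proxy accepts any key unless a master key is configured
    base_url="http://0.0.0.0:4000"
)

# stream=True makes the proxy forward incremental chunks from the Clarifai model
stream = client.chat.completions.create(
    model="clarifai-model",
    messages=[{"role": "user", "content": "write a short poem"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta with the next piece of the assistant's reply
    print(chunk.choices[0].delta.content or "", end="", flush=True)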