LiteLLM

Run inferences on Clarifai models using LiteLLM


LiteLLM provides a universal interface that simplifies working with LLMs across multiple providers. It offers a single, consistent API for making inferences, allowing you to interact with a wide range of models using the same method, regardless of the underlying provider.

LiteLLM natively supports OpenAI-compatible APIs, making it easy to run inferences on Clarifai-hosted models with minimal setup.

tip

Click here for additional examples of performing model predictions with various SDKs, such as the Clarifai SDK, OpenAI client, and LiteLLM. The examples cover a range of model types and include both streaming and non-streaming modes, as well as tool calling.

Prerequisites

Install LiteLLM

Install the litellm package.

pip install litellm

Get a PAT Key

You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can get one by navigating to Settings in the collapsible left sidebar, selecting Secrets, and creating a new token or copying an existing one from there.

You can then set the PAT as an environment variable using CLARIFAI_PAT:

export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
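
Alternatively, you can set the variable programmatically in Python before making any requests:

import os

# Set the PAT for the current process (replace with your actual token)
os.environ["CLARIFAI_PAT"] = "YOUR_PERSONAL_ACCESS_TOKEN_HERE"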

Get a Clarifai Model

Go to the Clarifai Community platform and select the model you want to use for predictions. LiteLLM supports all models in the Clarifai community.

note

When using a Clarifai model with LiteLLM, reference it with the clarifai/ prefix in the following format: clarifai/<user_id>.<app_id>.<model_id>. For example: clarifai/openai.chat-completion.gpt-oss-20b.

Chat Completions

In LiteLLM, the completion() function is the primary method for interacting with language models that follow the OpenAI Chat API format. It supports both traditional completions and chat-based interactions by accepting a list of messages — similar to OpenAI’s chat.completions.create().

import os
from litellm import completion

response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable

    # Message formatting follows OpenAI's schema: {"role": ..., "content": ...}
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going?"}
    ],

    # Optional OpenAI-compatible parameters
    temperature=0.7,  # Controls randomness
    max_tokens=100    # Limits response length
)

print(response.choices[0].message.content)
Example Output
Hey there! I'm doing great—thanks for asking. How about you? Anything fun or interesting on your mind today?
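
The response object mirrors OpenAI's schema beyond the message content, so you can also inspect response metadata. For example, using the same response object from the snippet above:

# Inspect metadata on the response object
print(response.model)  # The model that produced the response
print(response.usage)  # Token counts: prompt_tokens, completion_tokens, total_tokens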

Streaming

When streaming is enabled by setting stream=True, the completion method returns an iterator that yields partial responses in real time as the model generates them, instead of a single complete dictionary.

import os
import litellm

for chunk in litellm.completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable

    # Message formatting follows OpenAI's schema: {"role": ..., "content": ...}
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Tell me a fun fact about space."}
    ],

    stream=True,  # Enable streaming responses
    # Optional OpenAI-compatible parameters
    temperature=0.7,  # Controls randomness
    max_tokens=100    # Limits response length
):
    # Print incremental text as it arrives (the final chunk's delta may be empty)
    print(chunk.choices[0].delta.content or "", end="", flush=True)
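
If you also need the complete message once streaming finishes, you can accumulate the deltas as they arrive. A minimal sketch, reusing the same model and prompt as above:

import os
import litellm

chunks = []
for chunk in litellm.completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],
    messages=[{"role": "user", "content": "Tell me a fun fact about space."}],
    stream=True
):
    # Collect each partial delta; the final chunk's content may be None
    chunks.append(chunk.choices[0].delta.content or "")

full_text = "".join(chunks)
print(full_text)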

Tool Calling

Clarifai models accessed via LiteLLM fully support tool calling, enabling advanced interactions such as function execution during a conversation.

import os
from litellm import completion

# Define tools the model can call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., 'San Francisco, CA'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location", "unit"]
            }
        }
    }
]

# Make the completion request via LiteLLM
response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
    tools=tools
)

# Print any tool calls suggested by the model
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print("Tool call suggested by the model:", tool_calls)
else:
    print("No tool call was made by the model.")
Tool Calling Implementation Example
import os
import json
from litellm import completion

# -----------------------------------------
# Step 1: Define the tool schema
# -----------------------------------------
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]

# -----------------------------------------
# Step 2: Implement the tool logic
# -----------------------------------------
def get_weather(location: str) -> str:
    # In a real app, you'd call a weather API here
    return f"The current temperature in {location} is 22°C."

# -----------------------------------------
# Step 3: Request a model completion that may trigger a tool call
# -----------------------------------------
response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)

tool_calls = response.choices[0].message.tool_calls

# -----------------------------------------
# Step 4: Parse and execute the tool call
# -----------------------------------------
if tool_calls:
    for tool_call in tool_calls:
        tool_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)

        if tool_name == "get_weather":
            result = get_weather(arguments["location"])

            # -----------------------------------------
            # Step 5: Send tool result back to the model
            # -----------------------------------------
            follow_up = completion(
                model="clarifai/openai.chat-completion.gpt-oss-120b",
                api_key=os.environ["CLARIFAI_PAT"],
                messages=[
                    {"role": "user", "content": "What is the weather in Paris today?"},
                    {"role": "assistant", "tool_calls": [tool_call]},
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    }
                ]
            )

            # Print the assistant's final response
            print(follow_up.choices[0].message.content)
else:
    print("No tool was called.")

Usage with LiteLLM Proxy

Here’s how to call Clarifai models through the LiteLLM Proxy Server.

Install LiteLLM with Proxy Support

pip install 'litellm[proxy]'

Set Key

Set your Clarifai PAT as an environment variable, as illustrated above.

Start the Proxy

Create a config.yaml.

model_list:
  - model_name: clarifai-model
    litellm_params:
      model: clarifai/openai.chat-completion.gpt-oss-20b
      api_key: os.environ/CLARIFAI_PAT

Then, start the LiteLLM proxy:

litellm --config /path/to/config.yaml

The server will run at:

http://0.0.0.0:4000

Test the Proxy

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "clarifai-model",
    "messages": [
        {"role": "user", "content": "what llm are you"}
    ]
}'
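
Because the proxy exposes an OpenAI-compatible API, you can also query it with the OpenAI Python client. A minimal sketch, assuming the proxy is running locally with the clarifai-model alias from config.yaml and no proxy master key configured:

import openai

# Point the OpenAI client at the local LiteLLM proxy
client = openai.OpenAI(
    base_url="http://0.0.0.0:4000",
    api_key="anything"  # Placeholder; the proxy itself holds the Clarifai PAT
)

response = client.chat.completions.create(
    model="clarifai-model",  # Alias defined in config.yaml
    messages=[{"role": "user", "content": "what llm are you"}]
)

print(response.choices[0].message.content)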