LiteLLM
Run inferences on Clarifai models using LiteLLM
LiteLLM provides a universal interface that simplifies working with LLMs across multiple providers. It offers a single, consistent API for making inferences, allowing you to interact with a wide range of models using the same method, regardless of the underlying provider.
LiteLLM natively supports OpenAI-compatible APIs, making it easy to run inferences on Clarifai-hosted models with minimal setup.
Click here for additional examples of performing model predictions with different SDKs, including the Clarifai SDK, OpenAI client, and LiteLLM. The examples cover multiple model types and include both streaming and non-streaming modes, as well as tool calling.
Prerequisites
Install LiteLLM
Install the litellm package.
- Python
pip install litellm
Get a PAT Key
You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can get one by navigating to Settings in the collapsible left sidebar, selecting Secrets, and creating a new token or copying an existing one.
You can then set the PAT as an environment variable using CLARIFAI_PAT:
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
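Before running the examples below, you can optionally verify from Python that the variable is visible to your process. This is a minimal sanity-check sketch; the variable name CLARIFAI_PAT matches what the examples below read:
import os

# Fail fast if the PAT is missing so later LiteLLM calls don't fail mid-request
pat = os.environ.get("CLARIFAI_PAT")
if not pat:
    raise RuntimeError("CLARIFAI_PAT is not set. Export it before running the examples below.")

print(f"CLARIFAI_PAT found ({len(pat)} characters)")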
Get a Clarifai Model
Go to the Clarifai Community platform and select the model you want to use for predictions. LiteLLM supports all models in the Clarifai community.
When using a Clarifai model with LiteLLM, reference it with the clarifai/ prefix in the following format: clarifai/<user_id>.<app_id>.<model_id>. For example: clarifai/openai.chat-completion.gpt-oss-20b.
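For illustration, the model string can also be assembled from its parts. The user_id, app_id, and model_id values below are placeholders taken from the example above:
# Placeholder IDs; replace with the values shown on the model's Community page
user_id = "openai"
app_id = "chat-completion"
model_id = "gpt-oss-20b"

# LiteLLM model string in the clarifai/<user_id>.<app_id>.<model_id> format
litellm_model = f"clarifai/{user_id}.{app_id}.{model_id}"
print(litellm_model)  # clarifai/openai.chat-completion.gpt-oss-20b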
Chat Completions
In LiteLLM, the completion() function is the primary method for interacting with language models that follow the OpenAI Chat API format. It supports both traditional completions and chat-based interactions by accepting a list of messages — similar to OpenAI’s chat.completions.create().
- Python SDK
import os
from litellm import completion
response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    # Message formatting follows OpenAI's schema: {"role": ..., "content": ...}
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Hey, how's it going?"}
    ],
    # Optional OpenAI-compatible parameters
    temperature=0.7,  # Controls randomness
    max_tokens=100    # Limits response length
)
print(response['choices'][0]['message']['content'])
Example Output
Hey there! I'm doing great—thanks for asking. How about you? Anything fun or interesting on your mind today?
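The returned object follows the OpenAI chat completion schema, so you can also inspect metadata such as token usage and the finish reason. A minimal sketch using the response from the example above:
# Fields mirror OpenAI's chat completion response schema
print(response.model)                     # Model that served the request
print(response.usage.prompt_tokens)       # Tokens consumed by the prompt
print(response.usage.completion_tokens)   # Tokens in the generated reply
print(response.choices[0].finish_reason)  # Why generation stopped, e.g. "stop" or "length"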
Streaming
When streaming is enabled by setting stream=True, the completion method returns an iterator that yields partial responses in real time as the model generates them, instead of a single complete response object.
- Python SDK
import os
import litellm
for chunk in litellm.completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    # Message formatting follows OpenAI's schema: {"role": ..., "content": ...}
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Tell me a fun fact about space."}
    ],
    stream=True,  # Enable streaming responses
    # Optional OpenAI-compatible parameters
    temperature=0.7,  # Controls randomness
    max_tokens=100    # Limits response length
):
    # Print incremental text as it arrives (content may be None for role-only chunks)
    print(chunk.choices[0].delta.content or "", end="", flush=True)
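If you also need the complete reply after streaming finishes, you can accumulate the deltas as you print them. A minimal sketch of that pattern, using the same request as above:
import os
import litellm

full_text = ""
for chunk in litellm.completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],
    messages=[{"role": "user", "content": "Tell me a fun fact about space."}],
    stream=True,
):
    # delta.content may be None for role-only or final chunks
    piece = chunk.choices[0].delta.content or ""
    full_text += piece
    print(piece, end="", flush=True)

print(f"\n\nFull reply: {len(full_text)} characters")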
Tool Calling
Clarifai models accessed via LiteLLM fully support tool calling, enabling advanced interactions such as function execution during a conversation.
- Python SDK
import os
from litellm import completion
# Define tools the model can call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., 'San Francisco, CA'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location", "unit"]
            }
        }
    }
]
# Make the completion request via LiteLLM
response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set as an environment variable
    messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
    tools=tools
)
# Print any tool calls suggested by the model
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print("Tool call suggested by the model:", tool_calls)
else:
    print("No tool call was made by the model.")
Tool Calling Implementation Example
import os
import json
from litellm import completion
# -----------------------------------------
# Step 1: Define the tool schema
# -----------------------------------------
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieve the current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'Tokyo, Japan'"
                    }
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
# -----------------------------------------
# Step 2: Implement the tool logic
# -----------------------------------------
def get_weather(location: str) -> str:
    # In a real app, you'd call a weather API here
    return f"The current temperature in {location} is 22°C."
# -----------------------------------------
# Step 3: Request a model completion that may trigger a tool call
# -----------------------------------------
response = completion(
    model="clarifai/openai.chat-completion.gpt-oss-120b",
    api_key=os.environ["CLARIFAI_PAT"],  # Ensure CLARIFAI_PAT is set
    messages=[
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    tools=tools
)
tool_calls = response.choices[0].message.tool_calls
# -----------------------------------------
# Step 4: Parse and execute the tool call
# -----------------------------------------
if tool_calls:
    for tool_call in tool_calls:
        tool_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)

        if tool_name == "get_weather":
            result = get_weather(arguments["location"])

            # -----------------------------------------
            # Step 5: Send tool result back to the model
            # -----------------------------------------
            follow_up = completion(
                model="clarifai/openai.chat-completion.gpt-oss-120b",
                api_key=os.environ["CLARIFAI_PAT"],
                messages=[
                    {"role": "user", "content": "What is the weather in Paris today?"},
                    {"role": "assistant", "tool_calls": [tool_call]},
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
                    }
                ]
            )

            # Print the assistant's final response
            print(follow_up.choices[0].message.content)
else:
    print("No tool was called.")
Usage with LiteLLM Proxy
Here’s how to call Clarifai models through the LiteLLM Proxy Server.
Install LiteLLM with Proxy Support
- Python
pip install 'litellm[proxy]'
Set Key
Set your Clarifai PAT as an environment variable, as illustrated above.
Start the Proxy
Create a config.yaml.
model_list:
  - model_name: clarifai-model
    litellm_params:
      model: clarifai/openai.chat-completion.gpt-oss-20b
      api_key: os.environ/CLARIFAI_PAT
Then, start the LiteLLM proxy:
litellm --config /path/to/config.yaml
The server will run at:
http://0.0.0.0:4000
Test the Proxy
- cURL
- Python (OpenAI)
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "clarifai-model",
    "messages": [
      {"role": "user", "content": "what llm are you"}
    ]
  }'
import openai
client = openai.OpenAI(
    api_key="anything",  # LiteLLM proxy accepts any key
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="clarifai-model",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ]
)
print(response)
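Streaming also works through the proxy with the same OpenAI client by passing stream=True. A minimal sketch, reusing the clarifai-model alias defined in config.yaml:
import openai

client = openai.OpenAI(
    api_key="anything",             # The proxy accepts any key unless a master key is configured
    base_url="http://0.0.0.0:4000"
)

# stream=True makes the proxy forward incremental chunks from the Clarifai model
stream = client.chat.completions.create(
    model="clarifai-model",
    messages=[{"role": "user", "content": "write a short poem"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta with the next piece of the assistant's reply
    print(chunk.choices[0].delta.content or "", end="", flush=True)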