MCP Servers
Extend agentic models' inference capabilities with MCP servers
Model Context Protocol (MCP) servers enable LLMs with agentic capabilities on Clarifai to dynamically discover and invoke external tools during inference.
By integrating MCP servers with agentic-enabled models, you can extend model behavior beyond plain text generation to include tool usage, data retrieval, and action execution — all within a single inference flow.
This allows AI agents on Clarifai to reason, call tools exposed by MCP servers, and iteratively use their outputs to produce more accurate and capable responses.
Prerequisites
Get an Agentic Model
For a model to support agentic behavior through MCP servers on the Clarifai platform, it must extend the standard OpenAIModelClass with the AgenticModelClass.
This extension enables the following capabilities:
- Tool discovery and execution managed by the agentic model class
- Iterative tool calling within a single predict or generate request
- Compatibility with Clarifai SDKs and the OpenAI-compatible API
- Support for both streaming and non-streaming inference modes
You can see an example implementation of AgenticModelClass in this 1/model.py file.
Note: To upload a model with agentic capabilities, simply use AgenticModelClass. All other steps and functionality remain the same as when uploading a standard model on Clarifai. You can follow this example to get started.
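For orientation, below is a hypothetical skeleton of such a 1/model.py. The import path for AgenticModelClass is an assumption here and may differ across SDK versions, so treat the linked example file as the authoritative reference:
- Python
# Hypothetical skeleton only: the import path below is an assumption and may
# differ across clarifai SDK versions; see the linked 1/model.py for the
# authoritative version.
from clarifai.runners.models.agentic_class import AgenticModelClass  # assumed path

class MyAgenticModel(AgenticModelClass):
    # Subclassing AgenticModelClass (rather than OpenAIModelClass) is the only
    # structural change; tool discovery, iterative tool calling, and
    # streaming/non-streaming support come from the base class.
    def load_model(self):
        # Initialize weights/clients exactly as you would for a standard
        # OpenAI-compatible model upload.
        ...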
Several example models with agentic capabilities enabled are available on the Clarifai platform.
Get an MCP Server
The Clarifai platform provides MCP servers that you can use out of the box. Here are some examples:
- Weather Server — Provides weather information
- Browser Server — Enables web browsing capabilities
You can also build your own MCP server or use an open-source MCP server and deploy it on the Clarifai platform.
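As a sketch of what a custom server can look like, here is a minimal FastMCP server exposing a single tool. The server name and tool are illustrative, not part of any Clarifai example:
- Python
# Minimal illustrative MCP server built with FastMCP; the server name and
# tool below are made up for this example.
from fastmcp import FastMCP

mcp = FastMCP("my-demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

if __name__ == "__main__":
    # Runs the server with FastMCP's default transport
    mcp.run()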
Note: You can specify multiple MCP servers in the mcp_servers list to give the model access to multiple tool sets.
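For example, using the two Clarifai-hosted servers referenced on this page:
- Python
# Any number of MCP server URLs can be listed; the model can draw tools
# from all of them in a single request
mcp_servers = [
    "https://clarifai.com/clarifai/mcp/models/weather-mcp-server",
    "https://clarifai.com/clarifai/mcp/models/browser-mcp-server",
]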
Install Packages
You need to install the following Python packages:
- clarifai — Installs the latest version of the Clarifai Python SDK package. This also installs the Command Line Interface (CLI).
- openai — The OpenAI client library, used to run inference through Clarifai's OpenAI-compatible endpoint.
- fastmcp — The core framework for interacting with MCP servers.
You can run the following command to install them:
- Bash
pip install --upgrade clarifai openai fastmcp
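To verify the installation, you can print the installed package versions; a quick check using only the Python standard library:
- Python
# Confirm that all three packages are installed and print their versions
from importlib.metadata import version

for pkg in ("clarifai", "openai", "fastmcp"):
    print(pkg, version(pkg))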
Get a PAT Key
You need a PAT (Personal Access Token) key to authenticate your connection to the Clarifai platform. You can get one by navigating to Settings in the collapsible left sidebar, selecting Secrets, and creating or copying an existing token from there.
You can then set the PAT as an environment variable using CLARIFAI_PAT. This also authenticates your session when using the Clarifai CLI.
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
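As a quick sanity check before running the examples below, you can confirm from Python that the variable is visible to your process; a minimal sketch using only the standard library:
- Python
import os

# Fail fast with a clear message if the PAT is missing
pat = os.environ.get("CLARIFAI_PAT")
if not pat:
    raise RuntimeError("CLARIFAI_PAT is not set; create a PAT under Settings > Secrets")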
Integrate with LLMs
You can pass an MCP server as a tool source to an agentic LLM on Clarifai. The model will automatically discover available tools and call them as needed during completion.
Chat Completions (Non-Streaming)
The OpenAI Chat Completions API endpoint lets you produce a model response by providing a list of messages that constitute a conversation.
- Python
from openai import OpenAI
import os

# Point the OpenAI client at Clarifai's OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ['CLARIFAI_PAT']
)

# Define MCP servers to make available to the agentic model
mcp_servers = [
    "https://clarifai.com/clarifai/mcp/models/weather-mcp-server",
    "https://clarifai.com/clarifai/mcp/models/browser-mcp-server"
]

# Create a chat completion with MCP servers
completion = client.chat.completions.create(
    model="https://clarifai.com/qwen/qwenLM/models/Qwen3-30B-A3B-Instruct-2507",
    messages=[{"role": "user", "content": "What was the weather in Los Angeles, California yesterday?"}],
    extra_body={"mcp_servers": mcp_servers},
    max_completion_tokens=500,
    stream=False
)

print(completion.choices[0].message.content)
Example Output
The weather in Los Angeles, California yesterday (as per the forecast data provided) was as follows:
- **Temperature**: High of 57°F, low of 41°F.
- **Conditions**:
- Partly cloudy with a chance of light rain in the evening.
- Rain showers were expected during the day, followed by mostly sunny conditions later.
- **Wind**: 10 to 15 mph from the west-southwest during the day, decreasing to 5–15 mph overnight.
Overall, it was a mild day with a mix of rain showers and sunshine, typical of Los Angeles' coastal climate.
The above snippet demonstrates how to:
- Initialize the OpenAI client and point it to Clarifai’s OpenAI-compatible endpoint, instead of OpenAI’s servers.
- Specify which MCP servers the model can use. During inference, the model can discover and call these tools autonomously if it decides they’re useful.
- Create a chat completion with MCP support:
  - The agentic model discovers tools from the MCP servers, executes them, and iterates on tool calls if needed (for example, a weather lookup or a browsing action).
  - extra_body={"mcp_servers": mcp_servers} tells Clarifai which MCP servers to make available to the model for this request.
Chat Completions (Streaming)
You can enable token-by-token streaming by setting stream=True. This allows the model’s response to arrive incrementally, providing partial results in real time instead of waiting for the full completion.
- Python
from openai import OpenAI
import os

# Point the OpenAI client at Clarifai's OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ['CLARIFAI_PAT']
)

# Define MCP servers to make available to the agentic model
mcp_servers = [
    "https://clarifai.com/clarifai/mcp/models/weather-mcp-server",
    "https://clarifai.com/clarifai/mcp/models/browser-mcp-server"
]

# Create a chat completion with MCP servers, streamed token by token
completion = client.chat.completions.create(
    model="https://clarifai.com/qwen/qwenLM/models/Qwen3-30B-A3B-Instruct-2507",
    messages=[{"role": "user", "content": "What was the weather in Los Angeles, California yesterday?"}],
    extra_body={"mcp_servers": mcp_servers},
    max_completion_tokens=500,
    stream=True
)

# Stream the response, printing both regular content and any reasoning content
for chunk in completion:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if getattr(delta, 'content', None):
        print(delta.content, end="", flush=True)
    elif getattr(delta, 'reasoning_content', None):
        print(delta.reasoning_content, end="", flush=True)
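If you also need the complete text once streaming finishes, you can accumulate the chunks as they arrive. A small variation on the loop above:
- Python
# Collect streamed content into a single string while still printing live
full_text = []
for chunk in completion:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if getattr(delta, 'content', None):
        full_text.append(delta.content)
        print(delta.content, end="", flush=True)

response_text = "".join(full_text)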
Use with FastMCP Client
You can call MCP tools directly using the FastMCP client without involving an LLM, giving you full programmatic control over tool execution.
- Python
import asyncio
import os

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

# Connect to a Clarifai-hosted MCP server over streamable HTTP,
# authenticating with your PAT
transport = StreamableHttpTransport(
    url="https://api.clarifai.com/v2/ext/mcp/v1/users/clarifai/apps/mcp/models/browser-mcp-server",
    headers={
        "Authorization": f"Bearer {os.environ['CLARIFAI_PAT']}",
    },
)

async def main():
    async with Client(transport) as client:
        # Discover the tools the server exposes
        tools = await client.list_tools()
        print("Available tools:")
        for tool in tools:
            print(f"- {tool.name}")

        # Call one of the tools directly, without an LLM in the loop
        result = await client.call_tool(
            "search",
            {
                "query": "latest AI breakthroughs",
                "max_results": 5,
            },
        )
        print("\nSearch result:")
        print(result.content[0].text)

if __name__ == "__main__":
    asyncio.run(main())