MCP Servers
Extend agentic models' inference capabilities with MCP servers
Model Context Protocol (MCP) servers enable LLMs with agentic capabilities on Clarifai to dynamically discover and invoke external tools during inference.
By integrating MCP servers with agentic-enabled models, you can extend model behavior beyond plain text generation to include tool usage, data retrieval, and action execution — all within a single inference flow.
This allows AI agents on Clarifai to reason, call tools exposed by MCP servers, and iteratively use their outputs to produce more accurate and capable responses.
Prerequisites
Get an Agentic Model
For a model to support agentic behavior through MCP servers on the Clarifai platform, it must extend the standard OpenAIModelClass with the AgenticModelClass.
This extension enables the following capabilities:
- Tool discovery and execution managed by the agentic model class
- Iterative tool calling within a single predict or generate request
- Compatibility with Clarifai SDKs and the OpenAI-compatible API
- Support for both streaming and non-streaming inference modes
You can see an example implementation of AgenticModelClass in this 1/model.py file.
Note: To upload a model with agentic capabilities, simply use AgenticModelClass. All other steps and functionality remain the same as when uploading a standard model on Clarifai. You can follow this example to get started.
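For orientation, below is a hypothetical skeleton of such a 1/model.py. The import path for AgenticModelClass is an assumption here and may differ across SDK versions, so treat the linked example file as the authoritative reference:
- Python
# Hypothetical skeleton only: the import path below is an assumption and may
# differ across clarifai SDK versions; see the linked 1/model.py for the
# authoritative version.
from clarifai.runners.models.agentic_class import AgenticModelClass  # assumed path

class MyAgenticModel(AgenticModelClass):
    # Subclassing AgenticModelClass (rather than OpenAIModelClass) is the only
    # structural change; tool discovery, iterative tool calling, and
    # streaming/non-streaming support come from the base class.
    def load_model(self):
        # Initialize weights/clients exactly as you would for a standard
        # OpenAI-compatible model upload.
        ...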
Several example models with agentic capabilities enabled are available on the Clarifai platform.
Get an MCP Server
The Clarifai platform provides MCP servers that you can use out of the box. Here are some examples:
- Weather Server — Provides weather information
- Browser Server — Enables web browsing capabilities
You can also build your own MCP server or use an open-source MCP server and deploy it on the Clarifai platform.
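As a sketch of what a custom server can look like, here is a minimal FastMCP server exposing a single tool. The server name and tool are illustrative, not part of any Clarifai example:
- Python
# Minimal illustrative MCP server built with FastMCP; the server name and
# tool below are made up for this example.
from fastmcp import FastMCP

mcp = FastMCP("my-demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

if __name__ == "__main__":
    # Runs the server with FastMCP's default transport
    mcp.run()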
Note: You can specify multiple MCP servers in the mcp_servers list to give the model access to multiple tool sets.
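For example, using the two Clarifai-hosted servers referenced on this page:
- Python
# Any number of MCP server URLs can be listed; the model can draw tools
# from all of them in a single request
mcp_servers = [
    "https://clarifai.com/clarifai/mcp/models/weather-mcp-server",
    "https://clarifai.com/clarifai/mcp/models/browser-mcp-server",
]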
Install Packages
You need to install the following Python packages:
- clarifai — Installs the latest version of the Clarifai Python SDK package. This also installs the Command Line Interface (CLI).
- openai — The OpenAI client library, used to run inference through Clarifai's OpenAI-compatible endpoint.
- fastmcp — The core framework for interacting with MCP servers.
You can run the following command to install them:
- Bash
pip install --upgrade clarifai openai fastmcp
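To verify the installation, you can print the installed package versions; a quick check using only the Python standard library:
- Python
# Confirm that all three packages are installed and print their versions
from importlib.metadata import version

for pkg in ("clarifai", "openai", "fastmcp"):
    print(pkg, version(pkg))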
Get a PAT Key
You need a PAT (Personal Access Token) key to authenticate your connection to the Clarifai platform. You can get one by navigating to Settings in the collapsible left sidebar, selecting Secrets, and creating or copying an existing token from there.
You can then set the PAT as an environment variable using CLARIFAI_PAT. This also authenticates your session when using the Clarifai CLI.
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
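As a quick sanity check before running the examples below, you can confirm from Python that the variable is visible to your process; a minimal sketch using only the standard library:
- Python
import os

# Fail fast with a clear message if the PAT is missing
pat = os.environ.get("CLARIFAI_PAT")
if not pat:
    raise RuntimeError("CLARIFAI_PAT is not set; create a PAT under Settings > Secrets")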
Integrate with LLMs
You can pass an MCP server as a tool source to an agentic LLM on Clarifai. The model will automatically discover available tools and call them as needed during completion.
Chat Completions (Non-Streaming)
The OpenAI Chat Completions API endpoint lets you produce a model response by providing a list of messages that constitute a conversation.
- Python
from openai import OpenAI
import os

# Point the OpenAI client at Clarifai's OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ['CLARIFAI_PAT']
)

# Define MCP servers to make available to the agentic model
mcp_servers = [
    "https://clarifai.com/clarifai/mcp/models/weather-mcp-server",
    "https://clarifai.com/clarifai/mcp/models/browser-mcp-server"
]

# Create a chat completion with MCP servers
completion = client.chat.completions.create(
    model="https://clarifai.com/qwen/qwenLM/models/Qwen3-30B-A3B-Instruct-2507",
    messages=[{"role": "user", "content": "What was the weather in Los Angeles, California yesterday?"}],
    extra_body={"mcp_servers": mcp_servers},
    max_completion_tokens=500,
    stream=False
)

print(completion.choices[0].message.content)
Example Output
The weather in Los Angeles, California yesterday (as per the forecast data provided) was as follows:
- **Temperature**: High of 57°F, low of 41°F.
- **Conditions**:
- Partly cloudy with a chance of light rain in the evening.
- Rain showers were expected during the day, followed by mostly sunny conditions later.
- **Wind**: 10 to 15 mph from the west-southwest during the day, decreasing to 5–15 mph overnight.
Overall, it was a mild day with a mix of rain showers and sunshine, typical of Los Angeles' coastal climate.
The above snippet demonstrates how to:
- Initialize the OpenAI client and point it to Clarifai’s OpenAI-compatible endpoint, instead of OpenAI’s servers.
- Specify which MCP servers the model can use. During inference, the model can discover and call these tools autonomously if it decides they’re useful.
- Create a chat completion with MCP support:
  - The agentic model discovers tools from the MCP servers, executes them, and iterates on tool calls if needed (for example, a weather lookup or a browsing action).
  - extra_body={"mcp_servers": mcp_servers} tells Clarifai which MCP servers to make available to the model for this request.
Chat Completions (Streaming)
You can enable token-by-token streaming by setting stream=True. This allows the model’s response to arrive incrementally, providing partial results in real time instead of waiting for the full completion.
- Python
from openai import OpenAI
import os

# Point the OpenAI client at Clarifai's OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ['CLARIFAI_PAT']
)

# Define MCP servers to make available to the agentic model
mcp_servers = [
    "https://clarifai.com/clarifai/mcp/models/weather-mcp-server",
    "https://clarifai.com/clarifai/mcp/models/browser-mcp-server"
]

# Create a chat completion with MCP servers, streamed token by token
completion = client.chat.completions.create(
    model="https://clarifai.com/qwen/qwenLM/models/Qwen3-30B-A3B-Instruct-2507",
    messages=[{"role": "user", "content": "What was the weather in Los Angeles, California yesterday?"}],
    extra_body={"mcp_servers": mcp_servers},
    max_completion_tokens=500,
    stream=True
)

# Stream the response, printing both regular content and any reasoning content
for chunk in completion:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if getattr(delta, 'content', None):
        print(delta.content, end="", flush=True)
    elif getattr(delta, 'reasoning_content', None):
        print(delta.reasoning_content, end="", flush=True)
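If you also need the complete text once streaming finishes, you can accumulate the chunks as they arrive. A small variation on the loop above:
- Python
# Collect streamed content into a single string while still printing live
full_text = []
for chunk in completion:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if getattr(delta, 'content', None):
        full_text.append(delta.content)
        print(delta.content, end="", flush=True)

response_text = "".join(full_text)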
Use with FastMCP Client
You can call MCP tools directly using the FastMCP client without involving an LLM, giving you full programmatic control over tool execution.
- Python
import asyncio
import os

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

# Connect to a Clarifai-hosted MCP server over streamable HTTP,
# authenticating with your PAT
transport = StreamableHttpTransport(
    url="https://api.clarifai.com/v2/ext/mcp/v1/users/clarifai/apps/mcp/models/browser-mcp-server",
    headers={
        "Authorization": f"Bearer {os.environ['CLARIFAI_PAT']}",
    },
)

async def main():
    async with Client(transport) as client:
        # Discover the tools the server exposes
        tools = await client.list_tools()
        print("Available tools:")
        for tool in tools:
            print(f"- {tool.name}")

        # Call one of the tools directly, without an LLM in the loop
        result = await client.call_tool(
            "search",
            {
                "query": "latest AI breakthroughs",
                "max_results": 5,
            },
        )
        print("\nSearch result:")
        print(result.content[0].text)

if __name__ == "__main__":
    asyncio.run(main())