Run Ollama Models Locally
Run Ollama models locally and make them available via a public API
Ollama is an open-source tool that allows you to download, run, and manage large language models (LLMs) directly on your local machine.
When combined with Clarifai’s Local Runners, it enables you to run Ollama models on your machine, expose them securely via a public URL, and tap into Clarifai’s powerful platform — all while keeping the speed, privacy, and control of local deployment.
Step 1: Perform Prerequisites
Install Ollama
Go to the Ollama website and choose the appropriate installer for your system (macOS, Windows, or Linux).
Note: If you're using Windows, make sure to restart your machine after installing Ollama to ensure that the updated environment variables are properly applied.
Sign Up or Log In
Start by logging in to your existing Clarifai account or signing up for a new one. Once logged in, you'll need the following credentials for setup:
- User ID – Navigate to your personal settings and find your user ID under the Account section.
- Personal Access Token (PAT) – In the same personal settings page, go to the Security section to generate or copy your PAT. This token is used to securely authenticate your connection to the Clarifai platform.
You can then set the PAT as an environment variable using CLARIFAI_PAT, which is important when running inference with your models.
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
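To confirm the variable is visible before you run inference, a quick check from Python is enough (a minimal sketch; it assumes you exported the PAT in the same shell session):
- Python
import os

# Verify that the CLARIFAI_PAT environment variable is available to this process
pat = os.environ.get("CLARIFAI_PAT")
print("CLARIFAI_PAT is set" if pat else "CLARIFAI_PAT is not set")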
Install the Clarifai CLI
Install the latest version of the Clarifai CLI, which includes built-in support for Local Runners.
- Bash
pip install --upgrade clarifai
Note: You must have Python 3.10 or higher installed to use Local Runners.
Install OpenAI Package
Install the openai package, which is required when performing inference with models using the OpenAI-compatible format.
- Python
pip install openai
Step 2: Initialize a Model From Ollama
You can use the Clarifai CLI to download and initialize any model available in the Ollama library directly into your local environment.
For example, here's how to initialize the llama3.2 model in your current directory:
- CLI
clarifai model init --toolkit ollama
Note: The above command will create a new model directory structure that is compatible with the Clarifai platform. You can customize or optimize the generated model by modifying the 1/model.py file as needed.
You can customize model initialization from the Ollama library using the Clarifai CLI with the following options (an example command is shown after this list):
- --model-name – Name of the Ollama model to use (default: llama3.2). This lets you specify any model from the Ollama library.
- --port – Port to run the model on (default: 23333).
- --context-length – Context window size for the model in tokens (default: 8192).
- --verbose – Enables detailed Ollama logs during execution. By default, logs are suppressed unless this flag is provided.
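For example, the following command (a sketch; adjust the values to your setup) keeps the default llama3.2 model but doubles the context window and enables verbose Ollama logs:
- CLI
clarifai model init --toolkit ollama --context-length 16384 --verbose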
Learn more about setting up a model with Ollama here.
Here is a quickstart for Ollama models and their common use cases (an initialization example follows the list):
- llama3.2-vision:latest – For multimodal tasks (text + image), like image captioning or visual Q&A.
- llama3-groq-tool-use:latest – Ideal for tool calling and function execution in agent tasks.
- devstral:latest – Best for code generation, debugging, and development assistant use cases.
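To initialize one of these instead of the default, pass its name with --model-name, as in this sketch for devstral:latest:
- CLI
clarifai model init --toolkit ollama --model-name devstral:latest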
Step 3: Log In to Clarifai
Use the following command to log in to the Clarifai platform to create a configuration context and establish a connection:
- CLI
clarifai login
After running the command, you'll be prompted to provide a few details for authentication:
- CLI
context name (default: "default"):
user id:
personal access token value (default: "ENVVAR" to get our env var rather than config):
Here’s what each field means:
- Context name – You can assign a custom name to this configuration context, or simply press Enter to use the default name, "default". This is useful if you manage multiple environments or configurations.
- User ID – Enter your Clarifai user ID.
- Personal Access Token (PAT) – Paste your Clarifai PAT here. If you've already set the CLARIFAI_PAT environment variable, you can just press Enter to use it automatically.
Step 4: Start Your Local Runner
Start a local runner using the following command:
- CLI
clarifai model local-runner
If the necessary context configurations aren’t detected, the CLI will guide you through creating them using default values.
This setup ensures that all required components, such as compute clusters, nodepools, and deployments (described here), are properly included in your configuration context.
Simply review each prompt and confirm to proceed.
Step 5: Run Inference
Once your local runner starts successfully, it will display a public URL where your model is hosted and accessible.
The CLI also generates an example client code snippet to help you quickly test the model. Simply run the snippet in a separate terminal (within the same directory) to receive the model’s response output.
Below is an example of running inference using the OpenAI-compatible format:
- Python
import os
from openai import OpenAI
# Initialize the OpenAI client with Clarifai's OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ['CLARIFAI_PAT'],
)
# Replace 'user-id' with your actual Clarifai user ID
response = client.chat.completions.create(
    model="https://clarifai.com/user-id/local-runner-app/models/local-runner-model",
    messages=[
        {"role": "system", "content": "Talk like a pirate."},
        {"role": "user", "content": "How do I check if a Python object is an instance of a class?"},
    ],
    temperature=0.7,
    stream=False,  # Set to True for streaming responses
)
# Print the full response
print(response)
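# The response follows the OpenAI schema, so you can also print just the
# assistant's reply text (non-streaming case):
print(response.choices[0].message.content)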
# Example for handling a streaming response:
# if stream=True, uncomment below to print chunks as they arrive
# for chunk in response:
#     if chunk.choices[0].delta.content:
#         print(chunk.choices[0].delta.content, end='')
When you're done, just close the terminal running the local runner to shut it down.