Run Ollama Models Locally
Run Ollama models locally and make them available via a public API
Ollama is an open-source tool that allows you to download, run, and manage large language models (LLMs) directly on your local machine.
When combined with Clarifai’s Local Runners, it enables you to run Ollama models on your machine, expose them securely via a public URL, and tap into Clarifai’s powerful platform — all while keeping the speed, privacy, and control of local deployment.
Step 1: Complete the Prerequisites
Install Ollama
Go to the Ollama website and choose the appropriate installer for your system (macOS, Windows, or Linux).
Note: If you're using Windows, make sure to restart your machine after installing Ollama to ensure that the updated environment variables are properly applied.
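To confirm the installation succeeded, open a new terminal and check the installed version:
- CLI
ollama --version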
Sign Up or Log In
Start by logging in to your existing Clarifai account or signing up for a new one. Once logged in, you'll need the following credentials for setup:
- User ID – Navigate to your personal settings and find your user ID under the Account section.
- Personal Access Token (PAT) – On the same personal settings page, go to the Security section to generate or copy your PAT. This token is used to securely authenticate your connection to the Clarifai platform.
You can then set the PAT as an environment variable using CLARIFAI_PAT, which is important when running inference with your models.
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
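If you want to confirm that the variable is visible before running inference, here's a minimal Python check (the masked display format is just for illustration):
- Python
import os

# Confirm the CLARIFAI_PAT environment variable is set in this shell session.
pat = os.environ.get("CLARIFAI_PAT")
if pat:
    print(f"CLARIFAI_PAT is set ({pat[:4]}****)")
else:
    print("CLARIFAI_PAT is not set; export it before running inference.")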
Install the Clarifai CLI
Install the latest version of the Clarifai CLI, which includes built-in support for Local Runners.
- Bash
pip install --upgrade clarifai
Note: You must have Python 3.11 or 3.12 installed to use Local Runners.
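You can verify both the Python version and the CLI installation from the same terminal:
- Bash
python --version
pip show clarifai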
Install the OpenAI Package
Install the openai package, which is required when performing inference with models using the OpenAI-compatible format.
- Python
pip install openai
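A quick sanity check that the package installed correctly:
- Python
# The OpenAI client library should import cleanly and report its version.
import openai
print(openai.__version__)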
Step 2: Initialize a Model From Ollama
You can use the Clarifai CLI to download and initialize any model available in the Ollama library directly into your local environment.
For example, here's how to initialize the llama3.2 model in your current directory:
- CLI
clarifai model init --toolkit ollama
Note: The above command will create a new model directory structure that is compatible with the Clarifai platform. You can customize or optimize the generated model by modifying the 1/model.py file as needed.
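Based on the files listed in the example output below, the generated layout in your current directory is:
.
├── 1/
│   └── model.py        # model implementation; customize here
├── config.yaml         # model and platform configuration
└── requirements.txt    # Python dependencies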
Example Output
clarifai model init --toolkit ollama
[INFO] 15:58:15.587351 Parsed GitHub repository: owner=Clarifai, repo=runners-examples, branch=ollama, folder_path= | thread=8800297152
[INFO] 15:58:16.827976 Files to be downloaded are:
1. 1/model.py
2. config.yaml
3. requirements.txt | thread=8800297152
Press Enter to continue...
[INFO] 15:58:24.007602 Initializing model from GitHub repository: https://github.com/Clarifai/runners-examples | thread=8800297152
[INFO] 15:58:31.263139 Successfully cloned repository from https://github.com/Clarifai/runners-examples (branch: ollama) | thread=8800297152
[INFO] 15:58:31.270469 Model initialization complete with GitHub repository | thread=8800297152
[INFO] 15:58:31.270527 Next steps: | thread=8800297152
[INFO] 15:58:31.270560 1. Review the model configuration | thread=8800297152
[INFO] 15:58:31.270584 2. Install any required dependencies manually | thread=8800297152
[INFO] 15:58:31.270608 3. Test the model locally using 'clarifai model local-test' | thread=8800297152
You can customize model initialization from the Ollama library using the Clarifai CLI with the following options (see the combined example after this list):
- --model-name – Name of the Ollama model to use (default: llama3.2). This lets you specify any model from the Ollama library. Example: clarifai model init --toolkit ollama --model-name gpt-oss:20b
- --port – Port to run the model on (default: 23333)
- --context-length – Context window size for the model in tokens (default: 8192)
- --verbose – Enables detailed Ollama logs during execution. By default, logs are suppressed unless this flag is provided.
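For instance, all of the options can be combined in a single command (the model name and values here are only illustrations):
- CLI
clarifai model init --toolkit ollama --model-name gpt-oss:20b --port 23333 --context-length 16384 --verbose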
You can use Ollama commands such as ollama list to list downloaded models and ollama rm to remove a model. Run ollama --help to see the full list of available commands.
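For example:
- CLI
ollama list          # show models downloaded to this machine
ollama rm llama3.2   # remove a downloaded model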
Step 3: Log In to Clarifai
Use the following command to log in to the Clarifai platform, create a configuration context, and establish a connection:
- CLI
clarifai login
After running the command, you'll be prompted to provide a few details for authentication:
- CLI
context name (default: "default"):
user id:
personal access token value (default: "ENVVAR" to get our env var rather than config):
Here’s what each field means:
- Context name – You can assign a custom name to this configuration context, or simply press Enter to use the default name, "default". This is useful if you manage multiple environments or configurations.
- User ID – Enter your Clarifai user ID.
- Personal Access Token (PAT) – Paste your Clarifai PAT here. If you've already set the CLARIFAI_PAT environment variable, you can just press Enter to use it automatically.
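For example, a completed session might look like this (the user ID is illustrative, and pressing Enter accepts the defaults):
- CLI
context name (default: "default"):
user id: alfrick
personal access token value (default: "ENVVAR" to get our env var rather than config):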
Step 4: Start Your Local Runner
Start a local runner using the following command:
- CLI
clarifai model local-runner
If the necessary context configurations aren’t detected, the CLI will guide you through creating them using default values.
This setup ensures all required components, such as compute clusters, nodepools, and deployments (described here), are properly included in your configuration context. Simply review each prompt and confirm to proceed.
Note: Use the --verbose option to show detailed logs from the Ollama server, which is helpful for debugging: clarifai model local-runner --verbose.
Example Output
clarifai model local-runner
[INFO] 16:01:28.904230 > Checking local runner requirements... | thread=8800297152
[INFO] 16:01:28.928129 Checking 2 dependencies... | thread=8800297152
[INFO] 16:01:28.928672 ✅ All 2 dependencies are installed! | thread=8800297152
[INFO] 16:01:28.928886 Verifying Ollama installation... | thread=8800297152
[INFO] 16:01:29.004234 > Verifying local runner setup... | thread=8800297152
[INFO] 16:01:29.004427 Current context: default | thread=8800297152
[INFO] 16:01:29.004463 Current user_id: alfrick | thread=8800297152
[INFO] 16:01:29.004490 Current PAT: d6570**** | thread=8800297152
[INFO] 16:01:29.005945 Current compute_cluster_id: local-runner-compute-cluster | thread=8800297152
[WARNING] 16:01:35.936440 Failed to get compute cluster with ID 'local-runner-compute-cluster':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "ComputeCluster with ID \'local-runner-compute-cluster\' not found. Check your request fields."
req_id: "sdk-python-11.8.2-75ca9226003a4b34a770885b119d5814"
| thread=8800297152
Compute cluster not found. Do you want to create a new compute cluster alfrick/local-runner-compute-cluster? (y/n): y
[INFO] 16:01:58.382096 Compute Cluster with ID 'local-runner-compute-cluster' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-a0474b8ba93c4e069804500a188694db"
| thread=8800297152
[INFO] 16:01:58.391571 Current nodepool_id: local-runner-nodepool | thread=8800297152
[WARNING] 16:02:00.633687 Failed to get nodepool with ID 'local-runner-nodepool':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Nodepool not found. Check your request fields."
req_id: "sdk-python-11.8.2-62149d46e7104d35bcb2a36546710329"
| thread=8800297152
Nodepool not found. Do you want to create a new nodepool alfrick/local-runner-compute-cluster/local-runner-nodepool? (y/n): y
[INFO] 16:02:03.909005 Nodepool with ID 'local-runner-nodepool' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-05836a431940495b94b9c3691f6c6d4d"
| thread=8800297152
[INFO] 16:02:03.921694 Current app_id: local-runner-app | thread=8800297152
[INFO] 16:02:04.203774 Current model_id: local-runner-model | thread=8800297152
[WARNING] 16:02:10.933734 Attempting to patch latest version: 9d38bb9398944de4bdef699835f17ec9 | thread=8800297152
[INFO] 16:02:14.195999 Successfully patched version 9d38bb9398944de4bdef699835f17ec9 | thread=8800297152
[INFO] 16:02:14.197924 Current model version 9d38bb9398944de4bdef699835f17ec9 | thread=8800297152
[WARNING] 16:02:18.679567 Failed to get runner with ID 'f3c46913186449ba99dedd38123d47a3':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Runner not found. Check your request fields."
req_id: "sdk-python-11.8.2-0b3b241c76ef429290c8c54a318f2f21"
| thread=8800297152
[INFO] 16:02:18.679913 Creating the local runner tying this 'alfrick/local-runner-app/models/local-runner-model' model (version: 9d38bb9398944de4bdef699835f17ec9) to the 'alfrick/local-runner-compute-cluster/local-runner-nodepool' nodepool. | thread=8800297152
[INFO] 16:02:19.757117 Runner with ID '2f84d7194ee8464fad485fd058663fe5' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-8ac525dc13ec47629213f0283e89c6a7"
| thread=8800297152
[INFO] 16:02:19.765198 Current runner_id: 2f84d7194ee8464fad485fd058663fe5 | thread=8800297152
[WARNING] 16:02:20.331980 Failed to get deployment with ID local-runner-deployment:
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Deployment with ID \'local-runner-deployment\' not found. Check your request fields."
req_id: "sdk-python-11.8.2-864f55e6bc554614894ee031cc30cdb9"
| thread=8800297152
Deployment not found. Do you want to create a new deployment alfrick/local-runner-compute-cluster/local-runner-nodepool/local-runner-deployment? (y/n): y
[INFO] 16:02:25.016935 Deployment with ID 'local-runner-deployment' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-b6bf9aad5aa545f8a041113608fc9365"
| thread=8800297152
[INFO] 16:02:25.024579 Current deployment_id: local-runner-deployment | thread=8800297152
[INFO] 16:02:25.027108 Current model section of config.yaml: {'app_id': 'local-dev-runner-app', 'id': 'local-dev-model', 'model_type_id': 'text-to-text', 'user_id': 'clarifai-user-id'} | thread=8800297152
Do you want to backup config.yaml to config.yaml.bk then update the config.yaml with the new model information? (y/n): y
[INFO] 16:02:27.407724 Checking 2 dependencies... | thread=8800297152
[INFO] 16:02:27.408555 ✅ All 2 dependencies are installed! | thread=8800297152
[INFO] 16:02:27.451117 Customizing Ollama model with provided parameters... | thread=8800297152
[INFO] 16:02:27.451785 ✅ Starting local runner... | thread=8800297152
[INFO] 16:02:27.451852 No secrets path configured, running without secrets | thread=8800297152
[INFO] 16:02:30.020253 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8800297152
[INFO] 16:02:30.027464 Starting Ollama server in the host: 127.0.0.1:23333 | thread=8800297152
[INFO] 16:02:30.040882 Model llama3.2 pulled successfully. | thread=8800297152
[INFO] 16:02:30.041191 Ollama server started successfully on 127.0.0.1:23333 | thread=8800297152
[INFO] 16:02:30.096053 Ollama model loaded successfully: llama3.2 | thread=8800297152
[INFO] 16:02:30.096133 ModelServer initialized successfully | thread=8800297152
[INFO] 16:02:30.100802 ✅ Your model is running locally and is ready for requests from the API...
| thread=8800297152
[INFO] 16:02:30.100873 > Code Snippet: To call your model via the API, use this code snippet:
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.clarifai.com/v2/ext/openai/v1",
api_key=os.environ['CLARIFAI_PAT'],
)
response = client.chat.completions.create(
model="https://clarifai.com/alfrick/local-runner-app/models/local-runner-model",
messages=[
{"role": "system", "content": "Talk like a pirate."},
{
"role": "user",
"content": "How do I check if a Python object is an instance of a class?",
},
],
temperature=1.0,
stream=False, # stream=True also works; just iterate over the response
)
print(response)
| thread=8800297152
[INFO] 16:02:30.100944 > Playground: To chat with your model, visit: https://clarifai.com/playground?model=local-runner-model__9d38bb9398944de4bdef699835f17ec9&user_id=alfrick&app_id=local-runner-app
| thread=8800297152
[INFO] 16:02:30.101006 > API URL: To call your model via the API, use this model URL: https://clarifai.com/alfrick/local-runner-app/models/local-runner-model
| thread=8800297152
File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1012, in run
[INFO] 16:02:30.101070 Press CTRL+C to stop the runner.
| thread=8800297152
[INFO] 16:02:30.101117 Starting 32 threads... | thread=8800297152
Step 5: Test Your Runner
When the local runner starts, it displays a public URL where your model is hosted and provides a sample client code snippet for quick testing.
Pulling a model from Ollama may take some time depending on your machine’s resources, but once the download finishes, you can run the snippet in a separate terminal within the same directory to get the model’s response.
Below is an example snippet for running inference using the OpenAI-compatible format:
- Python
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.clarifai.com/v2/ext/openai/v1",
api_key=os.environ['CLARIFAI_PAT'],
)
response = client.chat.completions.create(
model="https://clarifai.com/alfrick/local-runner-app/models/local-runner-model",
messages=[
{"role": "system", "content": "Talk like a pirate."},
{
"role": "user",
"content": "How do I check if a Python object is an instance of a class?",
},
],
temperature=1.0,
stream=False, # stream=True also works; just iterate over the response
)
print(response)
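Since the comment above notes that streaming also works, here's a minimal streaming variant of the same call; it reuses the model URL from the example, and the prompt is just an illustration:
- Python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"],
)

# Request a streamed response and print tokens as they arrive.
stream = client.chat.completions.create(
    model="https://clarifai.com/alfrick/local-runner-app/models/local-runner-model",
    messages=[
        {"role": "user", "content": "How do I check if a Python object is an instance of a class?"},
    ],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta of the assistant's reply.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()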
The terminal also shows a link to the AI Playground, which you can copy to interact with the model directly.
Alternatively, while your runner is active in the terminal, you can open the Runners dashboard on the Clarifai platform, locate your runner in the table, and select Open in Playground from the three-dot menu to start chatting with the model.
When you're done, just close the terminal running the local runner to shut it down.