Run Ollama Models Locally
Run Ollama models locally and make them available via a public API
Ollama is an open-source tool that allows you to download, run, and manage large language models (LLMs) directly on your local machine.
When combined with Clarifai’s Local Runners, it enables you to run Ollama models on your machine, expose them securely via a public URL, and tap into Clarifai’s powerful platform — all while keeping the speed, privacy, and control of local deployment.
Step 1: Perform Prerequisites
Install Ollama
Go to the Ollama website and choose the appropriate installer for your system (macOS, Windows, or Linux).
Note: If you're using Windows, make sure to restart your machine after installing Ollama to ensure that the updated environment variables are properly applied.
Sign Up or Log In
Start by logging in to your existing Clarifai account or signing up for a new one. Once logged in, you'll need the following credentials for setup:
- User ID – Navigate to your personal settings and find your user ID under the Account section.
- Personal Access Token (PAT) – In the same personal settings page, go to the Security section to generate or copy your PAT. This token is used to securely authenticate your connection to the Clarifai platform.
You can then set the PAT as an environment variable using CLARIFAI_PAT, which is required when running inference with your models.
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
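If you'd like to verify that the variable is visible to Python before running inference, here is a minimal optional check (not part of the official setup):
- Python
import os

# Prints True if CLARIFAI_PAT is set in the current environment
print("CLARIFAI_PAT is set:", bool(os.environ.get("CLARIFAI_PAT")))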
Install the Clarifai CLI
Install the latest version of the Clarifai CLI, which includes built-in support for Local Runners.
- Bash
pip install --upgrade clarifai
Note: You must have Python 3.11 or 3.12 installed to use Local Runners.
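As an optional sanity check, you can confirm your Python version and that the package installed correctly; these are standard Python tooling commands, not Clarifai-specific ones:
- Bash
python --version      # should report Python 3.11.x or 3.12.x
pip show clarifai     # confirms the clarifai package is installed and shows its version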
Install OpenAI Package
Install the openai package, which is required when performing inference with models using the OpenAI-compatible format.
- Python
pip install openai
Step 2: Initialize a Model From Ollama
You can use the Clarifai CLI to download and initialize any model available in the Ollama library directly into your local environment.
For example, here's how to initialize the default llama3.2 model in your current directory:
- CLI
clarifai model init --toolkit ollama
Note: The above command will create a new model directory structure that is compatible with the Clarifai platform. You can customize or optimize the generated model by modifying the 1/model.py file as needed.
Example Output
[INFO] 11:29:45.137222 Initializing model from GitHub repository: https://github.com/Clarifai/runners-examples | thread=8589516992
[INFO] 11:29:46.985596 Successfully cloned repository from https://github.com/Clarifai/runners-examples (branch: ollama) | thread=8589516992
[INFO] 11:29:46.991283 Model initialization complete with GitHub repository | thread=8589516992
[INFO] 11:29:46.991330 Next steps: | thread=8589516992
[INFO] 11:29:46.991358 1. Review the model configuration | thread=8589516992
[INFO] 11:29:46.991380 2. Install any required dependencies manually | thread=8589516992
[INFO] 11:29:46.991403 3. Test the model locally using 'clarifai model local-test' | thread=8589516992
You can customize model initialization from the Ollama library using the Clarifai CLI with the following options (an example command follows this list):
- --model-name – Name of the Ollama model to use (default: llama3.2). This lets you specify any model from the Ollama library.
- --port – Port to run the model on (default: 23333).
- --context-length – Context window size for the model, in tokens (default: 8192).
- --verbose – Enables detailed Ollama logs during execution. By default, logs are suppressed unless this flag is provided.
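For example, the following command initializes a different model from the Ollama library with a custom port and context window; the model name and values here are illustrative, so substitute your own:
- CLI
clarifai model init --toolkit ollama --model-name llama3.1 --port 11435 --context-length 16384 --verbose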
You can use Ollama commands such as ollama list to list downloaded models and ollama rm to remove a model. Run ollama --help to see the full list of available commands.
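For instance, once a model has been pulled, you can inspect or clean up your local models like this (llama3.2 is shown as an example model name):
- CLI
ollama list          # list models downloaded to your machine
ollama rm llama3.2   # remove a model you no longer need
ollama --help        # view the full list of available commands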
Step 3: Log In to Clarifai
Use the following command to log in to the Clarifai platform, create a configuration context, and establish a connection:
- CLI
clarifai login
After running the command, you'll be prompted to provide a few details for authentication:
- CLI
context name (default: "default"):
user id:
personal access token value (default: "ENVVAR" to get our env var rather than config):
Here’s what each field means:
- Context name – You can assign a custom name to this configuration context, or simply press Enter to use the default name, "default". This is useful if you manage multiple environments or configurations.
- User ID – Enter your Clarifai user ID.
- Personal Access Token (PAT) – Paste your Clarifai PAT here. If you've already set the CLARIFAI_PAT environment variable, you can just press Enter to use it automatically.
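For reference, a completed prompt might look like the following; the values shown are placeholders, not real credentials:
- CLI
context name (default: "default"): default
user id: your-user-id
personal access token value (default: "ENVVAR" to get our env var rather than config): ENVVAR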
Step 4: Start Your Local Runner
Start a local runner using the following command:
- CLI
clarifai model local-runner
If the necessary context configurations aren’t detected, the CLI will guide you through creating them using default values.
This setup ensures that all required components, such as compute clusters, nodepools, and deployments (described here), are properly included in your configuration context. Simply review each prompt and confirm to proceed.
Note: Use the --verbose option to show detailed logs from the Ollama server, which is helpful for debugging: clarifai model local-runner --verbose.
Example Output
clarifai model local-runner --verbose
[INFO] 11:39:05.038034 > Checking local runner requirements... | thread=8589516992
[INFO] 11:39:05.063469 Checking 2 dependencies... | thread=8589516992
[INFO] 11:39:05.063902 ✅ All 2 dependencies are installed! | thread=8589516992
[INFO] 11:39:05.064096 Verifying Ollama installation... | thread=8589516992
[INFO] 11:39:05.099099 > Verifying local runner setup... | thread=8589516992
[INFO] 11:39:05.099300 Current context: default | thread=8589516992
[INFO] 11:39:05.099342 Current user_id: alfrick | thread=8589516992
[INFO] 11:39:05.099370 Current PAT: d6570**** | thread=8589516992
[INFO] 11:39:05.101001 Current compute_cluster_id: local-runner-compute-cluster | thread=8589516992
[WARNING] 11:39:06.191904 Failed to get compute cluster with ID 'local-runner-compute-cluster':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "ComputeCluster with ID \'local-runner-compute-cluster\' not found. Check your request fields."
req_id: "sdk-python-11.7.2-56dfdebebd3d4f42bdff435838d050e1"
| thread=8589516992
Compute cluster not found. Do you want to create a new compute cluster alfrick/local-runner-compute-cluster? (y/n): y
[INFO] 11:39:12.426673 Compute Cluster with ID 'local-runner-compute-cluster' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.7.2-f972b78a4731420ea7018a5591854fe6"
| thread=8589516992
[INFO] 11:39:12.431533 Current nodepool_id: local-runner-nodepool | thread=8589516992
[WARNING] 11:39:13.391660 Failed to get nodepool with ID 'local-runner-nodepool':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Nodepool not found. Check your request fields."
req_id: "sdk-python-11.7.2-8660a719c7354675bc350e84ab6702c3"
| thread=8589516992
Nodepool not found. Do you want to create a new nodepool alfrick/local-runner-compute-cluster/local-runner-nodepool? (y/n): y
[INFO] 11:39:18.440918 Nodepool with ID 'local-runner-nodepool' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.7.2-35eae829232746da8e3c50e743bb90a0"
| thread=8589516992
[INFO] 11:39:18.457288 Current app_id: local-runner-app | thread=8589516992
[INFO] 11:39:18.790072 Current model_id: local-runner-model | thread=8589516992
[INFO] 11:39:23.772397 Current model version 9d38bb9398944de4bdef699835f17ec9 | thread=8589516992
[INFO] 11:39:23.772643 Creating the local runner tying this 'alfrick/local-runner-app/models/local-runner-model' model (version: 9d38bb9398944de4bdef699835f17ec9) to the 'alfrick/local-runner-compute-cluster/local-runner-nodepool' nodepool. | thread=8589516992
[INFO] 11:39:24.885380 Runner with ID 'db1794a7d250406badcad63cf4ce695c' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.7.2-8f319a34bd134cf7828c41fc9eb7d27d"
| thread=8589516992
[INFO] 11:39:24.893927 Current runner_id: db1794a7d250406badcad63cf4ce695c | thread=8589516992
[WARNING] 11:39:25.252771 Failed to get deployment with ID local-runner-deployment:
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Deployment with ID \'local-runner-deployment\' not found. Check your request fields."
req_id: "sdk-python-11.7.2-4a9e10638c9f461688312571e0a2ceb8"
| thread=8589516992
Deployment not found. Do you want to create a new deployment alfrick/local-runner-compute-cluster/local-runner-nodepool/local-runner-deployment? (y/n): y
[INFO] 11:39:28.762471 Deployment with ID 'local-runner-deployment' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.7.2-afda1f4e53e14055a604c43e85ff6eba"
| thread=8589516992
[INFO] 11:39:28.769409 Current deployment_id: local-runner-deployment | thread=8589516992
[INFO] 11:39:28.771852 Current model section of config.yaml: {'app_id': 'local-dev-runner-app', 'id': 'local-dev-model', 'model_type_id': 'text-to-text', 'user_id': 'clarifai-user-id'} | thread=8589516992
Do you want to backup config.yaml to config.yaml.bk then update the config.yaml with the new model information? (y/n): y
[INFO] 11:39:33.362901 Checking 2 dependencies... | thread=8589516992
[INFO] 11:39:33.363602 ✅ All 2 dependencies are installed! | thread=8589516992
[INFO] 11:39:33.405524 Customizing Ollama model with provided parameters... | thread=8589516992
[INFO] 11:39:33.406114 ✅ Starting local runner... | thread=8589516992
[INFO] 11:39:34.440960 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8589516992
[INFO] 11:39:34.448319 Starting Ollama server in the host: 127.0.0.1:23333 | thread=8589516992
[INFO] 11:39:34.461759 Model llama3.2 pulled successfully. | thread=8589516992
[INFO] 11:39:34.462122 Ollama server started successfully on 127.0.0.1:23333 | thread=8589516992
time=2025-08-22T11:39:34.473+03:00 level=INFO source=routes.go:1318 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:8192 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:23333 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/macbookpro/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-08-22T11:39:34.473+03:00 level=INFO source=images.go:477 msg="total blobs: 0"
time=2025-08-22T11:39:34.473+03:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-08-22T11:39:34.474+03:00 level=INFO source=routes.go:1371 msg="Listening on 127.0.0.1:23333 (version 0.11.6)"
time=2025-08-22T11:39:34.507+03:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="10.7 GiB" available="10.7 GiB"
time=2025-08-22T11:39:34.507+03:00 level=INFO source=routes.go:1412 msg="entering low vram mode" "total vram"="10.7 GiB" threshold="20.0 GiB"
[GIN] 2025/08/22 - 11:39:34 | 200 | 68.5µs | 127.0.0.1 | HEAD "/"
[INFO] 11:39:34.521195 Ollama model loaded successfully: llama3.2 | thread=8589516992
[INFO] 11:39:34.525089 ✅ Your model is running locally and is ready for requests from the API...
| thread=8589516992
[INFO] 11:39:34.525126 > Code Snippet: To call your model via the API, use this code snippet:
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.clarifai.com/v2/ext/openai/v1",
api_key=os.environ['CLARIFAI_PAT'],
)
response = client.chat.completions.create(
model="https://clarifai.com/alfrick/local-runner-app/models/local-runner-model",
messages=[
{"role": "system", "content": "Talk like a pirate."},
{
"role": "user",
"content": "How do I check if a Python object is an instance of a class?",
},
],
temperature=0.7,
stream=False, # stream=True also works, just iterator over the response
)
print(response)
| thread=8589516992
[INFO] 11:39:34.525153 > Playground: To chat with your model, visit: https://clarifai.com/playground?model=local-runner-model__9d38bb9398944de4bdef699835f17ec9&user_id=alfrick&app_id=local-runner-app
| thread=8589516992
[INFO] 11:39:34.525172 > API URL: To call your model via the API, use this model URL: https://clarifai.com/users/alfrick/apps/local-runner-app/models/local-runner-model
| thread=8589516992
[INFO] 11:39:34.525185 Press CTRL+C to stop the runner.
| thread=8589516992
[INFO] 11:39:34.525202 Starting 32 threads... | thread=8589516992
pulling manifest
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB
pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB
pulling 56bb8bd477a5: 100% ▕██████████████████▏ 96 B
pulling 34bb5ab01051: 100% ▕██████████████████▏ 561 B
verifying sha256 digest
writing manifest
success
Step 5: Run Inference
When the local runner starts, it displays a public URL where your model is hosted and provides a sample client code snippet for quick testing.
Pulling a model from Ollama may take some time depending on your machine’s resources, but once the download finishes, you can run the snippet in a separate terminal within the same directory to get the model’s response.
Below is an example of running inference using the OpenAI-compatible format:
- Python
import os
from openai import OpenAI
# Initialize the OpenAI client with Clarifai's OpenAI-compatible endpoint
client = OpenAI(
base_url="https://api.clarifai.com/v2/ext/openai/v1",
api_key=os.environ['CLARIFAI_PAT'],
)
# Replace 'user-id' with your actual Clarifai user ID
response = client.chat.completions.create(
model="https://clarifai.com/user-id/local-runner-app/models/local-runner-model",
messages=[
{"role": "system", "content": "Talk like a pirate."},
{"role": "user", "content": "How do I check if a Python object is an instance of a class?"},
],
temperature=0.7,
stream=False, # Set to True for streaming responses
)
# Print the full response
print(response)
# Example for handling a streaming response:
# If stream=True, uncomment below to print chunks as they arrive.
# Streaming chunks expose the text under choices[0].delta.content (not message):
# for chunk in response:
#     print(chunk.choices[0].delta.content or "", end="")
Example Output
ChatCompletion(id='chatcmpl-79', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Ye be wantin\' to know how to check if a Python object be an instance o\' a class, eh?\n\nWell, matey, ye can use the `type()` function and the `isinstance()` function. Here be the ways:\n\n**Method 1: Using `type()`**\n```python\nmy_object = "Hello, world!"\nif type(my_object) == str:\n
print("Aye, it\'s a string!")\n```\nIn this example, we use `type()` to get the type o\' `my_object`, which is indeed `str`. So, we can check if it\'s equal to `str` using the `==` operator.\n\n**Method 2: Using `isinstance()`**\n```python\nmy_object = "Hello, world!"\nif isinstance(my_object, str):\n print("Aye, it\'s a string!")\n```\nIn this example, we use `isinstance()` to check if `my_object` be an instance o\' the `str` class.
This method is more readable and Pythonic, matey!\n\n**Method 3: Using f-strings**\n```python\nmy_object = "Hello, world!"\nif type(my_object) == str:\n print(f"{my_object} be a string!")\n```\nOr,\n```python\nmy_object = "Hello, world!"\nif isinstance(my_object, str):\n
print(f"{my_object} be an instance o\' the {type(my_object).__name__} class!")\n```\nIn these examples, we use f-strings to format the output.\n\nSo, hoist the sails and set course for type checking, me hearty!', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1755853613, model='llama3.2', object='chat.completion', service_tier=None, system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=338, prompt_tokens=45, total_tokens=383, completion_tokens_details=None, prompt_tokens_details=None))
When you're done, press CTRL+C in the terminal running the local runner (or simply close that terminal) to shut it down.