OpenAI
Run OpenAI-compatible models locally and expose them via a public API
OpenAI's API specification has become the industry standard for interacting with large language models.
With Clarifai's Local Runners, you can deploy any OpenAI-compatible model locally, whether it's from OpenAI's official services, open-source alternatives, or your own fine-tuned models, and make them available via secure public endpoints.
This approach gives you the flexibility of OpenAI's familiar API interface while maintaining data privacy, reducing latency, and having full control over your deployment environment.
Note: After setting up your OpenAI-compatible model locally, you can upload it to Clarifai to leverage the platform's capabilities, such as versioning, monitoring, and auto‑scaling.
Step 1: Perform Prerequisites
Sign Up or Log In
Log in to your existing Clarifai account or sign up for a new one. Once logged in, you'll need the following credentials for setup:
- App ID – Navigate to the application you want to use to run the model and select the Overview option in the collapsible left sidebar. Get the app ID from there.
- User ID – In the collapsible left sidebar, select Settings and choose Account from the dropdown list. Then, locate your user ID.
- Personal Access Token (PAT) – From the same Settings option, choose Secrets to generate or copy your PAT. This token is used to authenticate your connection with the Clarifai platform.
You can then set the PAT as an environment variable using CLARIFAI_PAT.
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
Install Clarifai CLI
Install the latest version of the Clarifai CLI tool. It includes built-in support for Local Runners.
- Bash
pip install --upgrade clarifai
Note: You'll need Python 3.11 or 3.12 installed to successfully run the Local Runners.
Install the OpenAI Package
Install the openai package — it’s required to perform inference with models that support the OpenAI-compatible format.
- Bash
pip install openai
Set Up Your OpenAI-Compatible Server
Before initializing the model, ensure you have an OpenAI-compatible server running locally. Popular options include:
- vLLM — usually runs at http://localhost:8000/v1 (default port: 8000, unless you specify --port).
- LM Studio — usually runs at http://localhost:1234/v1 (default port: 1234, shown in the LM Studio UI).
- Ollama (OpenAI mode) — usually runs at http://localhost:11434/v1 (default port: 11434, unless changed via config or startup flags).
You’ll use this address in the model.py file.
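To confirm the server is up before you wire it into Clarifai, you can point the openai client at it and list the models it serves. This is a minimal sketch assuming a server at http://localhost:8000/v1; adjust the base URL (and the dummy API key) to match your setup.
- Python
from openai import OpenAI

# Point the client at your local OpenAI-compatible server.
# The base_url and api_key below are assumptions; change them to match
# the server you started (vLLM, LM Studio, Ollama, etc.).
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="local-key",  # most local servers accept any dummy value
)

# List the model IDs the server exposes; the generated model.py picks up
# the first one automatically.
for model in client.models.list().data:
    print(model.id)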
Step 2: Initialize a Model
With the Clarifai CLI, you can set up any OpenAI-compatible model to work with your local server.
The command below scaffolds a default OpenAI-compatible model template in your current directory:
- Bash
clarifai model init --model-type-id openai
Note: You can initialize a model in a specific location by passing a MODEL_PATH.
Example Output
clarifai model init --model-type-id openai
[INFO] 07:19:23.816900 Initializing model with default templates... | thread=8490328256
Press Enter to continue...
[INFO] 07:19:27.092345 Configuring OpenAI local runner... | thread=8490328256
Enter port (default: 8000):
[INFO] 07:19:31.983567 Created /Users/macbookpro/Desktop/code3/three/1/model.py | thread=8490328256
[INFO] 07:19:31.984366 Created /Users/macbookpro/Desktop/code3/three/requirements.txt | thread=8490328256
[INFO] 07:19:31.984757 Created /Users/macbookpro/Desktop/code3/three/config.yaml | thread=8490328256
[INFO] 07:19:31.984819 Model initialization complete in /Users/macbookpro/Desktop/code3/three | thread=8490328256
[INFO] 07:19:31.984863 Next steps: | thread=8490328256
[INFO] 07:19:31.984904 1. Search for '# TODO: please fill in' comments in the generated files | thread=8490328256
[INFO] 07:19:31.984946 2. Update the model configuration in config.yaml | thread=8490328256
[INFO] 07:19:31.984985 3. Add your model dependencies to requirements.txt | thread=8490328256
[INFO] 07:19:31.985023 4. Implement your model logic in 1/model.py | thread=8490328256
This command generates a model directory structure that's compatible with the Clarifai platform and configured to work with OpenAI-compatible APIs.
The generated structure includes:
├── 1/
│ └── model.py
├── requirements.txt
└── config.yaml
model.py
Example: model.py
from typing import List, Iterator

from openai import OpenAI

from clarifai.runners.models.openai_class import OpenAIModelClass
from clarifai.runners.utils.data_utils import Param
from clarifai.runners.utils.openai_convertor import build_openai_messages


class MyModel(OpenAIModelClass):
    """A custom model implementation using OpenAIModelClass."""

    # TODO: please fill in
    # Configure your OpenAI-compatible client for the local model
    client = OpenAI(
        api_key="local-key",  # TODO: please fill in - use your local API key
        base_url="http://localhost:8000/v1",  # TODO: please fill in - your local model server endpoint
    )

    # Automatically get the first available model
    model = client.models.list().data[0].id

    def load_model(self):
        """Optional: Add any additional model loading logic here."""
        # TODO: please fill in (optional)
        # Add any initialization logic if needed
        pass

    @OpenAIModelClass.method
    def predict(
        self,
        prompt: str = "",
        chat_history: List[dict] = None,
        max_tokens: int = Param(default=256, description="The maximum number of tokens to generate. Shorter token lengths will provide faster performance."),
        temperature: float = Param(default=1.0, description="A decimal number that determines the degree of randomness in the response"),
        top_p: float = Param(default=1.0, description="An alternative to sampling with temperature, where the model considers the results of the tokens with top_p probability mass."),
    ) -> str:
        """Run a single prompt completion using the OpenAI client."""
        # TODO: please fill in
        # Implement your prediction logic here
        messages = build_openai_messages(prompt, chat_history)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        return response.choices[0].message.content

    @OpenAIModelClass.method
    def generate(
        self,
        prompt: str = "",
        chat_history: List[dict] = None,
        max_tokens: int = Param(default=256, description="The maximum number of tokens to generate. Shorter token lengths will provide faster performance."),
        temperature: float = Param(default=1.0, description="A decimal number that determines the degree of randomness in the response"),
        top_p: float = Param(default=1.0, description="An alternative to sampling with temperature, where the model considers the results of the tokens with top_p probability mass."),
    ) -> Iterator[str]:
        """Stream a completion response using the OpenAI client."""
        # TODO: please fill in
        # Implement your streaming logic here
        messages = build_openai_messages(prompt, chat_history)
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            stream=True,
        )
        for chunk in stream:
            if chunk.choices:
                # Yield the token text for this chunk, or an empty string if the delta has no content.
                yield chunk.choices[0].delta.content or ''
The model.py file, located inside the 1 folder, acts as the bridge between Clarifai’s model execution environment and your local (or remote) OpenAI-compatible server.
It subclasses Clarifai's OpenAIModelClass, a base class designed specifically for wrapping OpenAI-compatible model servers and exposing them through Clarifai's inference infrastructure.
This class implements the following:
- The OpenAI client, which connects to your model server (e.g., vLLM, LM Studio, or Ollama) via its /v1 API endpoint.
- The predict() method, which handles standard (non-streaming) chat completions.
- The generate() method, which supports streaming token generation.
- The build_openai_messages() helper, which automatically converts Clarifai inputs into the OpenAI-compatible message format.
These are the key components you can configure:
- base_url: Your local model server's endpoint address (default: http://localhost:8000/v1).
  Note: During model initialization, the CLI prompts you to choose a port, and this value is automatically updated in the file to match your selection.
- api_key: Required only when calling the model through OpenAI’s hosted API. If you’re using a local OpenAI-compatible server, an API key isn’t needed — you can simply provide any dummy value.
- model: The ID of the model to use. You can leave this as is to let it be detected automatically from your OpenAI-compatible server, or explicitly set it to a specific model ID (for example, "gpt-4").
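For example, if your local server is Ollama rather than vLLM, the client section of model.py might look like the sketch below. The port and the pinned model ID are assumptions; replace them with whatever your server actually serves.
- Python
from openai import OpenAI

# Sketch: client configuration for a local Ollama server (assumed values).
client = OpenAI(
    api_key="local-key",  # any dummy value works for a local server
    base_url="http://localhost:11434/v1",  # Ollama's default OpenAI-compatible endpoint
)

# Either auto-detect the first model the server reports...
model = client.models.list().data[0].id
# ...or pin a specific model ID explicitly, e.g.:
# model = "llama3.2:latest"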
config.yaml
Example: config.yaml
# Configuration file for your Clarifai model
model:
  id: "my-model"  # TODO: please fill in - replace with your model ID
  user_id: "user-id"  # TODO: please fill in - replace with your user ID
  app_id: "app_id"  # TODO: please fill in - replace with your app ID
  model_type_id: "any-to-any"  # TODO: please fill in - replace if different model type ID

build_info:
  python_version: "3.12"

# TODO: please fill in - adjust compute requirements for your model
inference_compute_info:
  cpu_limit: "1"  # TODO: please fill in - Amount of CPUs to use as a limit
  cpu_memory: "1Gi"  # TODO: please fill in - Amount of CPU memory to use as a limit
  cpu_requests: "0.5"  # TODO: please fill in - Amount of CPUs to use as a minimum
  cpu_memory_requests: "512Mi"  # TODO: please fill in - Amount of CPU memory to use as a minimum
  num_accelerators: 1  # TODO: please fill in - Amount of GPUs/TPUs to use
  accelerator_type: ["NVIDIA-*"]  # TODO: please fill in - type of accelerators requested
  accelerator_memory: "1Gi"  # TODO: please fill in - Amount of accelerator/GPU memory to use as a minimum

# TODO: please fill in (optional) - add checkpoints section if needed
# checkpoints:
#   type: "huggingface"  # supported type
#   repo_id: "your-model-repo"  # for huggingface, like openai/gpt-oss-20b
#   # hf_token: "your-huggingface-token"  # if private repo
#   when: "runtime"  # or "build", "upload"
The config.yaml file tells Clarifai how to run your OpenAI-compatible custom model — including where it will live on the platform, how it’s served, and what compute resources it needs.
- It specifies where your model will run, using values like id (your chosen model name), user_id (set by default from your active context), app_id, and model_type_id.
- In the build_info section, configure environment settings, such as the Python version required by your OpenAI model implementation.
- In the inference_compute_info section, specify the compute resources your model should use — including CPU, memory, and optional accelerators (like GPUs) — ensuring your OpenAI-compatible service has the right performance and scalability characteristics.
When to use checkpoints: Most OpenAI models are accessed via the API, so you won’t need a checkpoints block. If you are serving a self‑hosted Hugging Face model, you can uncomment the checkpoints section and set the required values.
requirements.txt
Example: requirements.txt
# Clarifai SDK - required
clarifai>=11.10.2
openai
# TODO: please fill in - add your model's dependencies here
# Examples:
# torch>=2.0.0
# transformers>=4.30.0
# numpy>=1.21.0
# pillow>=9.0.0
The requirements.txt file specifies all the Python dependencies your model needs to run. If these packages are not already installed in your environment, install them by running the following command:
- Bash
pip install -r requirements.txt
Step 3: Log In to Clarifai
Run the following command to log in to the Clarifai platform, create a configuration context, and establish a connection:
clarifai login
You'll be prompted to provide:
- User ID – Enter your Clarifai user ID.
- PAT – Enter your Clarifai PAT. If you've already set the CLARIFAI_PAT environment variable, type ENVVAR to use it automatically.
- Context name – Assign a custom name to this configuration context, or press Enter to accept the default name, "default".
Example Output
clarifai login
Enter your Clarifai user ID: user-id
> To authenticate, you'll need a Personal Access Token (PAT).
> You can create one from your account settings: https://clarifai.com/alfrick/settings/security
Enter your Personal Access Token (PAT) value (or type "ENVVAR" to use an environment variable): ENVVAR
> Verifying token...
[INFO] 13:59:43.543035 Validating the Context Credentials... | thread=8490328256
[INFO] 13:59:44.940556 ✅ Context is valid | thread=8490328256
> Let's save these credentials to a new context.
> You can have multiple contexts to easily switch between accounts or projects.
Enter a name for this context [default]:
✅ Success! You are now logged in.
Credentials saved to the 'default' context.
💡 To switch contexts later, use `clarifai config use-context <name>`.
[INFO] 13:59:46.641774 Login successful for user 'alfrick' in context 'default' | thread=8490328256
Step 4: Start Your Local Runner
Start a local runner with the following command:
clarifai model local-runner
The CLI will guide you through creating any necessary context configurations with default values, ensuring all components (compute clusters, nodepools, deployments) are properly set up.
Example Output
clarifai model local-runner
[INFO] 11:23:11.406057 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8821432512
[ERROR] 11:23:11.406350 Missing configuration to track usage for OpenAI chat completion calls. Go to your model scripts and make sure to set both: 1) stream_options={'include_usage': True}2) set_output_context | thread=8821432512
[INFO] 11:23:11.406705 > Checking local runner requirements... | thread=8821432512
[INFO] 11:23:11.428851 Checking 2 dependencies... | thread=8821432512
[INFO] 11:23:11.429253 ✅ All 2 dependencies are installed! | thread=8821432512
[INFO] 11:23:11.431322 > Verifying local runner setup... | thread=8821432512
[INFO] 11:23:11.431374 Current context: default | thread=8821432512
[INFO] 11:23:11.431406 Current user_id: alfrick | thread=8821432512
[INFO] 11:23:11.431432 Current PAT: d6974**** | thread=8821432512
[INFO] 11:23:11.433893 Current compute_cluster_id: local-runner-compute-cluster | thread=8821432512
[WARNING] 11:23:14.002018 Failed to get compute cluster with ID 'local-runner-compute-cluster':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "ComputeCluster with ID \'local-runner-compute-cluster\' not found. Check your request fields."
req_id: "sdk-python-11.10.2-34a60189eb514b8b9085ba741a13a7ca"
| thread=8821432512
Compute cluster not found. Do you want to create a new compute cluster alfrick/local-runner-compute-cluster? (y/n): y
[INFO] 11:23:26.498698 Compute Cluster with ID 'local-runner-compute-cluster' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-a3c367c30482463aa0f23089d8971d7a"
| thread=8821432512
[INFO] 11:23:26.508120 Current nodepool_id: local-runner-nodepool | thread=8821432512
[WARNING] 11:23:29.251200 Failed to get nodepool with ID 'local-runner-nodepool':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Nodepool not found. Check your request fields."
req_id: "sdk-python-11.10.2-c841763ca1d54452984432a6642ec030"
| thread=8821432512
Nodepool not found. Do you want to create a new nodepool alfrick/local-runner-compute-cluster/local-runner-nodepool? (y/n): y
[INFO] 11:23:32.994964 Nodepool with ID 'local-runner-nodepool' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-0cd9f37bc175431aa57626ac709b6490"
| thread=8821432512
[INFO] 11:23:33.009030 Current app_id: local-runner-app | thread=8821432512
[WARNING] 11:23:33.330525 Failed to get app with ID 'local-runner-app':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "app identified by path /users/alfrick/apps/local-runner-app not found"
req_id: "sdk-python-11.10.2-05940fc72d46478d8309275b3ccf788e"
| thread=8821432512
App not found. Do you want to create a new app alfrick/local-runner-app? (y/n): y
[INFO] 11:23:36.874801 App with ID 'local-runner-app' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-da9c70e006774b7a9ab5ae47b339cd6d"
| thread=8821432512
[INFO] 11:23:36.887817 Current model_id: local-runner-model | thread=8821432512
[WARNING] 11:23:38.066408 Failed to get model with ID 'local-runner-model':
code: MODEL_DOES_NOT_EXIST
description: "Model does not exist"
details: "Model \'local-runner-model\' does not exist."
req_id: "sdk-python-11.10.2-a2157a2599f747559e1f9fcc1d459247"
| thread=8821432512
Model not found. Do you want to create a new model alfrick/local-runner-app/models/local-runner-model? (y/n): y
[INFO] 11:23:42.481867 Model with ID 'local-runner-model' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-a2949b4d81304deda6cb897a05b89cfd"
| thread=8821432512
[WARNING] 11:23:44.422210 No model versions found. Creating a new version for local runner. | thread=8821432512
[INFO] 11:23:45.628623 Model Version with ID '36bde5dcf7c24317a08d1366a8cc5757' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-1cf881046e2047b795533c4f08da5c7e"
| thread=8821432512
[INFO] 11:23:46.723898 Current model version 36bde5dcf7c24317a08d1366a8cc5757 | thread=8821432512
[INFO] 11:23:46.724179 Creating the local runner tying this 'alfrick/local-runner-app/models/local-runner-model' model (version: 36bde5dcf7c24317a08d1366a8cc5757) to the 'alfrick/local-runner-compute-cluster/local-runner-nodepool' nodepool. | thread=8821432512
[INFO] 11:23:48.660432 Runner with ID '805fa3de93d341d7aaac0aed94786236' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-a613867263b94219ba9fc558f29c1661"
| thread=8821432512
[INFO] 11:23:48.670950 Current runner_id: 805fa3de93d341d7aaac0aed94786236 | thread=8821432512
[WARNING] 11:23:48.931127 Failed to get deployment with ID local-runner-deployment:
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Deployment with ID \'local-runner-deployment\' not found. Check your request fields."
req_id: "sdk-python-11.10.2-28999059cf6e42b4ab868dcca10b4201"
| thread=8821432512
Deployment not found. Do you want to create a new deployment alfrick/local-runner-compute-cluster/local-runner-nodepool/local-runner-deployment? (y/n): y
[INFO] 11:23:53.891169 Deployment with ID 'local-runner-deployment' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-8a6701fa26e64acc8109e6affef1df28"
| thread=8821432512
[INFO] 11:23:53.902286 Current deployment_id: local-runner-deployment | thread=8821432512
[INFO] 11:23:53.902438 Current model section of config.yaml: {'id': 'my-model', 'user_id': 'alfrick', 'app_id': 'app_id', 'model_type_id': 'any-to-any'} | thread=8821432512
Do you want to backup config.yaml to config.yaml.bk then update the config.yaml with the new model information? (y/n): y
[INFO] 11:23:57.187239 Checking 2 dependencies... | thread=8821432512
[INFO] 11:23:57.188280 ✅ All 2 dependencies are installed! | thread=8821432512
[INFO] 11:23:57.188387 ✅ Starting local runner... | thread=8821432512
[INFO] 11:23:57.188475 No secrets path configured, running without secrets | thread=8821432512
[INFO] 11:23:58.334211 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8821432512
[ERROR] 11:23:58.334788 Missing configuration to track usage for OpenAI chat completion calls. Go to your model scripts and make sure to set both: 1) stream_options={'include_usage': True}2) set_output_context | thread=8821432512
[INFO] 11:23:58.358199 ModelServer initialized successfully | thread=8821432512
[INFO] 11:23:58.385791 ✅ Your model is running locally and is ready for requests from the API...
| thread=8821432512
[INFO] 11:23:58.385872 > Code Snippet: To call your model via the API, use this code snippet:
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.clarifai.com/v2/ext/openai/v1",
api_key=os.environ['CLARIFAI_PAT'],
)
response = client.chat.completions.create(
model="https://clarifai.com/alfrick/local-runner-app/models/local-runner-model",
messages=[
{"role": "system", "content": "Talk like a pirate."},
{
"role": "user",
"content": "How do I check if a Python object is an instance of a class?",
},
],
temperature=1.0,
stream=False, # stream=True also works, just iterator over the response
)
print(response)
| thread=8821432512
[INFO] 11:23:58.385916 > Playground: To chat with your model, visit: https://clarifai.com/playground?model=local-runner-model__36bde5dcf7c24317a08d1366a8cc5757&user_id=alfrick&app_id=local-runner-app
| thread=8821432512
[INFO] 11:23:58.385946 > API URL: To call your model via the API, use this model URL: https://clarifai.com/alfrick/local-runner-app/models/local-runner-model
| thread=8821432512
[INFO] 11:23:58.385966 Press CTRL+C to stop the runner.
| thread=8821432512
[INFO] 11:23:58.385994 Starting 32 threads... | thread=8821432512
Tip: If your underlying model is running on a specific port (like 8000), ensure your model.py points to that port, and that the Local Runner does not try to bind to the same port.
Step 5: Test Your Runner
Once the local runner starts, it provides a sample client code snippet for testing. You can run this in a separate terminal within the same directory.
Here's an example test snippet:
- Python SDK
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.clarifai.com/v2/ext/openai/v1",
api_key=os.environ['CLARIFAI_PAT'],
)
response = client.chat.completions.create(
model="https://clarifai.com/user-id/local-runner-app/models/local-runner-model",
messages=[
{"role": "system", "content": "Talk like a pirate."},
{
"role": "user",
"content": "How do I check if a Python object is an instance of a class?",
},
],
temperature=1.0,
stream=False, # stream=True also works, just iterate over the response
)
print(response)
Example Output
ChatCompletion(
id='bf90c9f0a20e44d796780d35360d3951',
choices=[
Choice(
finish_reason='stop',
index=0,
logprobs=None,
message=ChatCompletionMessage(
content=(
"Yer lookin' fer a way to check if a Python object be an instance "
"o' a class, eh?\n\n"
"In Python, ye can use the `type()` function or `isinstance()` method "
"to determine if a variable be o' a certain type. Here be some ways "
"to do it:\n\n"
"### Method 1: Using `type()`\n\n"
"```python\n"
"x = \"Hello\"\n"
"y = [1, 2, 3]\n\n"
"if type(x) == str:\n"
" print(\"x is a string\")\n"
"elif type(y) == list:\n"
" print(\"y is a list\")\n"
"else:\n"
" print(\"Unknown type\")\n"
"```\n\n"
"### Method 2: Using `isinstance()`\n\n"
"```python\n"
"class Person:\n"
" def __init__(self, name):\n"
" self.name = name\n\n"
"p = Person(\"Pirate\")\n"
"if isinstance(p, Person):\n"
" print(\"p be an instance o' the Person class\")\n\n"
"x = \"Hello\"\n"
"y = [1, 2, 3]\n"
"```\n\n"
"### Method 3: Constructor Check with Class Definition (Not Recommended)\n\n"
"```python\n"
"class MyShip:\n"
" def __init__(self, speed):\n"
" self.speed = speed\n\n"
"if obj is MyShip(some_speed):\n"
" print(f\"it's a ship made by {some_speed}\")\n"
"else:\n"
" pass # object is not of type MyShip.\n"
"```\n\n"
"Note: Python doesn’t use 'object' in the generic sense; everything is an instance of a class."
),
refusal=None,
role='assistant',
annotations=None,
audio=None,
function_call=None,
tool_calls=None
)
)
],
created=1764578049,
model='llama3.2:latest',
object='chat.completion',
service_tier=None,
system_fingerprint='fp_ollama',
usage=CompletionUsage(
completion_tokens=303,
prompt_tokens=45,
total_tokens=348,
completion_tokens_details=None,
prompt_tokens_details=None
)
)
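If you prefer streaming, the same endpoint accepts stream=True. The sketch below iterates over chunks as they arrive; it assumes the same model URL that your local runner printed, with user-id replaced by your own.
- Python SDK
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"],
)

# Stream the completion and print tokens as they arrive.
stream = client.chat.completions.create(
    model="https://clarifai.com/user-id/local-runner-app/models/local-runner-model",
    messages=[{"role": "user", "content": "Give me one sentence of pirate wisdom."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)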
That’s it!
When you’re done testing, simply stop the terminal running the local development runner and the process hosting your OpenAI-compatible server.