OpenAI

Run OpenAI-compatible models locally and expose them via a public API


OpenAI's API specification has become the industry standard for interacting with large language models.

With Clarifai's Local Runners, you can deploy any OpenAI-compatible model locally (whether from OpenAI's official services, an open-source alternative, or your own fine-tuned model) and make it available via secure public endpoints.

This approach gives you the flexibility of OpenAI's familiar API interface while maintaining data privacy, reducing latency, and having full control over your deployment environment.

Note: After setting up your OpenAI-compatible model locally, you can upload it to Clarifai to leverage the platform's capabilities, such as versioning, monitoring, and auto‑scaling.

Step 1: Perform Prerequisites

Get User ID and PAT

Start by logging in to your existing Clarifai account or signing up for a new one. Once logged in, you'll need your Personal Access Token (PAT) for authentication:

  • In the collapsible left sidebar, select Settings and choose Secrets to generate or copy your PAT.

You can then set the PAT as an environment variable using CLARIFAI_PAT.

export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE

Install Clarifai CLI

Install the latest version of the Clarifai CLI tool. It includes built-in support for Local Runners.

pip install --upgrade clarifai

Note: You'll need Python 3.11 or 3.12 installed to successfully run the Local Runners.

Install the OpenAI Package

Install the openai package — it’s required to perform inference with models that support the OpenAI-compatible format.

pip install openai

Set Up Your OpenAI-Compatible Server

Before initializing the model, ensure you have an OpenAI-compatible server running locally. Popular options include:

  • vLLM — Usually runs at http://localhost:8000/v1 (default port 8000, unless you specify --port).
  • LM Studio — Usually runs at http://localhost:1234/v1 (default port 1234, shown in the LM Studio UI).
  • Ollama (OpenAI mode) — Usually runs at http://localhost:11434/v1 (default port 11434, unless changed via the config or startup flags).

You’ll use this address in the model.py file.
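Before editing model.py, it can help to confirm the server is actually reachable. The sketch below is an illustration, not part of the generated template (the helper names are made up): it builds the /v1 base URL and queries the server's standard /v1/models endpoint using only the standard library.

```python
import json
import urllib.request


def v1_base_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the /v1 base URL that model.py will point at."""
    return f"http://{host}:{port}/v1"


def list_local_models(base_url: str, timeout: float = 3.0):
    """Return model IDs from the server's /v1/models endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
        return [m["id"] for m in payload.get("data", [])]
    except OSError:
        # Connection refused, DNS failure, timeout, etc. -- server isn't up.
        return None


if __name__ == "__main__":
    url = v1_base_url(port=8000)  # vLLM default; use 1234 for LM Studio, 11434 for Ollama
    print(url, list_local_models(url))
```

If the second value printed is None, start (or fix the port of) your local server before continuing.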

Step 2: Initialize a Model

With the Clarifai CLI, you can set up any OpenAI-compatible model to work with your local server.

The command below scaffolds a default OpenAI-compatible model template. The final argument (my-wrapper here) is the MODEL_PATH, which controls where the directory is created:

clarifai model init --toolkit openai my-wrapper

Example Output
clarifai model init --toolkit openai my-wrapper
[INFO] Initializing openai model from template...
[INFO] Created my-wrapper/1/model.py
[INFO] Created my-wrapper/requirements.txt
[INFO] Created my-wrapper/config.yaml

Model initialized in ./my-wrapper

1. Edit 1/model.py with your model logic
2. Add dependencies to requirements.txt

Test locally:
clarifai model serve ./my-wrapper
clarifai model serve ./my-wrapper --mode env # auto-create venv and install deps
clarifai model serve ./my-wrapper --mode container # run inside Docker

Deploy to Clarifai:
clarifai model deploy ./my-wrapper --instance a10g
clarifai list-instances # list available instances

This command generates a model directory structure that's compatible with the Clarifai platform and configured to work with OpenAI-compatible APIs.

The generated structure includes:

├── 1/
│   └── model.py
├── requirements.txt
└── config.yaml

model.py

Example: model.py
from typing import List, Iterator
from openai import OpenAI
from clarifai.runners.models.openai_class import OpenAIModelClass
from clarifai.runners.utils.data_utils import Param
from clarifai.runners.utils.openai_convertor import build_openai_messages


class MyModel(OpenAIModelClass):
    """Wraps an OpenAI-compatible API endpoint."""

    client = OpenAI(
        api_key="local-key",
        base_url="http://localhost:8000/v1",
    )

    model = client.models.list().data[0].id

    def load_model(self):
        """Optional initialization logic."""
        pass

    @OpenAIModelClass.method
    def predict(
        self,
        prompt: str = "",
        chat_history: List[dict] = None,
        max_tokens: int = Param(default=256, description="The maximum number of tokens to generate."),
        temperature: float = Param(default=1.0, description="Sampling temperature (higher = more random)."),
        top_p: float = Param(default=1.0, description="Nucleus sampling threshold."),
    ) -> str:
        """Run a single prompt completion."""
        messages = build_openai_messages(prompt, chat_history)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        return response.choices[0].message.content

    @OpenAIModelClass.method
    def generate(
        self,
        prompt: str = "",
        chat_history: List[dict] = None,
        max_tokens: int = Param(default=256, description="The maximum number of tokens to generate."),
        temperature: float = Param(default=1.0, description="Sampling temperature (higher = more random)."),
        top_p: float = Param(default=1.0, description="Nucleus sampling threshold."),
    ) -> Iterator[str]:
        """Stream a completion response."""
        messages = build_openai_messages(prompt, chat_history)
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            stream=True,
        )
        for chunk in stream:
            if chunk.choices:
                # Delta content can be None on some chunks (e.g. role-only or final chunks).
                yield chunk.choices[0].delta.content or ""

The model.py file, located inside the 1 folder, acts as the bridge between Clarifai’s model execution environment and your local (or remote) OpenAI-compatible server.

It subclasses Clarifai’s OpenAIModelClass, a base class designed specifically for wrapping OpenAI-compatible model servers and exposing them through Clarifai’s inference infrastructure.

This class implements the following:

  • The OpenAI client, which connects to your model server (e.g. vLLM, LM Studio, Ollama) via its /v1 API endpoint.
  • The predict() method, which handles standard (non-streaming) chat completions.
  • The generate() method, which supports streaming token generation.
  • build_openai_messages(), which automatically converts Clarifai inputs into OpenAI-compatible message format.

These are the key components you can configure:

  • base_url: Your local model server endpoint address (default: http://localhost:8000/v1). Update this to match your server’s address and port.
  • api_key: Required only when calling the model through OpenAI’s hosted API. If you’re using a local OpenAI-compatible server, an API key isn’t needed — you can simply provide any dummy value.
  • model: The ID of the model to use. You can leave this as is to let it be automatically detected from your OpenAI-compatible server, or explicitly set it to a specific model ID (for example, "gpt-4").
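For reference, here is a rough, standalone sketch of the message-building step. The real build_openai_messages helper lives in the Clarifai SDK; this simplified version is only an assumption for illustration, showing the shape of the message list an OpenAI-compatible server expects.

```python
from typing import List, Optional


def to_openai_messages(prompt: str, chat_history: Optional[List[dict]] = None) -> List[dict]:
    """Combine prior turns with the new user prompt into OpenAI-style chat messages."""
    messages = list(chat_history or [])  # copy so the caller's history isn't mutated
    if prompt:
        messages.append({"role": "user", "content": prompt})
    return messages


history = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello."},
]
print(to_openai_messages("What is 2+2?", history))
```

Each entry is a dict with a role (system, user, or assistant) and a content string, which is exactly what client.chat.completions.create consumes.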

config.yaml

Example: config.yaml
model:
  id: "my-wrapper"
  model_type_id: "openai"

compute:
  instance: g5.xlarge  # Run 'clarifai list-instances' to see all options.
  # cloud: aws         # Cloud provider (aws, gcp, vultr). Auto-detected from instance.
  # region: us-east-1  # Cloud region. Auto-detected from instance.

# Uncomment to auto-download model checkpoints:
# checkpoints:
#   repo_id: owner/model-name

The config.yaml file defines your OpenAI-compatible model’s configuration in a simplified format:

  • model.id — A unique identifier for your model.
  • model.model_type_id — Set to "openai" for OpenAI-compatible models.
  • compute.instance — The GPU instance type for deployment. Run clarifai list-instances to see all available options.
  • checkpoints — (Optional) Uncomment to auto-download model checkpoints from Hugging Face at runtime.

user_id and app_id are auto-filled from your active context at deploy time. You don’t need to add them manually.

When to use checkpoints: Most OpenAI models are accessed via the API, so you won’t need a checkpoints block. If you are serving a self‑hosted Hugging Face model, you can uncomment the checkpoints section and set the required values.
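For instance, a config.yaml for a self-hosted Hugging Face model might look like the fragment below (the repo_id shown is the template's placeholder, not a real model):

```yaml
model:
  id: "my-wrapper"
  model_type_id: "openai"

compute:
  instance: g5.xlarge

# Auto-download weights from Hugging Face at runtime.
# Replace the placeholder with your actual repo ID.
checkpoints:
  repo_id: owner/model-name
```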

requirements.txt

Example: requirements.txt
clarifai
openai

The requirements.txt file specifies all the Python dependencies your model needs to run. If these packages are not already installed in your environment, install them by running the following command:

pip install -r requirements.txt

Step 3: Log In to Clarifai

Run the following command to log in to the Clarifai platform, create a configuration context, and establish a connection:

clarifai login

You'll be prompted to provide:

  • User ID – Enter your Clarifai user ID.
  • PAT – Enter your Clarifai PAT. If you've already set the CLARIFAI_PAT environment variable, type ENVVAR to use it automatically.
  • Context name – Assign a custom name to this configuration context, or press Enter to accept the default name, "default".
Example Output
clarifai login
Enter your Clarifai user ID: user-id
> To authenticate, you'll need a Personal Access Token (PAT).
> You can create one from your account settings: https://clarifai.com/alfrick/settings/security

Enter your Personal Access Token (PAT) value (or type "ENVVAR" to use an environment variable): ENVVAR

> Verifying token...
[INFO] 13:59:43.543035 Validating the Context Credentials... | thread=8490328256
[INFO] 13:59:44.940556 ✅ Context is valid | thread=8490328256

> Let's save these credentials to a new context.
> You can have multiple contexts to easily switch between accounts or projects.

Enter a name for this context [default]:
✅ Success! You are now logged in.
Credentials saved to the 'default' context.

💡 To switch contexts later, use `clarifai config use-context <name>`.
[INFO] 13:59:46.641774 Login successful for user 'alfrick' in context 'default' | thread=8490328256

Step 4: Serve the Model Locally

Start the model using clarifai model serve:

clarifai model serve

Note: The older clarifai model local-runner command still works as an alias.

The CLI will guide you through creating any necessary context configurations with default values, ensuring all components (compute clusters, nodepools, deployments) are properly set up.

Example Output
clarifai model local-runner
[INFO] 11:23:11.406057 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8821432512
[ERROR] 11:23:11.406350 Missing configuration to track usage for OpenAI chat completion calls. Go to your model scripts and make sure to set both: 1) stream_options={'include_usage': True}2) set_output_context | thread=8821432512
[INFO] 11:23:11.406705 > Checking local runner requirements... | thread=8821432512
[INFO] 11:23:11.428851 Checking 2 dependencies... | thread=8821432512
[INFO] 11:23:11.429253 ✅ All 2 dependencies are installed! | thread=8821432512
[INFO] 11:23:11.431322 > Verifying local runner setup... | thread=8821432512
[INFO] 11:23:11.431374 Current context: default | thread=8821432512
[INFO] 11:23:11.431406 Current user_id: alfrick | thread=8821432512
[INFO] 11:23:11.431432 Current PAT: d6974**** | thread=8821432512
[INFO] 11:23:11.433893 Current compute_cluster_id: local-runner-compute-cluster | thread=8821432512
[WARNING] 11:23:14.002018 Failed to get compute cluster with ID 'local-runner-compute-cluster':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "ComputeCluster with ID \'local-runner-compute-cluster\' not found. Check your request fields."
req_id: "sdk-python-11.10.2-34a60189eb514b8b9085ba741a13a7ca"
| thread=8821432512
Compute cluster not found. Do you want to create a new compute cluster alfrick/local-runner-compute-cluster? (y/n): y
[INFO] 11:23:26.498698 Compute Cluster with ID 'local-runner-compute-cluster' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-a3c367c30482463aa0f23089d8971d7a"
| thread=8821432512
[INFO] 11:23:26.508120 Current nodepool_id: local-runner-nodepool | thread=8821432512
[WARNING] 11:23:29.251200 Failed to get nodepool with ID 'local-runner-nodepool':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Nodepool not found. Check your request fields."
req_id: "sdk-python-11.10.2-c841763ca1d54452984432a6642ec030"
| thread=8821432512
Nodepool not found. Do you want to create a new nodepool alfrick/local-runner-compute-cluster/local-runner-nodepool? (y/n): y
[INFO] 11:23:32.994964 Nodepool with ID 'local-runner-nodepool' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-0cd9f37bc175431aa57626ac709b6490"
| thread=8821432512
[INFO] 11:23:33.009030 Current app_id: local-runner-app | thread=8821432512
[WARNING] 11:23:33.330525 Failed to get app with ID 'local-runner-app':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "app identified by path /users/alfrick/apps/local-runner-app not found"
req_id: "sdk-python-11.10.2-05940fc72d46478d8309275b3ccf788e"
| thread=8821432512
App not found. Do you want to create a new app alfrick/local-runner-app? (y/n): y
[INFO] 11:23:36.874801 App with ID 'local-runner-app' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-da9c70e006774b7a9ab5ae47b339cd6d"
| thread=8821432512
[INFO] 11:23:36.887817 Current model_id: local-runner-model | thread=8821432512
[WARNING] 11:23:38.066408 Failed to get model with ID 'local-runner-model':
code: MODEL_DOES_NOT_EXIST
description: "Model does not exist"
details: "Model \'local-runner-model\' does not exist."
req_id: "sdk-python-11.10.2-a2157a2599f747559e1f9fcc1d459247"
| thread=8821432512
Model not found. Do you want to create a new model alfrick/local-runner-app/models/local-runner-model? (y/n): y
[INFO] 11:23:42.481867 Model with ID 'local-runner-model' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-a2949b4d81304deda6cb897a05b89cfd"
| thread=8821432512
[WARNING] 11:23:44.422210 No model versions found. Creating a new version for local runner. | thread=8821432512
[INFO] 11:23:45.628623 Model Version with ID '36bde5dcf7c24317a08d1366a8cc5757' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-1cf881046e2047b795533c4f08da5c7e"
| thread=8821432512
[INFO] 11:23:46.723898 Current model version 36bde5dcf7c24317a08d1366a8cc5757 | thread=8821432512
[INFO] 11:23:46.724179 Creating the local runner tying this 'alfrick/local-runner-app/models/local-runner-model' model (version: 36bde5dcf7c24317a08d1366a8cc5757) to the 'alfrick/local-runner-compute-cluster/local-runner-nodepool' nodepool. | thread=8821432512
[INFO] 11:23:48.660432 Runner with ID '805fa3de93d341d7aaac0aed94786236' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-a613867263b94219ba9fc558f29c1661"
| thread=8821432512
[INFO] 11:23:48.670950 Current runner_id: 805fa3de93d341d7aaac0aed94786236 | thread=8821432512
[WARNING] 11:23:48.931127 Failed to get deployment with ID local-runner-deployment:
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Deployment with ID \'local-runner-deployment\' not found. Check your request fields."
req_id: "sdk-python-11.10.2-28999059cf6e42b4ab868dcca10b4201"
| thread=8821432512
Deployment not found. Do you want to create a new deployment alfrick/local-runner-compute-cluster/local-runner-nodepool/local-runner-deployment? (y/n): y
[INFO] 11:23:53.891169 Deployment with ID 'local-runner-deployment' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.10.2-8a6701fa26e64acc8109e6affef1df28"
| thread=8821432512
[INFO] 11:23:53.902286 Current deployment_id: local-runner-deployment | thread=8821432512
[INFO] 11:23:53.902438 Current model section of config.yaml: {'id': 'my-model', 'user_id': 'alfrick', 'app_id': 'app_id', 'model_type_id': 'any-to-any'} | thread=8821432512
Do you want to backup config.yaml to config.yaml.bk then update the config.yaml with the new model information? (y/n): y
[INFO] 11:23:57.187239 Checking 2 dependencies... | thread=8821432512
[INFO] 11:23:57.188280 ✅ All 2 dependencies are installed! | thread=8821432512
[INFO] 11:23:57.188387 ✅ Starting local runner... | thread=8821432512
[INFO] 11:23:57.188475 No secrets path configured, running without secrets | thread=8821432512
[INFO] 11:23:58.334211 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8821432512
[ERROR] 11:23:58.334788 Missing configuration to track usage for OpenAI chat completion calls. Go to your model scripts and make sure to set both: 1) stream_options={'include_usage': True}2) set_output_context | thread=8821432512
[INFO] 11:23:58.358199 ModelServer initialized successfully | thread=8821432512
[INFO] 11:23:58.385791 ✅ Your model is running locally and is ready for requests from the API...
| thread=8821432512
[INFO] 11:23:58.385872 > Code Snippet: To call your model via the API, use this code snippet:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ['CLARIFAI_PAT'],
)

response = client.chat.completions.create(
    model="https://clarifai.com/alfrick/local-runner-app/models/local-runner-model",
    messages=[
        {"role": "system", "content": "Talk like a pirate."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class?",
        },
    ],
    temperature=1.0,
    stream=False,  # stream=True also works, just iterate over the response
)
print(response)
| thread=8821432512
[INFO] 11:23:58.385916 > Playground: To chat with your model, visit: https://clarifai.com/playground?model=local-runner-model__36bde5dcf7c24317a08d1366a8cc5757&user_id=alfrick&app_id=local-runner-app
| thread=8821432512
[INFO] 11:23:58.385946 > API URL: To call your model via the API, use this model URL: https://clarifai.com/alfrick/local-runner-app/models/local-runner-model
| thread=8821432512
[INFO] 11:23:58.385966 Press CTRL+C to stop the runner.
| thread=8821432512
[INFO] 11:23:58.385994 Starting 32 threads... | thread=8821432512

Tip: If your underlying model is running on a specific port (like 8000), ensure your model.py points to that port, and that the Local Runner does not try to bind to the same port.

Step 5: Test Your Runner

Once the local runner starts, it provides a sample client code snippet for testing. You can run this in a separate terminal within the same directory.

Here's an example test snippet:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ['CLARIFAI_PAT'],
)

response = client.chat.completions.create(
    model="https://clarifai.com/user-id/local-runner-app/models/local-runner-model",
    messages=[
        {"role": "system", "content": "Talk like a pirate."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class?",
        },
    ],
    temperature=1.0,
    stream=False,  # stream=True also works, just iterate over the response
)
print(response)
Example Output
ChatCompletion(
    id='bf90c9f0a20e44d796780d35360d3951',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content=(
                    "Yer lookin' fer a way to check if a Python object be an instance "
                    "o' a class, eh?\n\n"
                    "In Python, ye can use the `type()` function or `isinstance()` method "
                    "to determine if a variable be o' a certain type. Here be some ways "
                    "to do it:\n\n"
                    "### Method 1: Using `type()`\n\n"
                    "```python\n"
                    "x = \"Hello\"\n"
                    "y = [1, 2, 3]\n\n"
                    "if type(x) == str:\n"
                    "    print(\"x is a string\")\n"
                    "elif type(y) == list:\n"
                    "    print(\"y is a list\")\n"
                    "else:\n"
                    "    print(\"Unknown type\")\n"
                    "```\n\n"
                    "### Method 2: Using `isinstance()`\n\n"
                    "```python\n"
                    "class Person:\n"
                    "    def __init__(self, name):\n"
                    "        self.name = name\n\n"
                    "p = Person(\"Pirate\")\n"
                    "if isinstance(p, Person):\n"
                    "    print(\"p be an instance o' the Person class\")\n\n"
                    "x = \"Hello\"\n"
                    "y = [1, 2, 3]\n"
                    "```\n\n"
                    "### Method 3: Constructor Check with Class Definition (Not Recommended)\n\n"
                    "```python\n"
                    "class MyShip:\n"
                    "    def __init__(self, speed):\n"
                    "        self.speed = speed\n\n"
                    "if obj is MyShip(some_speed):\n"
                    "    print(f\"it's a ship made by {some_speed}\")\n"
                    "else:\n"
                    "    pass  # object is not of type MyShip.\n"
                    "```\n\n"
                    "Note: Python doesn’t use 'object' in the generic sense; everything is an instance of a class."
                ),
                refusal=None,
                role='assistant',
                annotations=None,
                audio=None,
                function_call=None,
                tool_calls=None
            )
        )
    ],
    created=1764578049,
    model='llama3.2:latest',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='fp_ollama',
    usage=CompletionUsage(
        completion_tokens=303,
        prompt_tokens=45,
        total_tokens=348,
        completion_tokens_details=None,
        prompt_tokens_details=None
    )
)
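The same client call also supports streaming. The sketch below assumes the runner from Step 4 is still up and CLARIFAI_PAT is set; the delta_text helper is made up for illustration, and the live call is guarded so the script degrades gracefully if the runner isn't reachable.

```python
import os


def delta_text(chunk) -> str:
    """Return the incremental text from a streamed chunk, or '' if absent."""
    if chunk.choices and chunk.choices[0].delta.content:
        return chunk.choices[0].delta.content
    return ""


if __name__ == "__main__":
    try:
        from openai import OpenAI

        client = OpenAI(
            base_url="https://api.clarifai.com/v2/ext/openai/v1",
            api_key=os.environ["CLARIFAI_PAT"],
        )
        stream = client.chat.completions.create(
            model="https://clarifai.com/user-id/local-runner-app/models/local-runner-model",
            messages=[{"role": "user", "content": "Say hello in one word."}],
            stream=True,  # tokens arrive incrementally as delta chunks
        )
        for chunk in stream:
            print(delta_text(chunk), end="", flush=True)
        print()
    except Exception as exc:  # runner not reachable, openai missing, or PAT unset
        print(f"(skipped live call: {exc})")
```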

That’s it!

When you’re done testing, stop the Local Runner process (press CTRL+C in its terminal) and shut down the process hosting your OpenAI-compatible server.