LM Studio
Download and run LM Studio models locally and expose them via a public API
LM Studio is a desktop application that lets you run open-source LLMs locally on your machine. Combined with Clarifai's Local Runners, you can serve LM Studio models from your machine, expose them via a public API, and access them through the Clarifai platform — all while keeping the speed, privacy, and control of local inference.
Important: Clarifai's LM Studio integration currently supports macOS only (Apple devices). For other platforms, consider using Ollama or vLLM instead.
Step 1: Install Prerequisites
Install LM Studio
Go to the LM Studio website and install the desktop application for macOS.
After installing, enable the lms CLI tool so Clarifai can detect your models:
~/.lmstudio/bin/lms bootstrap
Restart your terminal, then verify with lms --version.
Keep LM Studio open and running before starting the local runner — it provides the model runtime that Clarifai connects to.
Install Clarifai
- Bash
pip install --upgrade clarifai
Note: Python 3.11 or 3.12 is required. The `openai` package is included with `clarifai`.
Step 2: Log In
- CLI
clarifai login
You'll be prompted for your user ID and PAT. This saves your credentials locally so you don't need to set environment variables manually.
Example Output
clarifai login
Enter your Clarifai user ID: alfrick
> To authenticate, you'll need a Personal Access Token (PAT).
> You can create one from your account settings: https://clarifai.com/alfrick/settings/security
Enter your Personal Access Token (PAT) value (or type "ENVVAR" to use an environment variable): ENVVAR
> Verifying token...
[INFO] 09:38:03.867057 Validating the Context Credentials... | thread=8309383360
[INFO] 09:38:05.176881 ✅ Context is valid | thread=8309383360
> Let's save these credentials to a new context.
> You can have multiple contexts to easily switch between accounts or projects.
Enter a name for this context [default]:
✅ Success! You are now logged in.
Credentials saved to the 'default' context.
💡 To switch contexts later, use `clarifai config use-context <name>`.
[INFO] 09:38:10.706639 Login successful for user 'alfrick' in context 'default' | thread=8309383360
Step 3: Initialize a Model
Scaffold a model project using any model from the LM Studio Model Catalog:
- CLI
clarifai model init --toolkit lmstudio --model-name google/gemma-3-4b
The CLI auto-detects LM Studio models already downloaded on your machine. Change --model-name to any other model from the catalog.
Example Output
clarifai model init --toolkit lmstudio
[INFO] Initializing model with lmstudio toolkit...
[INFO] Detected LM Studio models: google/gemma-3-4b
Model initialized in ./gemma-3-4b
Test locally:
clarifai model serve ./gemma-3-4b
clarifai model serve ./gemma-3-4b --mode env # auto-create venv and install deps
clarifai model serve ./gemma-3-4b --mode container # run inside Docker
This creates a ./gemma-3-4b/ directory:
gemma-3-4b/
├── 1/
│   └── model.py         # LM Studio inference logic
├── requirements.txt     # Python dependencies
└── config.yaml          # Model config (user_id/app_id auto-filled from login)
Note: Some models are very large and may require significant memory. Check your machine's capacity before initializing.
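To gauge whether a model will fit before downloading it, a rough back-of-the-envelope estimate is quantized weight size plus headroom for the KV cache and runtime buffers. The sketch below is illustrative only; the 4-bit default and the 30% overhead factor are assumptions, not LM Studio figures:

```python
def estimated_model_memory_gb(num_params_billions, bits_per_weight=4, overhead=1.3):
    """Quantized weight size plus ~30% headroom for KV cache and runtime buffers.
    The defaults here are rough assumptions, not measured values."""
    weight_gb = num_params_billions * bits_per_weight / 8  # params * bits -> gigabytes
    return round(weight_gb * overhead, 1)

# A 4B-parameter model at 4-bit quantization: roughly 2.6 GB
print(estimated_model_memory_gb(4))
# The same model at 16-bit (unquantized) would need roughly 10.4 GB
print(estimated_model_memory_gb(4, bits_per_weight=16))
```

Actual usage also grows with the configured context length, so treat this as a lower bound.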
model.py
import json
import os
import socket
import subprocess
import sys
import time
from typing import Iterator, List

from openai import OpenAI

from clarifai.runners.models.openai_class import OpenAIModelClass
from clarifai.runners.utils.data_types import Image
from clarifai.runners.utils.data_utils import Param
from clarifai.runners.utils.openai_convertor import build_openai_messages
from clarifai.utils.logging import logger

VERBOSE_LMSTUDIO = True
LMS_MODEL_NAME = "LiquidAI/LFM2-1.2B"
LMS_PORT = 11434
LMS_CONTEXT_LENGTH = 4096


def _stream_command(cmd, verbose=True):
    env = os.environ.copy()
    env["PYTHONUNBUFFERED"] = "1"
    process = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,
        env=env,
    )
    if verbose and process.stdout:
        for line in iter(process.stdout.readline, ""):
            if line:
                logger.info(f"[lms] {line.rstrip()}")
    ret = process.wait()
    if ret != 0:
        raise RuntimeError(f"Command failed ({ret}): {cmd}")
    return True


def _wait_for_port(port, timeout=30.0):
    start = time.time()
    while time.time() - start < timeout:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(1)
            try:
                if sock.connect_ex(("127.0.0.1", port)) == 0:
                    return True
            except Exception:
                pass
        time.sleep(0.5)
    raise RuntimeError(f"LM Studio server did not start on port {port} within {timeout}s")


def run_lms_server(model_name='LiquidAI/LFM2-1.2B', port=11434, context_length=4096):
    """Download model, load it, and start the LM Studio server."""
    try:
        _stream_command(
            f"lms get https://huggingface.co/{model_name} --verbose",
            verbose=VERBOSE_LMSTUDIO,
        )
        _stream_command("lms unload --all", verbose=VERBOSE_LMSTUDIO)
        _stream_command(
            f"lms load {model_name} --verbose --context-length {context_length}",
            verbose=VERBOSE_LMSTUDIO,
        )
        subprocess.Popen(
            f"lms server start --port {port}",
            shell=True,
            stdout=None if not VERBOSE_LMSTUDIO else sys.stdout,
            stderr=None if not VERBOSE_LMSTUDIO else sys.stderr,
        )
        _wait_for_port(port)
        logger.info(f"LM Studio server started on port {port}")
    except Exception as e:
        raise RuntimeError(f"Failed to start LM Studio server: {e}")


def has_image_content(image: Image) -> bool:
    return bool(getattr(image, 'url', None) or getattr(image, 'bytes', None))


class LMStudioModel(OpenAIModelClass):
    client = True
    model = True

    def load_model(self):
        self.model = LMS_MODEL_NAME
        self.port = LMS_PORT
        run_lms_server(
            model_name=self.model,
            port=self.port,
            context_length=LMS_CONTEXT_LENGTH,
        )
        self.client = OpenAI(api_key="notset", base_url=f"http://localhost:{self.port}/v1")

    @OpenAIModelClass.method
    def predict(
        self,
        prompt: str = "",
        image: Image = None,
        images: List[Image] = None,
        chat_history: List[dict] = None,
        tools: List[dict] = None,
        tool_choice: str = None,
        max_tokens: int = Param(
            default=2048,
            description="The maximum number of tokens to generate.",
        ),
        temperature: float = Param(
            default=0.7,
            description="Sampling temperature (higher = more random).",
        ),
        top_p: float = Param(
            default=0.95,
            description="Nucleus sampling threshold.",
        ),
    ) -> str:
        """Return a single completion."""
        if tools is not None and tool_choice is None:
            tool_choice = "auto"
        img_content = image if has_image_content(image) else None
        messages = build_openai_messages(
            prompt=prompt, image=img_content, images=images, messages=chat_history
        )
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        if response.usage is not None:
            self.set_output_context(
                prompt_tokens=response.usage.prompt_tokens,
                completion_tokens=response.usage.completion_tokens,
            )
        if response.choices[0] and response.choices[0].message.tool_calls:
            tool_calls = response.choices[0].message.tool_calls
            return json.dumps([tc.to_dict() for tc in tool_calls], indent=2)
        return response.choices[0].message.content

    @OpenAIModelClass.method
    def generate(
        self,
        prompt: str = "",
        image: Image = None,
        images: List[Image] = None,
        chat_history: List[dict] = None,
        tools: List[dict] = None,
        tool_choice: str = None,
        max_tokens: int = Param(
            default=2048,
            description="The maximum number of tokens to generate.",
        ),
        temperature: float = Param(
            default=0.7,
            description="Sampling temperature (higher = more random).",
        ),
        top_p: float = Param(
            default=0.95,
            description="Nucleus sampling threshold.",
        ),
    ) -> Iterator[str]:
        """Stream a completion response."""
        if tools is not None and tool_choice is None:
            tool_choice = "auto"
        img_content = image if has_image_content(image) else None
        messages = build_openai_messages(
            prompt=prompt, image=img_content, images=images, messages=chat_history
        )
        for chunk in self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            stream=True,
            stream_options={"include_usage": True},
        ):
            if chunk.usage is not None:
                if chunk.usage.prompt_tokens or chunk.usage.completion_tokens:
                    self.set_output_context(
                        prompt_tokens=chunk.usage.prompt_tokens,
                        completion_tokens=chunk.usage.completion_tokens,
                    )
            if chunk.choices:
                if chunk.choices[0].delta.tool_calls:
                    tool_calls_json = [tc.to_dict() for tc in chunk.choices[0].delta.tool_calls]
                    yield json.dumps(tool_calls_json, indent=2)
                else:
                    text = chunk.choices[0].delta.content if chunk.choices[0].delta.content else ''
                    yield text
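The `_wait_for_port` helper in model.py polls a TCP port until the LM Studio server accepts connections. The same pattern can be sketched and exercised on its own against any local listener (the throwaway socket server below is just for the demo):

```python
import socket
import threading
import time

def wait_for_port(port, host="127.0.0.1", timeout=10.0):
    """Poll until a TCP connection to host:port succeeds, or raise on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(1)
            if sock.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                return True
        time.sleep(0.1)
    raise RuntimeError(f"Nothing listening on {host}:{port} after {timeout}s")

# Demo: start a throwaway listener, then wait for it.
server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=server.accept, daemon=True).start()
print(wait_for_port(port))      # True once the port accepts connections
server.close()
```

Using `connect_ex` instead of `connect` avoids raising on every failed attempt while the server is still starting up.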
config.yaml
model:
  id: "my-model"
build_info:
  python_version: "3.12"
toolkit:
  provider: lmstudio
requirements.txt
clarifai
openai
Step 4: Serve Locally
Start the model as a local runner:
- CLI
clarifai model serve ./gemma-3-4b
Note: Make sure LM Studio is open and running before starting the runner. Add `-v` for verbose logs.
Example Output
clarifai model local-runner
[INFO] 09:40:36.097539 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:40:36.098189 > Checking local runner requirements... | thread=8309383360
[INFO] 09:40:36.118322 Checking 2 dependencies... | thread=8309383360
[INFO] 09:40:36.118807 ✅ All 2 dependencies are installed! | thread=8309383360
[INFO] 09:40:36.119033 > Verifying local runner setup... | thread=8309383360
[INFO] 09:40:36.119083 Current context: default | thread=8309383360
[INFO] 09:40:36.119120 Current user_id: alfrick | thread=8309383360
[INFO] 09:40:36.119150 Current PAT: d6570**** | thread=8309383360
[INFO] 09:40:36.121055 Current compute_cluster_id: local-runner-compute-cluster | thread=8309383360
[WARNING] 09:40:37.622490 Failed to get compute cluster with ID 'local-runner-compute-cluster':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "ComputeCluster with ID \'local-runner-compute-cluster\' not found. Check your request fields."
req_id: "sdk-python-11.8.2-c324cbe5deb248e19d5d0ed1e32e49d0"
| thread=8309383360
Compute cluster not found. Do you want to create a new compute cluster alfrick/local-runner-compute-cluster? (y/n): y
[INFO] 09:40:44.198312 Compute Cluster with ID 'local-runner-compute-cluster' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-e5b312b4a46f4e2984efc65abb5124c5"
| thread=8309383360
[INFO] 09:40:44.203633 Current nodepool_id: local-runner-nodepool | thread=8309383360
[WARNING] 09:40:46.398631 Failed to get nodepool with ID 'local-runner-nodepool':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Nodepool not found. Check your request fields."
req_id: "sdk-python-11.8.2-1062d71d21574bce99bd4472a9fdc6ef"
| thread=8309383360
Nodepool not found. Do you want to create a new nodepool alfrick/local-runner-compute-cluster/local-runner-nodepool? (y/n): y
[INFO] 09:40:52.285792 Nodepool with ID 'local-runner-nodepool' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-66d76251237c4be38764837e639c6800"
| thread=8309383360
[INFO] 09:40:52.292983 Current app_id: local-runner-app | thread=8309383360
[WARNING] 09:40:52.574021 Failed to get app with ID 'local-runner-app':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "app identified by path /users/alfrick/apps/local-runner-app not found"
req_id: "sdk-python-11.8.2-29b94532bf624596abbbaea66be198e2"
| thread=8309383360
App not found. Do you want to create a new app alfrick/local-runner-app? (y/n): y
[INFO] 09:40:56.302447 App with ID 'local-runner-app' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-b5066f7c64274944ba405ba01da11c1c"
| thread=8309383360
[INFO] 09:40:56.306934 Current model_id: local-runner-model | thread=8309383360
[WARNING] 09:40:58.007139 Failed to get model with ID 'local-runner-model':
code: MODEL_DOES_NOT_EXIST
description: "Model does not exist"
details: "Model \'local-runner-model\' does not exist."
req_id: "sdk-python-11.8.2-8b2717eb04624aca8bf119e03b94b5b4"
| thread=8309383360
Model not found. Do you want to create a new model alfrick/local-runner-app/models/local-runner-model? (y/n): y
[INFO] 09:41:14.336510 Model with ID 'local-runner-model' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-f36c2684e1bc4d8f99e777d42e5c53f8"
| thread=8309383360
[WARNING] 09:41:17.182009 No model versions found. Creating a new version for local runner. | thread=8309383360
[INFO] 09:41:17.510454 Model Version with ID 'fa82276f4cfa44c08745b028471bbfa5' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-0121fe726015400c86e4bd3959729787"
| thread=8309383360
[INFO] 09:41:17.517728 Current model version fa82276f4cfa44c08745b028471bbfa5 | thread=8309383360
[INFO] 09:41:17.517802 Creating the local runner tying this 'alfrick/local-runner-app/models/local-runner-model' model (version: fa82276f4cfa44c08745b028471bbfa5) to the 'alfrick/local-runner-compute-cluster/local-runner-nodepool' nodepool. | thread=8309383360
[INFO] 09:41:18.591818 Runner with ID '649b39c737d84dd8a5e3d5af0b19c207' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-ced2523458a941519a709e6af082832a"
| thread=8309383360
[INFO] 09:41:18.598056 Current runner_id: 649b39c737d84dd8a5e3d5af0b19c207 | thread=8309383360
[WARNING] 09:41:19.150091 Failed to get deployment with ID local-runner-deployment:
code: DEPLOYMENT_INVALID_REQUEST
description: "Invalid deployment request"
details: "Some of the deployment ids provided (local-runner-deployment) do not exist"
req_id: "sdk-python-11.8.2-9af7aa96a9a843e68f8f3ef898bf61c1"
| thread=8309383360
Deployment not found. Do you want to create a new deployment alfrick/local-runner-compute-cluster/local-runner-nodepool/local-runner-deployment? (y/n): y
[INFO] 09:41:25.833184 Deployment with ID 'local-runner-deployment' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-38e4b0bd886e4979bc9b7324361e2c56"
| thread=8309383360
[INFO] 09:41:25.839987 Current deployment_id: local-runner-deployment | thread=8309383360
[INFO] 09:41:25.841181 Current model section of config.yaml: {'app_id': 'local-runner-app', 'id': 'local-env-model', 'model_type_id': 'text-to-text', 'user_id': 'alfrick'} | thread=8309383360
Do you want to backup config.yaml to config.yaml.bk then update the config.yaml with the new model information? (y/n): y
[INFO] 09:41:29.312446 Checking 2 dependencies... | thread=8309383360
[INFO] 09:41:29.313228 ✅ All 2 dependencies are installed! | thread=8309383360
[INFO] 09:41:29.313325 ✅ Starting local runner... | thread=8309383360
[INFO] 09:41:29.313404 No secrets path configured, running without secrets | thread=8309383360
[INFO] 09:41:30.647566 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:41:34.359410 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:41:34.359915 Running: lms get https://huggingface.co/LiquidAI/LFM2-1.2B --verbose | thread=8309383360
[INFO] 09:41:34.625973 [lms logs] D Found local API server at ws://127.0.0.1:41343 | thread=8309383360
[INFO] 09:41:34.633082 [lms logs] I Searching for models with the term https://huggingface.co/LiquidAI/LFM2-1.2B | thread=8309383360
[INFO] 09:41:34.633891 [lms logs] D Searching for models with options { | thread=8309383360
[INFO] 09:41:34.633919 [lms logs] searchTerm: 'https://huggingface.co/LiquidAI/LFM2-1.2B', | thread=8309383360
[INFO] 09:41:34.633937 [lms logs] compatibilityTypes: undefined, | thread=8309383360
[INFO] 09:41:34.633950 [lms logs] limit: undefined | thread=8309383360
[INFO] 09:41:34.633963 [lms logs] } | thread=8309383360
[INFO] 09:41:40.602478 [lms logs] D Found 10 result(s) | thread=8309383360
[INFO] 09:41:40.602769 [lms logs] D Prompting user to choose a model | thread=8309383360
[INFO] 09:41:40.602822 [lms logs] I No exact match found. Please choose a model from the list below. | thread=8309383360
[INFO] 09:41:40.602867 [lms logs] | thread=8309383360
[INFO] 09:41:40.603408 [lms logs] ! Use the arrow keys to navigate, type to filter, and press enter to select. | thread=8309383360
[INFO] 09:41:40.603520 [lms logs] | thread=8309383360
[INFO] 09:41:40.619671 [lms logs] ? Select a model to download Type to filter... | thread=8309383360
[INFO] 09:41:40.619819 [lms logs] ❯ LiquidAI/LFM2-1.2B-GGUF | thread=8309383360
[INFO] 09:41:40.619874 [lms logs] LiquidAI/LFM2-1.2B-Tool-GGUF | thread=8309383360
[INFO] 09:41:40.619902 [lms logs] LiquidAI/LFM2-1.2B-Extract-GGUF | thread=8309383360
[INFO] 09:41:40.619946 [lms logs] LiquidAI/LFM2-1.2B-RAG-GGUF | thread=8309383360
[INFO] 09:41:40.619969 [lms logs] DevQuasar/LiquidAI.LFM2-1.2B-GGUF | thread=8309383360
[INFO] 09:41:40.619992 [lms logs] bartowski/LiquidAI_LFM2-1.2B-Extract-GGUF | thread=8309383360
[INFO] 09:41:40.620018 [lms logs] bartowski/LiquidAI_LFM2-1.2B-RAG-GGUF | thread=8309383360
[INFO] 09:41:40.620044 [lms logs] bartowski/LiquidAI_LFM2-1.2B-Tool-GGUF | thread=8309383360
[INFO] 09:41:40.620066 [lms logs] DevQuasar/LiquidAI.LFM2-1.2B-RAG-GGUF | thread=8309383360
When ready, the CLI prints:
- A model URL for API calls
- A Playground link for browser-based testing
- A sample code snippet
Press Ctrl+C to stop the runner.
Step 5: Run Inference
While the local runner is active, test it using the OpenAI-compatible client:
- Python
import os
from openai import OpenAI

# Initialize the OpenAI client, pointing to Clarifai's API
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",  # Clarifai's OpenAI-compatible API endpoint
    api_key=os.environ["CLARIFAI_PAT"]  # Ensure CLARIFAI_PAT is set as an environment variable
)

# Make a chat completion request to a Clarifai-hosted model
response = client.chat.completions.create(
    model="https://clarifai.com/<user-id>/local-runner-app/models/local-runner-model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the future of AI?"}
    ],
)

# Print the model's response
print(response.choices[0].message.content)
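Because model.py implements a streaming `generate` method, you can also consume the response incrementally by passing `stream=True` to the same OpenAI-compatible endpoint. A sketch, where the `stream_chat` helper and the environment-variable guard are illustrative additions rather than part of the Clarifai SDK:

```python
import os

def stream_chat(client, model_url, prompt):
    """Yield text deltas from an OpenAI-compatible streaming chat completion."""
    stream = client.chat.completions.create(
        model=model_url,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

if __name__ == "__main__" and os.environ.get("CLARIFAI_PAT"):
    from openai import OpenAI  # only needed for the real call
    client = OpenAI(
        base_url="https://api.clarifai.com/v2/ext/openai/v1",
        api_key=os.environ["CLARIFAI_PAT"],
    )
    model_url = "https://clarifai.com/<user-id>/local-runner-app/models/local-runner-model"
    for piece in stream_chat(client, model_url, "Explain AI in one sentence"):
        print(piece, end="", flush=True)
```

Streaming prints tokens as the local runner produces them, which is usually more responsive than waiting for the full completion.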
Or use the Clarifai CLI:
clarifai model predict https://clarifai.com/<user-id>/local-runner-app/models/local-runner-model "Explain AI in one sentence"
You can also open the Runners dashboard, find your runner, and select Open in Playground from the three-dot menu.
When you're done, close the terminal running the local runner to shut it down.