LM Studio
Download and run LM Studio models locally and expose them via a public API
LM Studio is a desktop application that lets you run open-source LLMs locally on your machine. Combined with Clarifai's Local Runners, you can serve LM Studio models from your machine, expose them via a public API, and access them through the Clarifai platform — all while keeping the speed, privacy, and control of local inference.
Important: Clarifai's LM Studio integration currently supports macOS only (Apple devices). For other platforms, consider using Ollama or vLLM instead.
Step 1: Install Prerequisites
Install LM Studio
Go to the LM Studio website and install the desktop application for macOS.
After installing, enable the lms CLI tool so Clarifai can detect your models:
~/.lmstudio/bin/lms bootstrap
Restart your terminal, then verify with lms --version.
Keep LM Studio open and running before starting the local runner — it provides the model runtime that Clarifai connects to.
Install Clarifai
- Bash
pip install --upgrade clarifai
Note: Python 3.11 or 3.12 is required. The `openai` package is included with `clarifai`.
Step 2: Log In
- CLI
clarifai login
You'll be prompted for your user ID and PAT. This saves your credentials locally so you don't need to set environment variables manually.
Example Output
clarifai login
Enter your Clarifai user ID: alfrick
> To authenticate, you'll need a Personal Access Token (PAT).
> You can create one from your account settings: https://clarifai.com/alfrick/settings/security
Enter your Personal Access Token (PAT) value (or type "ENVVAR" to use an environment variable): ENVVAR
> Verifying token...
[INFO] 09:38:03.867057 Validating the Context Credentials... | thread=8309383360
[INFO] 09:38:05.176881 ✅ Context is valid | thread=8309383360
> Let's save these credentials to a new context.
> You can have multiple contexts to easily switch between accounts or projects.
Enter a name for this context [default]:
✅ Success! You are now logged in.
Credentials saved to the 'default' context.
💡 To switch contexts later, use `clarifai config use-context <name>`.
[INFO] 09:38:10.706639 Login successful for user 'alfrick' in context 'default' | thread=8309383360
Step 3: Initialize a Model
Scaffold a model project using any model from the LM Studio Model Catalog:
- CLI
clarifai model init --toolkit lmstudio --model-name google/gemma-3-4b
The CLI auto-detects LM Studio models already downloaded on your machine. Change --model-name to any other model from the catalog.
Example Output
clarifai model init --toolkit lmstudio
[INFO] Initializing model with lmstudio toolkit...
[INFO] Detected LM Studio models: google/gemma-3-4b
Model initialized in ./gemma-3-4b
Test locally:
clarifai model serve ./gemma-3-4b
clarifai model serve ./gemma-3-4b --mode env # auto-create venv and install deps
clarifai model serve ./gemma-3-4b --mode container # run inside Docker
This creates a ./gemma-3-4b/ directory:
gemma-3-4b/
├── 1/
│   └── model.py         # LM Studio inference logic
├── requirements.txt     # Python dependencies
└── config.yaml          # Model config (user_id/app_id auto-filled from login)
Note: Some models are very large and may require significant memory. Check your machine's capacity before initializing.
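To gauge whether a model will fit before downloading it, a rough back-of-the-envelope estimate is quantized weight size plus headroom for the KV cache and runtime buffers. The sketch below is illustrative only; the 4-bit default and the 30% overhead factor are assumptions, not LM Studio figures:

```python
def estimated_model_memory_gb(num_params_billions, bits_per_weight=4, overhead=1.3):
    """Quantized weight size plus ~30% headroom for KV cache and runtime buffers.
    The defaults here are rough assumptions, not measured values."""
    weight_gb = num_params_billions * bits_per_weight / 8  # params * bits -> gigabytes
    return round(weight_gb * overhead, 1)

# A 4B-parameter model at 4-bit quantization: roughly 2.6 GB
print(estimated_model_memory_gb(4))
# The same model at 16-bit (unquantized) would need roughly 10.4 GB
print(estimated_model_memory_gb(4, bits_per_weight=16))
```

Actual usage also grows with the configured context length, so treat this as a lower bound.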
model.py
import json
import os
import socket
import subprocess
import sys
import time
from typing import Iterator, List

from openai import OpenAI

from clarifai.runners.models.openai_class import OpenAIModelClass
from clarifai.runners.utils.data_types import Image
from clarifai.runners.utils.data_utils import Param
from clarifai.runners.utils.openai_convertor import build_openai_messages
from clarifai.utils.logging import logger

VERBOSE_LMSTUDIO = True
LMS_MODEL_NAME = "LiquidAI/LFM2-1.2B"
LMS_PORT = 11434
LMS_CONTEXT_LENGTH = 4096


def _stream_command(cmd, verbose=True):
    env = os.environ.copy()
    env["PYTHONUNBUFFERED"] = "1"
    process = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,
        env=env,
    )
    if verbose and process.stdout:
        for line in iter(process.stdout.readline, ""):
            if line:
                logger.info(f"[lms] {line.rstrip()}")
    ret = process.wait()
    if ret != 0:
        raise RuntimeError(f"Command failed ({ret}): {cmd}")
    return True


def _wait_for_port(port, timeout=30.0):
    start = time.time()
    while time.time() - start < timeout:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(1)
            try:
                if sock.connect_ex(("127.0.0.1", port)) == 0:
                    return True
            except Exception:
                pass
        time.sleep(0.5)
    raise RuntimeError(f"LM Studio server did not start on port {port} within {timeout}s")


def run_lms_server(model_name='LiquidAI/LFM2-1.2B', port=11434, context_length=4096):
    """Download model, load it, and start the LM Studio server."""
    try:
        _stream_command(
            f"lms get https://huggingface.co/{model_name} --verbose",
            verbose=VERBOSE_LMSTUDIO,
        )
        _stream_command("lms unload --all", verbose=VERBOSE_LMSTUDIO)
        _stream_command(
            f"lms load {model_name} --verbose --context-length {context_length}",
            verbose=VERBOSE_LMSTUDIO,
        )
        subprocess.Popen(
            f"lms server start --port {port}",
            shell=True,
            stdout=None if not VERBOSE_LMSTUDIO else sys.stdout,
            stderr=None if not VERBOSE_LMSTUDIO else sys.stderr,
        )
        _wait_for_port(port)
        logger.info(f"LM Studio server started on port {port}")
    except Exception as e:
        raise RuntimeError(f"Failed to start LM Studio server: {e}")


def has_image_content(image: Image) -> bool:
    return bool(getattr(image, 'url', None) or getattr(image, 'bytes', None))


class LMStudioModel(OpenAIModelClass):
    client = True
    model = True

    def load_model(self):
        self.model = LMS_MODEL_NAME
        self.port = LMS_PORT
        run_lms_server(
            model_name=self.model,
            port=self.port,
            context_length=LMS_CONTEXT_LENGTH,
        )
        self.client = OpenAI(api_key="notset", base_url=f"http://localhost:{self.port}/v1")

    @OpenAIModelClass.method
    def predict(
        self,
        prompt: str = "",
        image: Image = None,
        images: List[Image] = None,
        chat_history: List[dict] = None,
        tools: List[dict] = None,
        tool_choice: str = None,
        max_tokens: int = Param(
            default=2048,
            description="The maximum number of tokens to generate.",
        ),
        temperature: float = Param(
            default=0.7,
            description="Sampling temperature (higher = more random).",
        ),
        top_p: float = Param(
            default=0.95,
            description="Nucleus sampling threshold.",
        ),
    ) -> str:
        """Return a single completion."""
        if tools is not None and tool_choice is None:
            tool_choice = "auto"
        img_content = image if has_image_content(image) else None
        messages = build_openai_messages(
            prompt=prompt, image=img_content, images=images, messages=chat_history
        )
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        if response.usage is not None:
            self.set_output_context(
                prompt_tokens=response.usage.prompt_tokens,
                completion_tokens=response.usage.completion_tokens,
            )
        if response.choices[0] and response.choices[0].message.tool_calls:
            tool_calls = response.choices[0].message.tool_calls
            return json.dumps([tc.to_dict() for tc in tool_calls], indent=2)
        return response.choices[0].message.content

    @OpenAIModelClass.method
    def generate(
        self,
        prompt: str = "",
        image: Image = None,
        images: List[Image] = None,
        chat_history: List[dict] = None,
        tools: List[dict] = None,
        tool_choice: str = None,
        max_tokens: int = Param(
            default=2048,
            description="The maximum number of tokens to generate.",
        ),
        temperature: float = Param(
            default=0.7,
            description="Sampling temperature (higher = more random).",
        ),
        top_p: float = Param(
            default=0.95,
            description="Nucleus sampling threshold.",
        ),
    ) -> Iterator[str]:
        """Stream a completion response."""
        if tools is not None and tool_choice is None:
            tool_choice = "auto"
        img_content = image if has_image_content(image) else None
        messages = build_openai_messages(
            prompt=prompt, image=img_content, images=images, messages=chat_history
        )
        for chunk in self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            stream=True,
            stream_options={"include_usage": True},
        ):
            if chunk.usage is not None:
                if chunk.usage.prompt_tokens or chunk.usage.completion_tokens:
                    self.set_output_context(
                        prompt_tokens=chunk.usage.prompt_tokens,
                        completion_tokens=chunk.usage.completion_tokens,
                    )
            if chunk.choices:
                if chunk.choices[0].delta.tool_calls:
                    tool_calls_json = [tc.to_dict() for tc in chunk.choices[0].delta.tool_calls]
                    yield json.dumps(tool_calls_json, indent=2)
                else:
                    text = chunk.choices[0].delta.content if chunk.choices[0].delta.content else ''
                    yield text
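The `_wait_for_port` helper in model.py polls a TCP port until the LM Studio server accepts connections. The same pattern can be sketched and exercised on its own against any local listener (the throwaway socket server below is just for the demo):

```python
import socket
import threading
import time

def wait_for_port(port, host="127.0.0.1", timeout=10.0):
    """Poll until a TCP connection to host:port succeeds, or raise on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(1)
            if sock.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                return True
        time.sleep(0.1)
    raise RuntimeError(f"Nothing listening on {host}:{port} after {timeout}s")

# Demo: start a throwaway listener, then wait for it.
server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=server.accept, daemon=True).start()
print(wait_for_port(port))      # True once the port accepts connections
server.close()
```

Using `connect_ex` instead of `connect` avoids raising on every failed attempt while the server is still starting up.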
config.yaml
model:
  id: "my-model"
build_info:
  python_version: "3.12"
toolkit:
  provider: lmstudio
requirements.txt
clarifai
openai
Step 4: Serve Locally
Start the model as a local runner:
- CLI
clarifai model serve ./gemma-3-4b
Note: Make sure LM Studio is open and running before starting the runner. Add `-v` for verbose logs.
Example Output
clarifai model local-runner
[INFO] 09:40:36.097539 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:40:36.098189 > Checking local runner requirements... | thread=8309383360
[INFO] 09:40:36.118322 Checking 2 dependencies... | thread=8309383360
[INFO] 09:40:36.118807 ✅ All 2 dependencies are installed! | thread=8309383360
[INFO] 09:40:36.119033 > Verifying local runner setup... | thread=8309383360
[INFO] 09:40:36.119083 Current context: default | thread=8309383360
[INFO] 09:40:36.119120 Current user_id: alfrick | thread=8309383360
[INFO] 09:40:36.119150 Current PAT: d6570**** | thread=8309383360
[INFO] 09:40:36.121055 Current compute_cluster_id: local-runner-compute-cluster | thread=8309383360
[WARNING] 09:40:37.622490 Failed to get compute cluster with ID 'local-runner-compute-cluster':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "ComputeCluster with ID \'local-runner-compute-cluster\' not found. Check your request fields."
req_id: "sdk-python-11.8.2-c324cbe5deb248e19d5d0ed1e32e49d0"
| thread=8309383360
Compute cluster not found. Do you want to create a new compute cluster alfrick/local-runner-compute-cluster? (y/n): y
[INFO] 09:40:44.198312 Compute Cluster with ID 'local-runner-compute-cluster' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-e5b312b4a46f4e2984efc65abb5124c5"
| thread=8309383360
[INFO] 09:40:44.203633 Current nodepool_id: local-runner-nodepool | thread=8309383360
[WARNING] 09:40:46.398631 Failed to get nodepool with ID 'local-runner-nodepool':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Nodepool not found. Check your request fields."
req_id: "sdk-python-11.8.2-1062d71d21574bce99bd4472a9fdc6ef"
| thread=8309383360
Nodepool not found. Do you want to create a new nodepool alfrick/local-runner-compute-cluster/local-runner-nodepool? (y/n): y
[INFO] 09:40:52.285792 Nodepool with ID 'local-runner-nodepool' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-66d76251237c4be38764837e639c6800"
| thread=8309383360
[INFO] 09:40:52.292983 Current app_id: local-runner-app | thread=8309383360
[WARNING] 09:40:52.574021 Failed to get app with ID 'local-runner-app':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "app identified by path /users/alfrick/apps/local-runner-app not found"
req_id: "sdk-python-11.8.2-29b94532bf624596abbbaea66be198e2"
| thread=8309383360
App not found. Do you want to create a new app alfrick/local-runner-app? (y/n): y
[INFO] 09:40:56.302447 App with ID 'local-runner-app' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-b5066f7c64274944ba405ba01da11c1c"
| thread=8309383360
[INFO] 09:40:56.306934 Current model_id: local-runner-model | thread=8309383360
[WARNING] 09:40:58.007139 Failed to get model with ID 'local-runner-model':
code: MODEL_DOES_NOT_EXIST
description: "Model does not exist"
details: "Model \'local-runner-model\' does not exist."
req_id: "sdk-python-11.8.2-8b2717eb04624aca8bf119e03b94b5b4"
| thread=8309383360
Model not found. Do you want to create a new model alfrick/local-runner-app/models/local-runner-model? (y/n): y
[INFO] 09:41:14.336510 Model with ID 'local-runner-model' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-f36c2684e1bc4d8f99e777d42e5c53f8"
| thread=8309383360
[WARNING] 09:41:17.182009 No model versions found. Creating a new version for local runner. | thread=8309383360
[INFO] 09:41:17.510454 Model Version with ID 'fa82276f4cfa44c08745b028471bbfa5' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-0121fe726015400c86e4bd3959729787"
| thread=8309383360
[INFO] 09:41:17.517728 Current model version fa82276f4cfa44c08745b028471bbfa5 | thread=8309383360
[INFO] 09:41:17.517802 Creating the local runner tying this 'alfrick/local-runner-app/models/local-runner-model' model (version: fa82276f4cfa44c08745b028471bbfa5) to the 'alfrick/local-runner-compute-cluster/local-runner-nodepool' nodepool. | thread=8309383360
[INFO] 09:41:18.591818 Runner with ID '649b39c737d84dd8a5e3d5af0b19c207' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-ced2523458a941519a709e6af082832a"
| thread=8309383360
[INFO] 09:41:18.598056 Current runner_id: 649b39c737d84dd8a5e3d5af0b19c207 | thread=8309383360
[WARNING] 09:41:19.150091 Failed to get deployment with ID local-runner-deployment:
code: DEPLOYMENT_INVALID_REQUEST
description: "Invalid deployment request"
details: "Some of the deployment ids provided (local-runner-deployment) do not exist"
req_id: "sdk-python-11.8.2-9af7aa96a9a843e68f8f3ef898bf61c1"
| thread=8309383360
Deployment not found. Do you want to create a new deployment alfrick/local-runner-compute-cluster/local-runner-nodepool/local-runner-deployment? (y/n): y
[INFO] 09:41:25.833184 Deployment with ID 'local-runner-deployment' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-38e4b0bd886e4979bc9b7324361e2c56"
| thread=8309383360
[INFO] 09:41:25.839987 Current deployment_id: local-runner-deployment | thread=8309383360
[INFO] 09:41:25.841181 Current model section of config.yaml: {'app_id': 'local-runner-app', 'id': 'local-env-model', 'model_type_id': 'text-to-text', 'user_id': 'alfrick'} | thread=8309383360
Do you want to backup config.yaml to config.yaml.bk then update the config.yaml with the new model information? (y/n): y
[INFO] 09:41:29.312446 Checking 2 dependencies... | thread=8309383360
[INFO] 09:41:29.313228 ✅ All 2 dependencies are installed! | thread=8309383360
[INFO] 09:41:29.313325 ✅ Starting local runner... | thread=8309383360
[INFO] 09:41:29.313404 No secrets path configured, running without secrets | thread=8309383360
[INFO] 09:41:30.647566 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:41:34.359410 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:41:34.359915 Running: lms get https://huggingface.co/LiquidAI/LFM2-1.2B --verbose | thread=8309383360
[INFO] 09:41:34.625973 [lms logs] D Found local API server at ws://127.0.0.1:41343 | thread=8309383360
[INFO] 09:41:34.633082 [lms logs] I Searching for models with the term https://huggingface.co/LiquidAI/LFM2-1.2B | thread=8309383360
[INFO] 09:41:34.633891 [lms logs] D Searching for models with options { | thread=8309383360
[INFO] 09:41:34.633919 [lms logs] searchTerm: 'https://huggingface.co/LiquidAI/LFM2-1.2B', | thread=8309383360
[INFO] 09:41:34.633937 [lms logs] compatibilityTypes: undefined, | thread=8309383360
[INFO] 09:41:34.633950 [lms logs] limit: undefined | thread=8309383360
[INFO] 09:41:34.633963 [lms logs] } | thread=8309383360
[INFO] 09:41:40.602478 [lms logs] D Found 10 result(s) | thread=8309383360
[INFO] 09:41:40.602769 [lms logs] D Prompting user to choose a model | thread=8309383360
[INFO] 09:41:40.602822 [lms logs] I No exact match found. Please choose a model from the list below. | thread=8309383360
[INFO] 09:41:40.602867 [lms logs] | thread=8309383360
[INFO] 09:41:40.603408 [lms logs] ! Use the arrow keys to navigate, type to filter, and press enter to select. | thread=8309383360
[INFO] 09:41:40.603520 [lms logs] | thread=8309383360
[INFO] 09:41:40.619671 [lms logs] ? Select a model to download Type to filter... | thread=8309383360
[INFO] 09:41:40.619819 [lms logs] ❯ LiquidAI/LFM2-1.2B-GGUF | thread=8309383360
[INFO] 09:41:40.619874 [lms logs] LiquidAI/LFM2-1.2B-Tool-GGUF | thread=8309383360
[INFO] 09:41:40.619902 [lms logs] LiquidAI/LFM2-1.2B-Extract-GGUF | thread=8309383360
[INFO] 09:41:40.619946 [lms logs] LiquidAI/LFM2-1.2B-RAG-GGUF | thread=8309383360
[INFO] 09:41:40.619969 [lms logs] DevQuasar/LiquidAI.LFM2-1.2B-GGUF | thread=8309383360
[INFO] 09:41:40.619992 [lms logs] bartowski/LiquidAI_LFM2-1.2B-Extract-GGUF | thread=8309383360
[INFO] 09:41:40.620018 [lms logs] bartowski/LiquidAI_LFM2-1.2B-RAG-GGUF | thread=8309383360
[INFO] 09:41:40.620044 [lms logs] bartowski/LiquidAI_LFM2-1.2B-Tool-GGUF | thread=8309383360
[INFO] 09:41:40.620066 [lms logs] DevQuasar/LiquidAI.LFM2-1.2B-RAG-GGUF | thread=8309383360
When ready, the CLI prints:
- A model URL for API calls
- A Playground link for browser-based testing
- A sample code snippet
Press Ctrl+C to stop the runner.
Step 5: Run Inference
While the local runner is active, test it using the OpenAI-compatible client:
- Python
import os
from openai import OpenAI

# Initialize the OpenAI client, pointing to Clarifai's API
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",  # Clarifai's OpenAI-compatible API endpoint
    api_key=os.environ["CLARIFAI_PAT"]  # Ensure CLARIFAI_PAT is set as an environment variable
)

# Make a chat completion request to a Clarifai-hosted model
response = client.chat.completions.create(
    model="https://clarifai.com/<user-id>/local-runner-app/models/local-runner-model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the future of AI?"}
    ],
)

# Print the model's response
print(response.choices[0].message.content)
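Because model.py implements a streaming `generate` method, you can also consume the response incrementally by passing `stream=True` to the same OpenAI-compatible endpoint. A sketch, where the `stream_chat` helper and the environment-variable guard are illustrative additions rather than part of the Clarifai SDK:

```python
import os

def stream_chat(client, model_url, prompt):
    """Yield text deltas from an OpenAI-compatible streaming chat completion."""
    stream = client.chat.completions.create(
        model=model_url,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

if __name__ == "__main__" and os.environ.get("CLARIFAI_PAT"):
    from openai import OpenAI  # only needed for the real call
    client = OpenAI(
        base_url="https://api.clarifai.com/v2/ext/openai/v1",
        api_key=os.environ["CLARIFAI_PAT"],
    )
    model_url = "https://clarifai.com/<user-id>/local-runner-app/models/local-runner-model"
    for piece in stream_chat(client, model_url, "Explain AI in one sentence"):
        print(piece, end="", flush=True)
```

Streaming prints tokens as the local runner produces them, which is usually more responsive than waiting for the full completion.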
Or use the Clarifai CLI:
clarifai model predict https://clarifai.com/<user-id>/local-runner-app/models/local-runner-model "Explain AI in one sentence"
You can also open the Runners dashboard, find your runner, and select Open in Playground from the three-dot menu.
When you're done, close the terminal running the local runner to shut it down.