
LM Studio

Download and run LM Studio models locally and make them available via a public API


LM Studio is a desktop application that lets you run and chat with open-source large language models (LLMs) locally — no internet connection required.

With Clarifai’s Local Runners, you can take this a step further: run LM Studio models directly on your machine, expose them securely through a public URL, and leverage Clarifai’s powerful AI platform — all while maintaining the speed, privacy, and control of local deployment.

Note: After downloading the model using the LM Studio toolkit, you can upload it to Clarifai to leverage the platform’s capabilities.

Step 1: Perform Prerequisites

Sign Up or Log In

Log in to your existing Clarifai account or sign up for a new one. After logging in, gather the following credentials for setup:

  • App ID – Go to the application you’ll use to run your model and select Overview in the collapsible left sidebar; the app ID is displayed there.
  • User ID – In the collapsible left sidebar, select Settings, then choose Account from the dropdown list to find your user ID.
  • Personal Access Token (PAT) – From the same Settings option, select Secrets to create or copy your PAT. This token is required to authenticate your connection with the Clarifai platform.

Once you have your PAT, set it as an environment variable for secure authentication:

export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE

Install the Clarifai CLI

Next, install the latest version of the Clarifai CLI, which includes built-in support for Local Runners.

pip install --upgrade clarifai

Note: Ensure you have Python 3.11 or 3.12 installed to successfully run Local Runners.
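You can quickly confirm which Python version is active before continuing:

python3 --version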

Install the OpenAI Package

Install the openai package — it’s required to perform inference with LM Studio models that support the OpenAI-compatible format.

pip install openai

Install LM Studio

Download and install the LM Studio desktop application to run open-source large language models locally.

Ensure LM Studio remains open and running whenever you start a Clarifai Local Runner, as the runner relies on LM Studio’s internal model runtime for successful execution.

Note: Currently, Clarifai Local Runners support running LLMs through LM Studio only on Apple devices (macOS).
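The generated model code (see Step 2) shells out to LM Studio’s lms command-line tool (lms get, lms load, lms server start) to pull, load, and serve the model, so the lms CLI that ships with the desktop app must be available on your PATH. A quick sanity check, assuming the CLI is already set up:

lms --help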

Step 2: Initialize a Model

Using the Clarifai CLI, you can download and set up any model available in the LM Studio Model Catalog that supports the GGUF format.

For example, the command below initializes the default model (LiquidAI/LFM2-1.2B) in your current directory:

clarifai model init --toolkit lmstudio
Example Output
clarifai model init --toolkit lmstudio
[INFO] 09:07:03.086018 Parsed GitHub repository: owner=Clarifai, repo=runners-examples, branch=lmstudio, folder_path= | thread=8309383360
[INFO] 09:07:05.331174 Files to be downloaded are:
1. 1/model.py
2. config.yaml
3. requirements.txt | thread=8309383360
Press Enter to continue...
[INFO] 09:07:09.895510 Initializing model from GitHub repository: https://github.com/Clarifai/runners-examples | thread=8309383360
[INFO] 09:07:37.976873 Successfully cloned repository from https://github.com/Clarifai/runners-examples (branch: lmstudio) | thread=8309383360
[INFO] 09:07:37.980528 Model initialization complete with GitHub repository | thread=8309383360
[INFO] 09:07:37.980580 Next steps: | thread=8309383360
[INFO] 09:07:37.980603 1. Review the model configuration | thread=8309383360
[INFO] 09:07:37.980619 2. Install any required dependencies manually | thread=8309383360
[INFO] 09:07:37.980635 3. Test the model locally using 'clarifai model local-test' | thread=8309383360

Running this command creates a new model directory structure compatible with the Clarifai platform. You can further customize or optimize the model by modifying the generated files as needed.

tip

To initialize a specific LM Studio model that supports the GGUF format, use the --model-name flag.

clarifai model init --toolkit lmstudio --model-name qwen/qwen3-4b-thinking-2507

The generated structure includes:

├── 1/
│   └── model.py
├── requirements.txt
└── config.yaml

model.py

Example: model.py
import sys
import time
import socket
import os
import json
import subprocess
from typing import List, Iterator

from clarifai.runners.models.openai_class import OpenAIModelClass
from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.utils.logging import logger

from clarifai.runners.utils.data_utils import Param
from clarifai.runners.utils.data_types import Image
from clarifai.runners.utils.openai_convertor import build_openai_messages

from openai import OpenAI


VERBOSE_LMSTUDIO = True  # Set to True to see the output of the lmstudio server in the logs

def _stream_command(cmd: str, verbose: bool = True):
    """
    Run a shell command, streaming its combined stdout/stderr line by line.
    Returns True on exit code 0, else raises RuntimeError.
    """
    logger.info(f"Running: {cmd}")
    # Force line buffering from many tools by setting environment tweaks
    env = os.environ.copy()
    env["PYTHONUNBUFFERED"] = "1"
    # Start process
    process = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,
        env=env
    )
    if verbose and process.stdout:
        for line in iter(process.stdout.readline, ""):
            if line:  # strip trailing newline for cleaner log
                logger.info(f"[lms logs] {line.rstrip()}")
    ret = process.wait()
    if ret != 0:
        raise RuntimeError(f"Command failed ({ret}): {cmd}")
    return True

def _wait_for_port(port: int, timeout: float = 30.0):
    """
    Wait until something is listening on localhost:port.
    """
    start = time.time()
    while time.time() - start < timeout:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(1)
            try:
                if sock.connect_ex(("127.0.0.1", port)) == 0:
                    return True
            except Exception:
                pass
        time.sleep(0.5)
    raise RuntimeError(f"Server did not start listening on port {port} within {timeout}s")

def run_lms_server(model_name: str = 'LiquidAI/LFM2-1.2B-GGUF', port: int = 11434,
                   context_length: int = 4096) -> None:
    """
    Start the lmstudio server with ordered, real-time logs.
    """
    from clarifai.runners.utils.model_utils import terminate_process  # keep if needed elsewhere
    try:
        # 1. Pull model
        _stream_command(f"lms get https://huggingface.co/{model_name} --verbose", verbose=VERBOSE_LMSTUDIO)
        logger.info(f"Model {model_name} pulled successfully.")

        # 2. Unload previous models
        _stream_command("lms unload --all", verbose=VERBOSE_LMSTUDIO)
        logger.info("All models unloaded successfully.")

        # 3. Load target model
        _stream_command(f"lms load {model_name} --verbose --context-length {context_length}",
                        verbose=VERBOSE_LMSTUDIO)
        logger.info(f"Model {model_name} loaded (context_length={context_length}).")

        # 4. Start server (run in background so we return)
        logger.info(f"Starting lmstudio server on port {port}...")
        # Start server detached so we can still stream its startup output briefly if verbose.
        server_proc = subprocess.Popen(
            f"lms server start --port {port}",
            shell=True,
            stdout=None if not VERBOSE_LMSTUDIO else sys.stdout,
            stderr=None if not VERBOSE_LMSTUDIO else sys.stderr
        )

        # 5. Wait for port to be open
        _wait_for_port(port)
        logger.info(f"lms server started successfully on port {port} (pid={server_proc.pid}).")

    except Exception as e:
        logger.error(f"Error starting lmstudio server: {e}")
        raise RuntimeError(f"Failed to start lmstudio server: {e}")

# Check if Image has content before building messages
def has_image_content(image: Image) -> bool:
    """Check if Image object has either bytes or URL."""
    return bool(getattr(image, 'url', None) or getattr(image, 'bytes', None))

class LMstudioModelClass(OpenAIModelClass):

    client = True
    model = True

    def load_model(self):
        """
        Load the lmstudio model.
        """
        model_path = os.path.dirname(os.path.dirname(__file__))
        builder = ModelBuilder(model_path, download_validation_only=True)
        self.model = builder.config['toolkit']['model']
        self.port = builder.config['toolkit']['port']
        self.context_length = builder.config['toolkit']['context_length']

        # Start the lmstudio server
        run_lms_server(model_name=self.model, port=self.port, context_length=self.context_length)

        self.client = OpenAI(
            api_key="notset",
            base_url=f"http://localhost:{self.port}/v1")

        logger.info(f"LMstudio model loaded successfully: {self.model}")


    @OpenAIModelClass.method
    def predict(self,
                prompt: str,
                image: Image = None,
                images: List[Image] = None,
                chat_history: List[dict] = None,
                tools: List[dict] = None,
                tool_choice: str = None,
                max_tokens: int = Param(default=2048, description="The maximum number of tokens to generate. Shorter token lengths will provide faster performance."),
                temperature: float = Param(default=0.7, description="A decimal number that determines the degree of randomness in the response"),
                top_p: float = Param(default=0.95, description="An alternative to sampling with temperature, where the model considers the results of the tokens with top_p probability mass."),
                ) -> str:
        """
        This method is used to predict the response for the given prompt and chat history using the model and tools.
        """
        if tools is not None and tool_choice is None:
            tool_choice = "auto"

        img_content = image if has_image_content(image) else None

        messages = build_openai_messages(prompt=prompt, image=img_content, images=images, messages=chat_history)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
            max_completion_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p)

        if response.usage is not None:
            self.set_output_context(prompt_tokens=response.usage.prompt_tokens,
                                    completion_tokens=response.usage.completion_tokens)
        if len(response.choices) == 0:
            # Still need to send the usage back.
            return ""

        if response.choices[0] and response.choices[0].message.tool_calls:
            # If the response contains tool calls, return them as a JSON string
            tool_calls = response.choices[0].message.tool_calls
            tool_calls_json = json.dumps([tc.to_dict() for tc in tool_calls], indent=2)
            return tool_calls_json
        else:
            # Otherwise, return the content of the first choice
            return response.choices[0].message.content


    @OpenAIModelClass.method
    def generate(self,
                 prompt: str,
                 image: Image = None,
                 images: List[Image] = None,
                 chat_history: List[dict] = None,
                 tools: List[dict] = None,
                 tool_choice: str = None,
                 max_tokens: int = Param(default=2048, description="The maximum number of tokens to generate. Shorter token lengths will provide faster performance."),
                 temperature: float = Param(default=0.7, description="A decimal number that determines the degree of randomness in the response"),
                 top_p: float = Param(default=0.95, description="An alternative to sampling with temperature, where the model considers the results of the tokens with top_p probability mass.")) -> Iterator[str]:
        """
        This method is used to stream generated text tokens from a prompt + optional chat history and tools.
        """
        if tools is not None and tool_choice is None:
            tool_choice = "auto"

        img_content = image if has_image_content(image) else None

        messages = build_openai_messages(prompt=prompt, image=img_content, images=images, messages=chat_history)
        for chunk in self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=tools,
                tool_choice=tool_choice,
                max_completion_tokens=max_tokens,
                temperature=temperature,
                top_p=top_p,
                stream=True,
                stream_options={"include_usage": True}
        ):
            if chunk.usage is not None:
                if chunk.usage.prompt_tokens or chunk.usage.completion_tokens:
                    self.set_output_context(prompt_tokens=chunk.usage.prompt_tokens, completion_tokens=chunk.usage.completion_tokens)
            if len(chunk.choices) == 0:  # still need to send the usage back.
                yield ""

            if chunk.choices:
                if chunk.choices[0].delta.tool_calls:
                    # If the chunk contains tool calls, yield them as a JSON string
                    tool_calls = chunk.choices[0].delta.tool_calls
                    tool_calls_json = [tc.to_dict() for tc in tool_calls]
                    json_string = json.dumps(tool_calls_json, indent=2)
                    yield json_string
                else:
                    # Otherwise, yield the content of the first choice
                    text = chunk.choices[0].delta.content if chunk.choices[0].delta.content is not None else ''
                    yield text

The model.py file inside the 1/ directory defines the model’s logic — including how predictions are made and how inputs and outputs are handled.
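As the initialization output above suggests, you can optionally test the model code locally (with LM Studio open) before starting a runner:

clarifai model local-test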

config.yaml

Example: config.yaml
build_info:
  python_version: '3.12'
inference_compute_info:
  cpu_limit: '3'
  cpu_memory: 14Gi
  num_accelerators: 0
model:
  app_id: local-runner-app
  id: local-env-model
  model_type_id: text-to-text
  user_id: clarifai-user-id
toolkit:
  provider: lmstudio
  model: LiquidAI/LFM2-1.2B
  port: 11434
  context_length: 2048

The config.yaml file defines key configuration details, such as compute resource requirements and toolkit metadata.

In the model section, specify a unique model ID (any name of your choice) and your Clarifai user ID and app ID. These parameters determine where the model will be deployed on the Clarifai platform.
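For example, an edited model section might look like the following, where your-model-id, your-user-id, and your-app-id are placeholders for your own values:

model:
  app_id: your-app-id
  id: your-model-id
  model_type_id: text-to-text
  user_id: your-user-id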

requirements.txt

Example: requirements.txt
clarifai
openai

The requirements.txt file lists the Python dependencies your model needs. If you haven’t installed them yet, run the following command to install them:

pip install -r requirements.txt

Step 3: Log In to Clarifai

Use the Clarifai CLI to log in to your account and create a configuration context that securely connects your local environment to the Clarifai platform.

clarifai login

You’ll be prompted to enter the following details:

  • User ID – Your Clarifai User ID.
  • PAT – Your Clarifai Personal Access Token. If you’ve already set the CLARIFAI_PAT environment variable, type ENVVAR to use it automatically.
  • Context name – Optionally, specify a custom name for this configuration context, or press Enter to use the default "default". Contexts are useful when working with multiple environments or projects.
Example Output
clarifai login
Enter your Clarifai user ID: alfrick
> To authenticate, you'll need a Personal Access Token (PAT).
> You can create one from your account settings: https://clarifai.com/alfrick/settings/security

Enter your Personal Access Token (PAT) value (or type "ENVVAR" to use an environment variable): ENVVAR

> Verifying token...
[INFO] 09:38:03.867057 Validating the Context Credentials... | thread=8309383360
[INFO] 09:38:05.176881 ✅ Context is valid | thread=8309383360

> Let's save these credentials to a new context.
> You can have multiple contexts to easily switch between accounts or projects.

Enter a name for this context [default]:
✅ Success! You are now logged in.
Credentials saved to the 'default' context.

💡 To switch contexts later, use `clarifai config use-context <name>`.
[INFO] 09:38:10.706639 Login successful for user 'alfrick' in context 'default' | thread=8309383360

Step 4: Start the Local Runner

Next, start your Local Runner, which connects to the LM Studio runtime to execute your model locally.

clarifai model local-runner

If configuration contexts or defaults are missing, the CLI will guide you through setting them up automatically.

This setup ensures that all necessary components — such as compute clusters, nodepools, and deployments — are properly defined in your configuration context. For more details, see here.

Example Output
clarifai model local-runner
[INFO] 09:40:36.097539 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:40:36.098189 > Checking local runner requirements... | thread=8309383360
[INFO] 09:40:36.118322 Checking 2 dependencies... | thread=8309383360
[INFO] 09:40:36.118807 ✅ All 2 dependencies are installed! | thread=8309383360
[INFO] 09:40:36.119033 > Verifying local runner setup... | thread=8309383360
[INFO] 09:40:36.119083 Current context: default | thread=8309383360
[INFO] 09:40:36.119120 Current user_id: alfrick | thread=8309383360
[INFO] 09:40:36.119150 Current PAT: d6570**** | thread=8309383360
[INFO] 09:40:36.121055 Current compute_cluster_id: local-runner-compute-cluster | thread=8309383360
[WARNING] 09:40:37.622490 Failed to get compute cluster with ID 'local-runner-compute-cluster':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "ComputeCluster with ID \'local-runner-compute-cluster\' not found. Check your request fields."
req_id: "sdk-python-11.8.2-c324cbe5deb248e19d5d0ed1e32e49d0"
| thread=8309383360
Compute cluster not found. Do you want to create a new compute cluster alfrick/local-runner-compute-cluster? (y/n): y
[INFO] 09:40:44.198312 Compute Cluster with ID 'local-runner-compute-cluster' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-e5b312b4a46f4e2984efc65abb5124c5"
| thread=8309383360
[INFO] 09:40:44.203633 Current nodepool_id: local-runner-nodepool | thread=8309383360
[WARNING] 09:40:46.398631 Failed to get nodepool with ID 'local-runner-nodepool':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "Nodepool not found. Check your request fields."
req_id: "sdk-python-11.8.2-1062d71d21574bce99bd4472a9fdc6ef"
| thread=8309383360
Nodepool not found. Do you want to create a new nodepool alfrick/local-runner-compute-cluster/local-runner-nodepool? (y/n): y
[INFO] 09:40:52.285792 Nodepool with ID 'local-runner-nodepool' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-66d76251237c4be38764837e639c6800"
| thread=8309383360
[INFO] 09:40:52.292983 Current app_id: local-runner-app | thread=8309383360
[WARNING] 09:40:52.574021 Failed to get app with ID 'local-runner-app':
code: CONN_DOES_NOT_EXIST
description: "Resource does not exist"
details: "app identified by path /users/alfrick/apps/local-runner-app not found"
req_id: "sdk-python-11.8.2-29b94532bf624596abbbaea66be198e2"
| thread=8309383360
App not found. Do you want to create a new app alfrick/local-runner-app? (y/n): y
[INFO] 09:40:56.302447 App with ID 'local-runner-app' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-b5066f7c64274944ba405ba01da11c1c"
| thread=8309383360
[INFO] 09:40:56.306934 Current model_id: local-runner-model | thread=8309383360
[WARNING] 09:40:58.007139 Failed to get model with ID 'local-runner-model':
code: MODEL_DOES_NOT_EXIST
description: "Model does not exist"
details: "Model \'local-runner-model\' does not exist."
req_id: "sdk-python-11.8.2-8b2717eb04624aca8bf119e03b94b5b4"
| thread=8309383360
Model not found. Do you want to create a new model alfrick/local-runner-app/models/local-runner-model? (y/n): y
[INFO] 09:41:14.336510 Model with ID 'local-runner-model' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-f36c2684e1bc4d8f99e777d42e5c53f8"
| thread=8309383360
[WARNING] 09:41:17.182009 No model versions found. Creating a new version for local runner. | thread=8309383360
[INFO] 09:41:17.510454 Model Version with ID 'fa82276f4cfa44c08745b028471bbfa5' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-0121fe726015400c86e4bd3959729787"
| thread=8309383360
[INFO] 09:41:17.517728 Current model version fa82276f4cfa44c08745b028471bbfa5 | thread=8309383360
[INFO] 09:41:17.517802 Creating the local runner tying this 'alfrick/local-runner-app/models/local-runner-model' model (version: fa82276f4cfa44c08745b028471bbfa5) to the 'alfrick/local-runner-compute-cluster/local-runner-nodepool' nodepool. | thread=8309383360
[INFO] 09:41:18.591818 Runner with ID '649b39c737d84dd8a5e3d5af0b19c207' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-ced2523458a941519a709e6af082832a"
| thread=8309383360
[INFO] 09:41:18.598056 Current runner_id: 649b39c737d84dd8a5e3d5af0b19c207 | thread=8309383360
[WARNING] 09:41:19.150091 Failed to get deployment with ID local-runner-deployment:
code: DEPLOYMENT_INVALID_REQUEST
description: "Invalid deployment request"
details: "Some of the deployment ids provided (local-runner-deployment) do not exist"
req_id: "sdk-python-11.8.2-9af7aa96a9a843e68f8f3ef898bf61c1"
| thread=8309383360
Deployment not found. Do you want to create a new deployment alfrick/local-runner-compute-cluster/local-runner-nodepool/local-runner-deployment? (y/n): y
[INFO] 09:41:25.833184 Deployment with ID 'local-runner-deployment' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.8.2-38e4b0bd886e4979bc9b7324361e2c56"
| thread=8309383360
[INFO] 09:41:25.839987 Current deployment_id: local-runner-deployment | thread=8309383360
[INFO] 09:41:25.841181 Current model section of config.yaml: {'app_id': 'local-runner-app', 'id': 'local-env-model', 'model_type_id': 'text-to-text', 'user_id': 'alfrick'} | thread=8309383360
Do you want to backup config.yaml to config.yaml.bk then update the config.yaml with the new model information? (y/n): y
[INFO] 09:41:29.312446 Checking 2 dependencies... | thread=8309383360
[INFO] 09:41:29.313228 ✅ All 2 dependencies are installed! | thread=8309383360
[INFO] 09:41:29.313325 ✅ Starting local runner... | thread=8309383360
[INFO] 09:41:29.313404 No secrets path configured, running without secrets | thread=8309383360
[INFO] 09:41:30.647566 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:41:34.359410 Detected OpenAI chat completions for Clarifai model streaming - validating stream_options... | thread=8309383360
[INFO] 09:41:34.359915 Running: lms get https://huggingface.co/LiquidAI/LFM2-1.2B --verbose | thread=8309383360
[INFO] 09:41:34.625973 [lms logs] D Found local API server at ws://127.0.0.1:41343 | thread=8309383360
[INFO] 09:41:34.633082 [lms logs] I Searching for models with the term https://huggingface.co/LiquidAI/LFM2-1.2B | thread=8309383360
[INFO] 09:41:34.633891 [lms logs] D Searching for models with options { | thread=8309383360
[INFO] 09:41:34.633919 [lms logs] searchTerm: 'https://huggingface.co/LiquidAI/LFM2-1.2B', | thread=8309383360
[INFO] 09:41:34.633937 [lms logs] compatibilityTypes: undefined, | thread=8309383360
[INFO] 09:41:34.633950 [lms logs] limit: undefined | thread=8309383360
[INFO] 09:41:34.633963 [lms logs] } | thread=8309383360
[INFO] 09:41:40.602478 [lms logs] D Found 10 result(s) | thread=8309383360
[INFO] 09:41:40.602769 [lms logs] D Prompting user to choose a model | thread=8309383360
[INFO] 09:41:40.602822 [lms logs] I No exact match found. Please choose a model from the list below. | thread=8309383360
[INFO] 09:41:40.602867 [lms logs] | thread=8309383360
[INFO] 09:41:40.603408 [lms logs] ! Use the arrow keys to navigate, type to filter, and press enter to select. | thread=8309383360
[INFO] 09:41:40.603520 [lms logs] | thread=8309383360
[INFO] 09:41:40.619671 [lms logs] ? Select a model to download Type to filter... | thread=8309383360
[INFO] 09:41:40.619819 [lms logs] ❯ LiquidAI/LFM2-1.2B-GGUF | thread=8309383360
[INFO] 09:41:40.619874 [lms logs] LiquidAI/LFM2-1.2B-Tool-GGUF | thread=8309383360
[INFO] 09:41:40.619902 [lms logs] LiquidAI/LFM2-1.2B-Extract-GGUF | thread=8309383360
[INFO] 09:41:40.619946 [lms logs] LiquidAI/LFM2-1.2B-RAG-GGUF | thread=8309383360
[INFO] 09:41:40.619969 [lms logs] DevQuasar/LiquidAI.LFM2-1.2B-GGUF | thread=8309383360
[INFO] 09:41:40.619992 [lms logs] bartowski/LiquidAI_LFM2-1.2B-Extract-GGUF | thread=8309383360
[INFO] 09:41:40.620018 [lms logs] bartowski/LiquidAI_LFM2-1.2B-RAG-GGUF | thread=8309383360
[INFO] 09:41:40.620044 [lms logs] bartowski/LiquidAI_LFM2-1.2B-Tool-GGUF | thread=8309383360
[INFO] 09:41:40.620066 [lms logs] DevQuasar/LiquidAI.LFM2-1.2B-RAG-GGUF | thread=8309383360

Step 5: Test Your Runner

After the Local Runner starts, you can use it to perform inference with your LM Studio–based model.

You can run a snippet in a separate terminal, within the same directory, to confirm that your model is running and responding as expected.

Here’s an example snippet:

import os
from openai import OpenAI

# Initialize the OpenAI client, pointing to Clarifai's API
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",  # Clarifai's OpenAI-compatible API endpoint
    api_key=os.environ["CLARIFAI_PAT"]  # Ensure CLARIFAI_PAT is set as an environment variable
)

# Make a chat completion request to a Clarifai-hosted model
response = client.chat.completions.create(
    model="https://clarifai.com/<user-id>/local-runner-app/models/local-runner-model",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the future of AI?"}
    ],
)

# Print the model's response
print(response.choices[0].message.content)
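
Because the generated model class also implements a streaming generate method, you can request a streamed response through the same OpenAI-compatible endpoint. A minimal sketch, assuming the same model URL and CLARIFAI_PAT as above:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"]
)

# Stream tokens as they arrive instead of waiting for the full completion
stream = client.chat.completions.create(
    model="https://clarifai.com/<user-id>/local-runner-app/models/local-runner-model",
    messages=[{"role": "user", "content": "What is the future of AI?"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)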