SGLang
Run models using the SGLang runtime format and make them available via a public API
SGLang is an open-source runtime and programming framework designed for structured generation and high-performance inference of large language models (LLMs) and vision-language models.
It provides a flexible way to execute models with advanced capabilities like multi-step prompting, structured outputs, and multimodal reasoning — all while maximizing throughput and minimizing latency.
With Clarifai’s Local Runners, you can download and run these models on your own machine using the SGLang runtime format, expose them securely via a public URL, and tap into Clarifai’s powerful platform — all while retaining the privacy, performance, and control of local execution.
Note: The SGLang toolkit specifies a runtime format for running models sourced from external providers, such as Hugging Face. After initializing a model with the toolkit, you can upload it to Clarifai to leverage the platform’s capabilities.
Step 1: Perform Prerequisites
Sign Up or Log In
Log in to your existing Clarifai account or sign up for a new one. Once you’re logged in, gather the following credentials required for setup:
- App ID – Go to the application you want to use to run the model. In the collapsible left sidebar, select Overview and copy the app ID displayed there.
- User ID – In the collapsible left sidebar, open Settings, then choose Account from the dropdown list to locate your user ID.
- Personal Access Token (PAT) – From the same Settings menu, select Secrets to create or copy your PAT. This token is used to authenticate your connection with the Clarifai platform.
Then, set your PAT as an environment variable:
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
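As an optional sanity check (not part of the official setup), you can confirm from Python that the variable is visible to new processes before running any Clarifai commands:

```python
import os

def pat_status(env: dict = os.environ) -> str:
    """Report whether the Clarifai PAT is visible to this process."""
    pat = env.get("CLARIFAI_PAT")
    if pat:
        return f"CLARIFAI_PAT is set ({len(pat)} characters)"
    return "CLARIFAI_PAT is not set; export it before continuing"

print(pat_status())
```

If the token is reported as not set, re-run the export/set command in the same terminal session you plan to use for the remaining steps.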
Install Clarifai CLI
Install the latest Clarifai CLI which includes built-in support for Local Runners:
- Bash
pip install --upgrade clarifai
Note: The Local Runners require Python 3.11 or 3.12.
Install SGLang
Install SGLang to enable its runtime execution environment.
- Bash
pip install sglang
Tip: GPU acceleration (CUDA) is highly recommended for optimal performance.
Install OpenAI
Install the openai package, which is needed to perform inference with models that use the OpenAI-compatible format.
- Bash
pip install openai
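To give a feel for what an OpenAI-compatible inference call involves, here is a rough sketch. The base URL and model ID below are placeholders (the actual URL is shown when your runner starts), and the commented-out client usage is an illustration rather than Clarifai's documented API:

```python
import json

# Placeholder; use the public URL printed when your Local Runner starts.
BASE_URL = "https://your-runner-url.example/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# With the openai package installed, the same request can be sent as:
#
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key=os.environ["CLARIFAI_PAT"])
#   response = client.chat.completions.create(
#       **build_chat_request("HuggingFaceTB/SmolLM2-135M-Instruct", "Hello!"))
#   print(response.choices[0].message.content)

print(json.dumps(build_chat_request("HuggingFaceTB/SmolLM2-135M-Instruct", "Hello!"), indent=2))
```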
Get Hugging Face Token
If you want to initialize a Hugging Face model for use with SGLang, you’ll need a Hugging Face access token to authenticate with Hugging Face services — especially when accessing private or restricted repositories.
You can create one by following these instructions. Once you have the token, include it either in your model’s config.yaml file (as described below) or set it as an environment variable.
Note: If hf_token is not specified in the config.yaml file, the CLI will automatically use the HF_TOKEN environment variable for authentication with Hugging Face.
- Unix-Like Systems
- Windows
export HF_TOKEN="YOUR_HF_ACCESS_TOKEN_HERE"
set HF_TOKEN="YOUR_HF_ACCESS_TOKEN_HERE"
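If you prefer keeping the token in the model configuration, the checkpoints section of config.yaml can carry it. The fragment below is a sketch of the common shape of that section; the config.yaml generated in Step 2 contains the authoritative layout for your model:

```yaml
checkpoints:
  type: huggingface
  repo_id: HuggingFaceTB/SmolLM2-135M-Instruct
  hf_token: YOUR_HF_ACCESS_TOKEN_HERE
```

Avoid committing a real token to version control; the environment-variable approach above is safer for shared repositories.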
Step 2: Initialize a Model
With the Clarifai CLI, you can initialize a model configured to run using the SGLang runtime format. Initialization sets up a Clarifai-compatible project directory with the appropriate files.
You can customize or optimize the model by editing the generated files as needed. For example, the command below initializes a default Hugging Face model (HuggingFaceTB/SmolLM2-135M-Instruct) in your current directory.
- Bash
clarifai model init --toolkit sglang
Example Output
clarifai model init --toolkit sglang
[INFO] 20:14:19.494294 Parsed GitHub repository: owner=Clarifai, repo=runners-examples, branch=sglang, folder_path= | thread=8729403584
[INFO] 20:14:20.762093 Files to be downloaded are:
1. 1/model.py
2. 1/openai_server_starter.py
3. Dockerfile
4. README.md
5. config.yaml
6. requirements.txt | thread=8729403584
Press Enter to continue...
[INFO] 20:14:24.640395 Initializing model from GitHub repository: https://github.com/Clarifai/runners-examples | thread=8729403584
[INFO] 20:14:33.997825 Successfully cloned repository from https://github.com/Clarifai/runners-examples (branch: sglang) | thread=8729403584
[INFO] 20:14:34.006824 Updated Hugging Face model repo_id to: None | thread=8729403584
[INFO] 20:14:34.006878 Model initialization complete with GitHub repository | thread=8729403584
[INFO] 20:14:34.006909 Next steps: | thread=8729403584
[INFO] 20:14:34.006929 1. Review the model configuration | thread=8729403584
[INFO] 20:14:34.006946 2. Install any required dependencies manually | thread=8729403584
[INFO] 20:14:34.006966 3. Test the model locally using 'clarifai model local-test' | thread=8729403584
You can use the --model-name parameter to initialize any supported Hugging Face model. This sets the model’s repo_id, specifying which Hugging Face repository to initialize from.
- Bash
clarifai model init --toolkit sglang --model-name unsloth/Llama-3.2-1B-Instruct
Note: Large models require significant GPU memory. Ensure your machine has enough compute capacity to run them efficiently.
The generated structure includes:
├── 1/
│   ├── model.py
│   └── openai_server_starter.py