Deploy Your First Model via CLI
Quickly build and deploy your first custom model to the Clarifai platform
The Clarifai platform lets you deploy models to production in just three commands. This guide walks you through the complete workflow — from scaffolding to running predictions in the cloud.
We'll show two paths:
- Path A (recommended): Deploy a pre-trained LLM from HuggingFace using a toolkit
- Path B: Deploy a custom Python model you write from scratch
Step 1: Install and Log In
Install Clarifai
- Bash
pip install --upgrade clarifai
This installs both the Python SDK and the CLI.
Log In
You need a Clarifai account and a Personal Access Token (PAT). If you don't have one, sign up and create a PAT in your account's settings.
Then authenticate:
- CLI
clarifai login
The CLI will prompt for your PAT, validate it, auto-detect your user ID, and save everything locally. Verify it worked:
- CLI
clarifai whoami
You should see your user ID and the active context name.
Path A: Deploy a Pre-Trained LLM (Recommended)
This is the fastest way to get a model running on Clarifai. We'll use Qwen3-0.6B — a small, ungated model that doesn't require a HuggingFace token.
Step 2A: Scaffold
- CLI
clarifai model init --toolkit vllm --model-name Qwen/Qwen3-0.6B
This creates a Qwen3-0.6B/ directory with everything pre-configured:
Qwen3-0.6B/
├── 1/
│   └── model.py          # vLLM inference logic (ready to use)
├── config.yaml           # Config with auto-selected GPU instance
└── requirements.txt      # Dependencies (clarifai, openai — vLLM comes from the Docker image)
The generated config.yaml is minimal — no placeholders to fill in:
build_info:
  image: vllm/vllm-openai:latest
checkpoints:
  repo_id: Qwen/Qwen3-0.6B
  type: huggingface
  when: runtime           # Downloads weights at startup, not during build
compute:
  instance: g4dn.xlarge   # Auto-selected based on model VRAM needs
model:
  id: qwen3-06b           # Sanitized from Qwen/Qwen3-0.6B
What gets auto-filled:
- user_id and app_id are resolved from your login context at deploy time.
- model.id is sanitized from the HuggingFace model name (e.g., Qwen/Qwen3-0.6B → qwen3-06b).
- compute.instance is auto-selected based on the model's estimated VRAM requirements — the CLI fetches the model's architecture from HuggingFace and calculates the memory needed for weights, KV cache, and framework overhead.
- build_info.image specifies the Docker base image (vLLM comes pre-installed in it, so requirements.txt only lists lightweight dependencies).
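To make the auto-fill behavior concrete, here is a rough sketch in plain Python. The exact rules the CLI applies aren't documented here, so treat both functions as illustrations: the ID rule assumes "lowercase the repo name and drop characters that aren't valid in a model ID", and the sizing arithmetic is the standard params-times-bytes-per-param estimate, ignoring KV cache and overhead.

```python
def sanitize_model_id(repo_id: str) -> str:
    """Illustrative guess at the ID rule: keep the part after the slash,
    lowercase it, and drop characters (like '.') that aren't valid in an ID."""
    name = repo_id.split("/")[-1].lower()
    return "".join(ch for ch in name if ch.isalnum() or ch == "-")

def estimated_weight_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Back-of-the-envelope weight memory: params x bytes per param (fp16 = 2).
    1e9 params x N bytes / 1e9 bytes-per-GB simplifies to params_billions x N."""
    return params_billions * bytes_per_param

print(sanitize_model_id("Qwen/Qwen3-0.6B"))  # qwen3-06b
print(estimated_weight_gb(0.6))              # 1.2 (GB; well within a 16 GB T4 on g4dn.xlarge)
```

This also explains why a 0.6B-parameter model lands on a single modest GPU instance: its fp16 weights need only about 1.2 GB, leaving headroom for KV cache and framework overhead.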
Step 3A: Deploy
- CLI
clarifai model deploy ./Qwen3-0.6B
That's it. The CLI will:
- Validate — check config.yaml and verify the HuggingFace repo is accessible
- Upload — build a Docker image and push it to Clarifai
- Deploy — auto-create a compute cluster (a group of machines), a nodepool (the specific GPU instance), and a deployment (your model running on that hardware)
- Monitor — stream pod events until the model is ready
When it finishes, you'll see output like this:
── Ready ──────────────────────────────────────────────
Model deployed successfully!
Model: https://clarifai.com/your-user/main/models/qwen3-06b
Version: abc12345
Deployment: deploy-qwen3-06b-dd8481
Instance: g4dn.xlarge
Cloud: AWS / us-east-1
── Next Steps ─────────────────────────────────────────
Predict: clarifai model predict your-user/main/models/qwen3-06b "Hello"
Logs: clarifai model logs --deployment "deploy-qwen3-06b-dd8481"
Status: clarifai model status --deployment "deploy-qwen3-06b-dd8481"
Undeploy: clarifai model undeploy --deployment "deploy-qwen3-06b-dd8481"
Copy the predict command from the output — it contains your actual user ID and deployment ID, so you can paste it directly.
Step 4A: Predict
Copy the predict command from the deploy output, or construct it using your user ID:
- CLI
clarifai model predict your-user/main/models/qwen3-06b "Explain quantum computing in one sentence"
Replace your-user with your actual Clarifai user ID (the one shown by clarifai whoami). The main segment is the default app that's auto-created for you.
The response streams in real-time. You can also test in the Playground by clicking the model URL from the deploy output.
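The predict target is just a path string. A throwaway helper (not part of the Clarifai SDK; shown only to make the pieces explicit) assembles it from its three components:

```python
def model_path(user_id: str, app_id: str, model_id: str) -> str:
    """Assemble the <user>/<app>/models/<model-id> path that
    `clarifai model predict` expects as its first argument."""
    return f"{user_id}/{app_id}/models/{model_id}"

print(model_path("your-user", "main", "qwen3-06b"))
# your-user/main/models/qwen3-06b
```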
Step 5A: Manage
# Check deployment status
clarifai model status --deployment <deployment-id>
# Stream live logs (useful for debugging startup issues)
clarifai model logs --deployment <deployment-id>
# View Kubernetes scheduling events (useful if pod won't start)
clarifai model logs --deployment <deployment-id> --log-type events
# Remove deployment when done (stops billing)
clarifai model undeploy --deployment <deployment-id>
Replace <deployment-id> with the ID from the deploy output (e.g., deploy-qwen3-06b-dd8481).
Path B: Deploy a Custom Python Model
Use this path when you're writing your own model logic (not wrapping a pre-trained model).
Step 2B: Scaffold
- CLI
clarifai model init my-first-model
This creates a blank model project. Replace the generated files with your model logic:
1/model.py
- Python
from typing import Iterator

from clarifai.runners.models.model_class import ModelClass


class MyFirstModel(ModelClass):
    """A custom model that generates 'Hello World' outputs in a streaming fashion."""

    @ModelClass.method
    def generate(self, text1: str = "") -> Iterator[str]:
        """
        This method streams multiple outputs instead of returning just one.
        It takes an input string and yields a sequence of outputs.
        """
        for i in range(5):  # number of generated outputs
            output_text = text1 + f" Hello World {i}"
            yield output_text
requirements.txt
- Text
clarifai>=11.8.2
config.yaml
- YAML
model:
  id: "my-first-model"
  model_type_id: "text-to-text"
build_info:
  python_version: "3.11"
inference_compute_info:
  cpu_limit: "1"
  cpu_memory: "5Gi"
  num_accelerators: 0
user_id and app_id are auto-filled from your CLI context at deploy time. You don't need to add them.
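Before involving the Clarifai runtime at all, you can sanity-check the streaming loop as a plain generator. This sketch drops the ModelClass base (which requires the clarifai package) but mirrors the loop in model.py line for line:

```python
from typing import Iterator

def generate(text1: str = "") -> Iterator[str]:
    """Same loop as MyFirstModel.generate, minus the Clarifai base class."""
    for i in range(5):  # number of generated outputs
        yield text1 + f" Hello World {i}"

outputs = list(generate("Yes, I uploaded it!"))
print(outputs[0])    # Yes, I uploaded it! Hello World 0
print(len(outputs))  # 5
```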
Step 3B: Test Locally (Optional)
Before deploying, verify your model works locally:
- CLI
clarifai model serve ./my-first-model
This starts the model and connects it through a Clarifai-managed API endpoint for testing. Press Ctrl+C to stop.
For offline development without a Clarifai login:
- CLI
clarifai model serve ./my-first-model --grpc
Step 4B: Deploy
- CLI
clarifai model deploy ./my-first-model
This CPU-only model doesn't need --instance — the CLI uses the inference_compute_info from config.yaml. For GPU models, you'd add --instance g5.xlarge (run clarifai list-instances to see all available GPU instances).
Step 5B: Predict
Copy the predict command from the deploy output, or construct it:
- CLI
clarifai model predict your-user/main/models/my-first-model "Yes, I uploaded it!"
Replace your-user with your Clarifai user ID (shown by clarifai whoami).
Output Example
Yes, I uploaded it! Hello World 0
Yes, I uploaded it! Hello World 1
Yes, I uploaded it! Hello World 2
Yes, I uploaded it! Hello World 3
Yes, I uploaded it! Hello World 4
Or use the Python SDK:
- Python
import os

from clarifai.client import Model

# Set PAT as an environment variable
# export CLARIFAI_PAT=YOUR_PAT_HERE   # Unix-like systems
# set CLARIFAI_PAT=YOUR_PAT_HERE      # Windows
# Also set CLARIFAI_DEPLOYMENT_ID as an environment variable

# Initialize with your model URL
model = Model(
    url="https://clarifai.com/user-id/app-id/models/model-id",
    deployment_id=os.environ.get("CLARIFAI_DEPLOYMENT_ID", None),
)

for response in model.generate("Yes, I uploaded it! "):
    print(response)
Troubleshooting
Deploy is stuck on "Monitor" phase
The model pod is starting up. For large LLMs, this can take 5–10 minutes as weights download at runtime. Check progress with:
clarifai model logs --deployment <deployment-id>
If the pod is failing to schedule, check Kubernetes events:
clarifai model logs --deployment <deployment-id> --log-type events
HuggingFace token error
Some models (Llama, Gemma, etc.) are gated and require a HuggingFace access token. The CLI catches this early during the Validate phase:
UserError: HuggingFace repo 'meta-llama/Llama-3.1-8B-Instruct' requires authentication.
Set HF_TOKEN in your environment:
export HF_TOKEN=hf_...
To fix: create a HuggingFace token, request access to the model, then set the token:
export HF_TOKEN=hf_your_token_here
The CLI will automatically include it in the build.
Model predict returns an error
Make sure the deployment is fully ready before predicting:
clarifai model status --deployment <deployment-id>
Wait until the status shows the deployment is active. Runtime checkpoint downloads can take several minutes for large models.
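If you're scripting deployments, a small polling loop saves re-running the status command by hand. This is a generic pattern, not a CLI feature; the get_status callable is a placeholder you'd implement yourself, for example by shelling out to clarifai model status and parsing its output:

```python
import time

def wait_until_ready(get_status, timeout_s: float = 600, poll_s: float = 10) -> bool:
    """Poll get_status() until it reports 'active' or the timeout expires.
    Returns True if the deployment became active, False on timeout."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if get_status() == "active":
            return True
        time.sleep(poll_s)
    return False

# Example with a stand-in status source:
print(wait_until_ready(lambda: "active", timeout_s=5, poll_s=1))  # True
```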
What's Next?
- Browse all available GPU instances for deployment (clarifai list-instances)
- Learn about toolkits (vLLM, SGLang, Ollama, and more)
- Explore the full CLI reference
- Set up autoscaling for production workloads
Congratulations! You've successfully deployed a model to the Clarifai platform and run inference with it.