Upload Your First Model
Upload a model from Hugging Face to the Clarifai platform
The Clarifai platform allows you to upload custom models for a wide range of use cases. With just a few simple steps, you can get your models up and running and leverage the platform’s powerful capabilities.
Let's demonstrate how you can upload the Llama-3.2-1B-Instruct model from Hugging Face to the Clarifai platform.
To learn more about how to upload different types of models, check out this comprehensive guide.
Step 1: Perform Prerequisites
Install Clarifai Package
Install the latest version of the clarifai Python SDK. This also installs the Clarifai Command Line Interface (CLI), which we'll use for uploading the model.
- Bash
pip install --upgrade clarifai
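To confirm the installation succeeded, you can check the installed version from Python. This is an optional sanity check using only the standard library:
- Python
# Optional: confirm the package is importable from the environment you just installed into.
from importlib.metadata import version

print(version("clarifai"))  # expect 11.3.0 or later, matching requirements.txt below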
Set a PAT Key
You need to set the CLARIFAI_PAT (Personal Access Token) as an environment variable. You can generate the PAT key in your personal settings page by navigating to the Security section.
This token is essential for authenticating your connection to the Clarifai platform.
- Unix-Like Systems
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
- Windows
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
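If you want to make sure the variable is actually visible to the tools you'll run, a small optional Python check is:
- Python
import os

# Raises immediately if the token was not exported in this shell/session.
if not os.environ.get("CLARIFAI_PAT"):
    raise RuntimeError("CLARIFAI_PAT is not set; export it before uploading the model.")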
Get a Hugging Face Access Token
To download models from the Hugging Face platform, you'll need to authenticate your connection. You can create a Hugging Face account, then generate an access token to authorize your downloads.
You can follow Hugging Face's guide on user access tokens to generate one.
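As an optional sanity check, you can verify the token before using it. This sketch assumes the huggingface_hub package is available (it is installed alongside transformers):
- Python
from huggingface_hub import HfApi

# Prints your Hugging Face username if the access token is valid.
api = HfApi(token="YOUR_HF_ACCESS_TOKEN_HERE")
print(api.whoami()["name"])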
Step 2: Create Files
Create a project directory and organize your files as indicated below to fit the requirements of uploading models to the Clarifai platform.
your_model_directory/
├── 1/
│   └── model.py
├── requirements.txt
└── config.yaml
- your_model_directory/ – The main directory containing your model files.
- 1/ – A subdirectory that holds the model file (note that this folder must be named 1).
- model.py – Contains the code that defines your model, including loading the model and running inference.
- requirements.txt – Lists the Python libraries and dependencies required to run your model.
- config.yaml – Contains model metadata and configuration details necessary for building the Docker image, defining compute resources, and uploading the model to Clarifai.
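If you prefer, you can scaffold this layout with a few lines of Python (the directory name here is just a placeholder):
- Python
from pathlib import Path

root = Path("your_model_directory")
(root / "1").mkdir(parents=True, exist_ok=True)  # the model code must live in a folder named 1
for name in ("1/model.py", "requirements.txt", "config.yaml"):
    (root / name).touch()  # create empty files to fill in below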
Add the following snippets to each of the respective files.
model.py
- Python
from typing import List, Iterator
from threading import Thread
import os
import torch
from clarifai.runners.models.model_class import ModelClass
from clarifai.utils.logging import logger
from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.runners.utils.openai_convertor import openai_response
from transformers import (AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer)
class MyModel(ModelClass):
    """A custom runner for the llama-3.2-1b-instruct LLM that integrates with the Clarifai platform."""

    def load_model(self):
        """Load the model here."""
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        logger.info(f"Running on device: {self.device}")

        # Load checkpoints
        model_path = os.path.dirname(os.path.dirname(__file__))
        builder = ModelBuilder(model_path, download_validation_only=True)
        self.checkpoints = builder.download_checkpoints(stage="runtime")

        # Load model and tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(self.checkpoints)
        self.tokenizer.pad_token = self.tokenizer.eos_token  # Set pad token to eos token
        self.model = AutoModelForCausalLM.from_pretrained(
            self.checkpoints,
            low_cpu_mem_usage=True,
            device_map=self.device,
            torch_dtype=torch.bfloat16,
        )
        self.streamer = TextIteratorStreamer(tokenizer=self.tokenizer)
        self.chat_template = None
        logger.info("Done loading!")

    @ModelClass.method
    def predict(self,
                prompt: str = "",
                chat_history: List[dict] = None,
                max_tokens: int = 512,
                temperature: float = 0.7,
                top_p: float = 0.8) -> str:
        """
        Predict the response for the given prompt and chat history using the model.
        """
        # Construct chat-style messages
        messages = chat_history if chat_history else []
        if prompt:
            messages.append({
                "role": "user",
                "content": [{"type": "text", "text": prompt}]
            })

        # Tokenize with the chat template; return_dict=True so input_ids can be accessed by key
        inputs = self.tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt"
        ).to(self.model.device)

        generation_kwargs = {
            "input_ids": inputs["input_ids"],
            "do_sample": True,
            "max_new_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "eos_token_id": self.tokenizer.eos_token_id,
        }

        output = self.model.generate(**generation_kwargs)
        generated_tokens = output[0][inputs["input_ids"].shape[-1]:]
        return self.tokenizer.decode(generated_tokens, skip_special_tokens=True)

    @ModelClass.method
    def generate(self,
                 prompt: str = "",
                 chat_history: List[dict] = None,
                 max_tokens: int = 512,
                 temperature: float = 0.7,
                 top_p: float = 0.8) -> Iterator[str]:
        """Stream generated text tokens from a prompt + optional chat history."""
        # Construct chat-style messages
        messages = chat_history if chat_history else []
        if prompt:
            messages.append({
                "role": "user",
                "content": [{"type": "text", "text": prompt}]
            })

        # Delegate to chat() and yield only the text deltas
        response = self.chat(
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p
        )
        for each in response:
            yield each['choices'][0]['delta']['content']

    @ModelClass.method
    def chat(self,
             messages: List[dict],
             max_tokens: int = 512,
             temperature: float = 0.7,
             top_p: float = 0.8) -> Iterator[dict]:
        """
        Stream back JSON dicts for assistant messages.
        Example return format:
        {"role": "assistant", "content": [{"type": "text", "text": "response here"}]}
        """
        # Tokenize with the chat template; return_dict=True so input_ids can be accessed by key
        inputs = self.tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt"
        ).to(self.model.device)

        generation_kwargs = {
            "input_ids": inputs["input_ids"],
            "do_sample": True,
            "max_new_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "eos_token_id": self.tokenizer.eos_token_id,
            "streamer": self.streamer
        }

        # Run generation in a background thread so tokens can be yielded as they arrive
        thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
        thread.start()

        # Stream the generated text chunks in OpenAI-compatible response format
        for token_text in self.streamer:
            yield openai_response(token_text)

        thread.join()

    def test(self):
        """Test the model here."""
        try:
            print("Testing predict...")
            # Test predict
            print(self.predict(prompt="What is the capital of India?"))
        except Exception as e:
            print("Error in predict", e)

        try:
            print("Testing generate...")
            # Test generate
            for each in self.generate(prompt="What is the capital of India?"):
                print(each, end="")
            print()
        except Exception as e:
            print("Error in generate", e)

        try:
            print("Testing chat...")
            messages = [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of India?"},
            ]
            for each in self.chat(messages=messages):
                print(each, end="")
            print()
        except Exception as e:
            print("Error in chat", e)
requirements.txt
- Text
torch==2.5.1
tokenizers>=0.21.0
transformers>=4.47.0
accelerate>=1.2.0
scipy==1.10.1
optimum>=1.23.3
protobuf==5.27.3
einops>=0.8.0
requests==2.32.3
clarifai>=11.3.0
config.yaml
In the model section of the config.yaml file, specify your model ID, Clarifai user ID, and Clarifai app ID. These define where your model will be uploaded on the Clarifai platform. In the checkpoints section, you also need to provide your hf_token to authenticate the download from Hugging Face, as described earlier.
- YAML
model:
  id: "llama_3_2_1b_instruct"
  user_id: "user_id"
  app_id: "app_id"
  model_type_id: "text-to-text"

build_info:
  python_version: "3.11"

inference_compute_info:
  cpu_limit: "1"
  cpu_memory: "13Gi"
  num_accelerators: 1
  accelerator_type: ["NVIDIA-*"]
  accelerator_memory: "18Gi"

checkpoints:
  type: "huggingface"
  repo_id: "unsloth/Llama-3.2-1B-Instruct"
  hf_token: "hf_token"
  when: "runtime"
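Before uploading, it can help to confirm that the placeholders were replaced. Here is a small optional check, assuming PyYAML is available in your environment:
- Python
import yaml

with open("your_model_directory/config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["id"], cfg["model"]["user_id"], cfg["model"]["app_id"])
assert cfg["checkpoints"]["hf_token"] != "hf_token", "Replace the hf_token placeholder"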
Step 3: Upload the Model
Once your custom model is ready, upload it to the Clarifai platform by navigating to the directory containing the model and running the following command:
- CLI
clarifai model upload
Congratulations — you've just uploaded your first model to the Clarifai platform!
Now, you can deploy the model to a cluster and nodepool, which lets you run inference with it in a scalable, cost-efficient way.
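Once the model is deployed, you can call the methods defined in model.py through the Clarifai Python SDK. The sketch below is only an illustration: the model URL is a placeholder built from the IDs in config.yaml, and the keyword-argument call style assumes a recent SDK version that supports method-based models.
- Python
from clarifai.client import Model

# Placeholder URL; substitute your own user_id, app_id, and model ID.
# Requires CLARIFAI_PAT to be set in the environment and the model to be deployed.
model = Model(url="https://clarifai.com/user_id/app_id/models/llama_3_2_1b_instruct")

# Calls the predict method defined in model.py.
print(model.predict(prompt="What is the capital of India?"))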