Model Uploading
Import custom models, including from external sources like Hugging Face and OpenAI
The Clarifai Python SDK allows you to upload custom models easily. Whether you're working with a pre-trained model from an external source like Hugging Face or OpenAI, or one you've built from scratch, Clarifai allows seamless integration of your models, enabling you to take advantage of the platform’s powerful capabilities.
Once imported to our platform, your model can be utilized alongside Clarifai's vast suite of AI tools. It will be automatically deployed and ready to be evaluated, combined with other models and agent operators in a workflow, or used to serve inference requests as-is.
Let’s demonstrate how you can successfully upload different types of models to the Clarifai platform.
You can explore this repository for examples on uploading different model types.
Prerequisites
Set up Docker or a Virtual Environment
To test, run, and upload your model, you need to set up either a Docker container or a Python virtual environment. This ensures proper dependency management and prevents conflicts in your project.
Both options allow you to work with different Python versions. For example, you can use Python 3.11 for uploading one model and Python 3.12 for another, configured via the config.yaml file.
If Docker is installed on your system, it is highly recommended to use it for running the model. Docker provides better isolation and a fully portable environment, including for Python and system libraries.
You should ensure your local environment has sufficient memory and compute resources to handle model loading and execution, especially during testing.
Install Clarifai Package
Install the latest version of the clarifai Python package. This will also install the Clarifai Command Line Interface (CLI), which we'll use for testing and uploading the model.
- Bash
pip install --upgrade clarifai
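To confirm the installation, you can print the installed package version. This is an optional check and assumes the package exposes a __version__ attribute, which recent releases do:
- Python
import clarifai

# Print the installed SDK version to confirm the upgrade worked.
print(clarifai.__version__)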
Set a PAT Key
You need to set the CLARIFAI_PAT (Personal Access Token) as an environment variable. You can generate the PAT key in your personal settings page by navigating to the Security section.
This token is essential for authenticating your connection to the Clarifai platform.
- Unix-Like Systems
- Windows
export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
set CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE
On Windows, the Clarifai Python SDK expects a HOME environment variable, which isn’t set by default. To ensure compatibility with file paths used by the SDK, set HOME to the value of your USERPROFILE. You can set it in your Command Prompt this way: set HOME=%USERPROFILE%.
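Before running any upload or test commands, you can optionally verify that the token is visible to your environment. This quick check uses only the Python standard library:
- Python
import os

# Fail fast if the Personal Access Token is missing from the environment.
if not os.environ.get("CLARIFAI_PAT"):
    raise RuntimeError(
        "CLARIFAI_PAT is not set. Generate a PAT under the Security section "
        "of your personal settings and export it before continuing."
    )
print("CLARIFAI_PAT is set.")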
Create Project Directory
Create a project directory and organize your files as indicated below to fit the requirements of uploading models to the Clarifai platform.
your_model_directory/
├── 1/
│ └── model.py
├── requirements.txt
└── config.yaml
- your_model_directory/ – The main directory containing your model files.
- 1/ – A subdirectory that holds the model file (note that the folder must be named 1).
- model.py – Contains the code that defines your model, including loading the model and running inference.
- requirements.txt – Lists the Python libraries and dependencies required to run your model.
- config.yaml – Contains model metadata and configuration details necessary for building the Docker image, defining compute resources, and uploading the model to Clarifai.
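If you prefer to scaffold this layout from a script, the following optional Python sketch creates the expected structure (your_model_directory is just a placeholder name):
- Python
from pathlib import Path

# Create the directory layout expected by the Clarifai model uploader.
root = Path("your_model_directory")
(root / "1").mkdir(parents=True, exist_ok=True)  # model code lives in the folder named "1"
(root / "1" / "model.py").touch()                # model definition and inference logic
(root / "requirements.txt").touch()              # Python dependencies
(root / "config.yaml").touch()                   # model metadata and compute configuration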
How to Upload a Model
Let's talk about the general steps you'd follow to upload any type of model to the Clarifai platform.
You can refer to the examples below to help you configure your files correctly.
Step 1: Prepare the config.yaml File
The config.yaml file is essential for specifying the model’s metadata, compute resource requirements, and model checkpoints.
Here’s a breakdown of the key sections in the file.
Model Info
This section defines your model ID, Clarifai user ID, and Clarifai app ID, which will determine where the model is uploaded on the Clarifai platform.
- YAML
model:
id: "model_id"
user_id: "user_id"
app_id: "app_id"
model_type_id: "text-to-text" # Change this based on your model type (e.g., image-classifier, text-to-text)
Build Info
This section specifies details about the environment used to build or run the model. You can include the python_version, which is useful for ensuring compatibility between the model and its runtime environment, as different Python versions may have varying dependencies, library support, and performance characteristics.
We currently support Python 3.11 and Python 3.12 (default).
- YAML
build_info:
python_version: "3.11"
Compute Resources
Here, you define the minimum compute resources required for running your model, including CPU, memory, and optional GPU specifications.
- YAML
inference_compute_info:
cpu_limit: "2"
cpu_memory: "13Gi"
num_accelerators: 1
accelerator_type: ["NVIDIA-A10G"] # Specify the GPU type if needed
accelerator_memory: "15Gi"
- cpu_limit – Number of CPUs allocated for the model (follows Kubernetes notation, e.g., "1", "2").
- cpu_memory – Minimum memory required for the CPU (uses Kubernetes notation, e.g., "1Gi", "1500Mi", "3Gi").
- num_accelerators – Number of GPUs or TPUs to use for inference.
- accelerator_type – Specifies the type of hardware accelerators (e.g., GPU or TPU) supported by the model (e.g., "NVIDIA-A10G"). Instead of specifying an exact accelerator type, you can use a wildcard (*) to automatically match all available accelerators that fit your use case. For example, ["NVIDIA-*"] lets the system choose from all NVIDIA options compatible with your model.
- accelerator_memory – Minimum memory required for the GPU or TPU.
Hugging Face Model Checkpoints
If you're using a model from Hugging Face, you can automatically download its checkpoints by specifying the appropriate configuration in this section. For private or restricted Hugging Face repositories, include an access token.
- YAML
checkpoints:
type: "huggingface"
repo_id: "meta-llama/Meta-Llama-3-8B-Instruct"
when: "runtime"
hf_token: "your_hf_token" # Required for private models
The when parameter in the checkpoints section determines when model checkpoints should be downloaded and stored. It must be set to one of the following options:
- runtime (default) – Downloads checkpoints when loading the model in the load_model method.
- build – Downloads checkpoints during the image build process.
- upload – Downloads checkpoints before uploading the model.
For larger models, we highly recommend downloading checkpoints at runtime. Doing so prevents unnecessary increases in Docker image size, which has some advantages:
- Smaller image sizes
- Faster build times
- Quicker uploads and inference on the Clarifai platform
Downloading checkpoints at build or upload time can significantly increase image size, resulting in longer upload times and increased cold start latency.
Model Concepts or Labels
This section is required if your model outputs concepts or labels and is not loaded directly from Hugging Face. In that case, you must define a concepts section in the config.yaml file.
The following model types output concepts or labels:
- visual-classifier
- visual-detector
- visual-segmenter
- text-classifier
- YAML
concepts:
- id: '0'
name: bus
- id: '1'
name: person
- id: '2'
name: bicycle
- id: '3'
name: car
If you're using a model from Hugging Face and the checkpoints section is defined, the Clarifai platform will automatically infer concepts. In this case, you don’t need to manually specify them.
Step 2: Define Dependencies in requirements.txt
The requirements.txt file lists all the Python dependencies your model needs. If your model requires Torch, we provide optimized pre-built Torch images as the base for machine learning and inference tasks.
These images include all necessary dependencies, ensuring efficient execution. The available pre-built Torch images are:
- 2.4.1-py3.11-cuda124 — Based on PyTorch 2.4.1, Python 3.11, and CUDA 12.4.
- 2.5.1-py3.11-cuda124 — Based on PyTorch 2.5.1, Python 3.11, and CUDA 12.4.
- 2.4.1-py3.12-cuda124 — Based on PyTorch 2.4.1, Python 3.12, and CUDA 12.4.
- 2.5.1-py3.12-cuda124 — Based on PyTorch 2.5.1, Python 3.12, and CUDA 12.4.
To use a specific Torch version, define it in your requirements.txt file like this:
torch==2.5.1
This ensures the correct pre-built image is pulled from Clarifai's container registry, so the right environment is used. It minimizes cold start times and speeds up model uploads and runtime execution by avoiding the overhead of building images from scratch or pulling and configuring them from external sources.
We recommend using either torch==2.5.1 or torch==2.4.1. If your model requires a different Torch version, you can specify it in requirements.txt, but this may slightly increase the model upload time.
Step 3: Prepare the model.py File
The model.py file contains the core logic for your model, including how the model is loaded and how predictions are made. This file must define a custom class that inherits from ModelClass and implements the required methods.
Each parameter in the class methods must be annotated with a type, and the return type must also be specified. Clarifai's model framework supports rich data typing for both inputs and outputs. Supported types include Text, Image, Audio, Video, and more.
To define a custom model, create a class that inherits from ModelClass and implements the following methods:
a. load_model Method
The load_model method is optional but recommended, as it prepares the model for inference by handling resource-heavy initializations. It is particularly useful for:
- One-time setup of heavy resources, such as loading trained models or initializing data transformations.
- Executing tasks during model container startup to reduce runtime latency.
- Loading essential components like tokenizers, pipelines, and other model-related assets.
Here is an example:
def load_model(self):
self.tokenizer = AutoTokenizer.from_pretrained("model/")
self.pipeline = transformers.pipeline(...)
b. Prediction Methods
You need to include at least one method decorated with @ModelClass.method to define the prediction endpoints.
We support various prediction methods based on type hints:
# Unary-Unary (Standard request-response)
@ModelClass.method
def predict(self, input: Image) -> Text
# Unary-Stream (Server-side streaming)
@ModelClass.method
def generate(self, prompt: Text) -> Stream[Text]
# Stream-Stream (Bidirectional streaming)
@ModelClass.method
def analyze_video(self, frames: Stream[Image]) -> Stream[str]
Here is an example of a model.py file.
- Python
from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Stream, Text
class MyModel(ModelClass):
"""A custom runner that adds "Hello World" to the end of the text."""
def load_model(self):
"""Load the model here."""
@ModelClass.method
def predict(self, text1: Text = "") -> Text:
"""This is the method that will be called when the runner is run. It takes in an input and
returns an output.
"""
output_text = text1.text + "Hello World"
return Text(output_text)
@ModelClass.method
def generate(self, text1: Text = Text("")) -> Stream[Text]:
"""Example yielding a whole batch of streamed stuff back."""
for i in range(10): # fake something iterating generating 10 times.
output_text = text1.text + f"Generate Hello World {i}"
yield Text(output_text)
@ModelClass.method
def stream(self, input_iterator: Stream[Text]) -> Stream[Text]:
"""Example yielding a whole batch of streamed stuff back."""
for i, input in enumerate(input_iterator):
output_text = input.text + f"Stream Hello World {i}"
yield Text(output_text)
The structure of prediction methods on the client side directly mirrors the method signatures defined in your model.py file. This one-to-one mapping provides flexibility in defining prediction methods with varying names and arguments.
Here are some examples of method mapping:
| model.py Model Implementation | Client-Side Usage Pattern |
|---|---|
| @ModelClass.method def predict(...) | model.predict(...) |
| @ModelClass.method def generate(...) | model.generate(...) |
| @ModelClass.method def stream(...) | model.stream(...) |
This design allows you to define any custom method with any number of parameters. For example, you could define a method like @ModelClass.method def analyze_video(...) in model.py, and then call it on the client side using model.analyze_video(...).
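For illustration, here is a minimal client-side sketch of this mapping, using the hello-world model defined above. The model URL and PAT are placeholders, and the exact call style may vary slightly between SDK versions, so treat this as a sketch rather than a definitive snippet:
- Python
from clarifai.client import Model

# Placeholders: point this at your own uploaded model and Personal Access Token.
model = Model(
    url="https://clarifai.com/user_id/app_id/models/model_id",
    pat="YOUR_PAT",
)

# Mirrors "@ModelClass.method def predict(self, text1: Text)" in model.py.
print(model.predict(text1="How are you? "))

# Mirrors "@ModelClass.method def generate(self, text1: Text) -> Stream[Text]".
for chunk in model.generate(text1="How are you? "):
    print(chunk)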
Here are some key characteristics of this design:
- Method names must match exactly between model.py and client usage.
- Parameters retain the same names and types as defined in your method.
- Return types follow the structure defined by your model’s outputs.
You can find more details on making predictions here.
Step 4: Test the Model Locally
Before uploading your model to the Clarifai platform, it's important to test it locally to catch any typos or misconfigurations in the code.
Learn how to test your models locally here.
Step 5: Upload the Model to Clarifai
Once your model is ready, you can upload it to the platform using the Clarifai CLI.
To upload your model, run the following command in your terminal:
- Bash
clarifai model upload ./your/model/path/here
Alternatively, navigate to the directory containing your custom model and run the command without specifying the directory path:
- Bash
clarifai model upload
This command builds the model’s Docker image using the defined compute resources and uploads it to Clarifai, where it can be served in production. The build logs will be displayed in your terminal, which helps you troubleshoot any upload issues.
Build Logs Example
[INFO] 13:11:29.227543 Validating folder: C:\Users\Alfrick\Desktop\delete1\ | thread=13964
[INFO] 13:11:35.221354 Skipping downloading checkpoints for stage upload since config.yaml says to download them at stage runtime | thread=13964
[INFO] 13:11:35.288554 Using Python version 3.11 from the config file to build the Dockerfile | thread=13964
[INFO] 13:11:35.307556 Using Torch version 2.5.1 base image to build the Docker image | thread=13964
[WARNING] 13:11:35.308554 clarifai version not found in requirements.txt, using the latest version 11.2.3 | thread=13964
[WARNING] 13:11:35.319543 Updated requirements.txt to have clarifai==11.2.3 | thread=13964
[INFO] 13:11:35.637387 New model will be created at https://clarifai.com/alfrick/upload-models-2/models/test34 with it's first version. | thread=13964
Press Enter to continue...
[INFO] 13:11:44.497592 Uploading file... | thread=19308
[INFO] 13:11:44.499677 Upload complete! | thread=19308
Status: Upload done, Progress: 0% - Completed upload of files, initiating model version image build.. request_id:
Status: Model image is currently being built., Progress: 0% - Model version image is being built. request_id:
[INFO] 13:11:45.601654 Created Model Version ID: 2eeea43632294240995c0e1030bc2217 | thread=13964
[INFO] 13:11:45.602677 Full url to that version is: https://clarifai.com/alfrick/upload-models-2/models/test34 | thread=13964
[INFO] 13:11:50.934179 2025-04-22 10:11:43.505688 INFO: Downloading uploaded model from storage...
2025-04-22 10:11:44.177740 INFO: Done downloading model
2025-04-22 10:11:44.180182 INFO: Extracting upload...
2025-04-22 10:11:44.183856 INFO: Done extracting upload
2025-04-22 10:11:44.185767 INFO: Parsing requirements file for model version ID ****0e1030bc2217
2025-04-22 10:11:44.207227 INFO: Dockerfile found at /shared/context/Dockerfile
2025-04-22 10:11:45.064052 INFO: Setting up credentials
amazon-ecr-credential-helper
Version: 0.8.0
Git commit: ********
2025-04-22 10:11:45.067678 INFO: Building image...
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 2.61kB done
#1 DONE 0.0s
#2 resolve image config for docker-image://docker.io/docker/dockerfile:1.13-labs
#2 DONE 0.1s
#3 docker-image://docker.io/docker/dockerfile:1.13-labs@sha256:************18b8
#3 resolve docker.io/docker/dockerfile:1.13-labs@sha256:************18b8 done
#3 CACHED
#4 [internal] load metadata for public.ecr.aws/clarifai-models/torch:2.5.1-py3.11-cu124-********
#4 DONE 0.1s
#5 [internal] load .dockerignore
#5 transferring context: 2B done
#5 DONE 0.0s
#6 [final 1/8] FROM public.ecr.aws/clarifai-models/torch:2.5.1-py3.11-cu124-********@sha256:************ef64
#6 resolve public.ecr.aws/clarifai-models/torch:2.5.1-py3.11-cu124-********@sha256:************ef64 done
#6 DONE 0.0s
#7 [internal] load build context
#7 transferring context: 7.49kB done
#7 DONE 0.0s
#8 [final 3/8] RUN ["pip", "install", "--no-cache-dir", "-r", "/home/nonroot/requirements.txt"]
#8 CACHED
#9 [final 5/8] COPY --chown=nonroot:nonroot downloader/unused.yaml /home/nonroot/main/1/checkpoints/.cache/unused.yaml
#9 CACHED
#10 [final 2/8] COPY --link requirements.txt /home/nonroot/requirements.txt
#10 CACHED
#11 [final 4/8] RUN ["pip", "show", "clarifai"]
#11 CACHED
#12 [final 6/8] RUN ["python", "-m", "clarifai.cli", "model", "download-checkpoints", "/home/nonroot/main", "--out_path", "/home/nonroot/main/1/checkpoints", "--stage", "build"]
#12 CACHED
#13 [final 7/8] COPY --link=true 1 /home/nonroot/main/1
#13 DONE 0.0s
#14 [final 8/8] COPY --link=true requirements.txt config.yaml /home/nonroot/main/
#14 DONE 0.0s
#15 [auth] sharing credentials for 891377382885.dkr.ecr.us-east-1.amazonaws.com
#15 DONE 0.0s
#16 exporting to image
#16 exporting layers done
#16 exporting manifest sha256:************f4b0 done
#16 exporting config sha256:************2c6f done
#16 pushing layers
#16 pushing layers 1.2s done
#16 pushing manifest for ****/prod/pytorch:****0e1030bc2217@sha256:************f4b0
#16 pushing manifest for ****/prod/pytorch:****0e1030bc2217@sha256:************f4b0 0.4s done
#16 DONE 1.6s
2025-04-22 10:11:47.054550 INFO: Done building image!!! | thread=13964
[INFO] 13:11:52.614363 #16 pushing manifest for ****/prod/pytorch:****0e1030bc2217@sha256:************f4b0 0.4s done
#16 DONE 1.6s
2025-04-22 10:11:47.054550 INFO: Done building image!!! | thread=13964
[INFO] 13:11:54.358104 Model build complete! | thread=13964
[INFO] 13:11:54.359987 Build time elapsed 8.8s) | thread=13964
[INFO] 13:11:54.360985 Check out the model at https://clarifai.com/alfrick/upload-models-2/models/test34 version: 2eeea43632294240995c0e1030bc2217 | thread=13964
Note: If you make any changes to your model and upload it again to the Clarifai platform, a new version of the model will be created automatically.
Step 6: Predict With Model
Once the model is successfully uploaded to Clarifai, you can start making predictions with it.
Note that before making a prediction request using our Compute Orchestration capabilities, you first need to deploy the model to a cluster and nodepool you've created.
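For example, after deploying the Llama model from the first example below, a client-side prediction call might look like the following sketch. The URL and PAT are placeholders, the model must already be deployed to a cluster and nodepool, and the exact options may differ depending on your SDK version:
- Python
from clarifai.client import Model

# Placeholders: replace with your uploaded model's URL and your Personal Access Token.
model = Model(
    url="https://clarifai.com/user_id/app_id/models/llama_3_2_1b_instruct",
    pat="YOUR_PAT",
)

# Calls the predict method defined in the model's model.py.
print(model.predict(prompt="What is the capital of India?", max_tokens=128))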
Examples
You can find various up-to-date model upload examples here, which demonstrate different use cases and optimizations.
Llama-3.2-1B-Instruct
model.py
- Python
from typing import List, Iterator
from threading import Thread
import os
import torch
from clarifai.runners.models.model_class import ModelClass
from clarifai.utils.logging import logger
from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.runners.utils.openai_convertor import openai_response
from transformers import (AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer)
class MyModel(ModelClass):
"""A custom runner for llama-3.2-1b-instruct llm that integrates with the Clarifai platform"""
def load_model(self):
"""Load the model here."""
self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
logger.info(f"Running on device: {self.device}")
# Load checkpoints
model_path = os.path.dirname(os.path.dirname(__file__))
builder = ModelBuilder(model_path, download_validation_only=True)
self.checkpoints = builder.download_checkpoints(stage="runtime")
# Load model and tokenizer
self.tokenizer = AutoTokenizer.from_pretrained(self.checkpoints,)
self.tokenizer.pad_token = self.tokenizer.eos_token # Set pad token to eos token
self.model = AutoModelForCausalLM.from_pretrained(
self.checkpoints,
low_cpu_mem_usage=True,
device_map=self.device,
torch_dtype=torch.bfloat16,
)
self.streamer = TextIteratorStreamer(tokenizer=self.tokenizer,)
self.chat_template = None
logger.info("Done loading!")
@ModelClass.method
def predict(self,
prompt: str ="",
chat_history: List[dict] = None,
max_tokens: int = 512,
temperature: float = 0.7,
top_p: float = 0.8) -> str:
"""
Predict the response for the given prompt and chat history using the model.
"""
# Construct chat-style messages
messages = chat_history if chat_history else []
if prompt:
messages.append({
"role": "user",
"content": [{"type": "text", "text": prompt}]
})
inputs = self.tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(self.model.device)  # return_dict=True so inputs["input_ids"] is available below
generation_kwargs = {
"input_ids": inputs["input_ids"],
"do_sample": True,
"max_new_tokens": max_tokens,
"temperature": temperature,
"top_p": top_p,
"eos_token_id": self.tokenizer.eos_token_id,
}
output = self.model.generate(**generation_kwargs)
generated_tokens = output[0][inputs["input_ids"].shape[-1]:]
return self.tokenizer.decode(generated_tokens, skip_special_tokens=True)
@ModelClass.method
def generate(self,
prompt: str="",
chat_history: List[dict] = None,
max_tokens: int = 512,
temperature: float = 0.7,
top_p: float = 0.8) -> Iterator[str]:
"""Stream generated text tokens from a prompt + optional chat history."""
# Construct chat-style messages
messages = chat_history if chat_history else []
if prompt:
messages.append({
"role": "user",
"content": [{"type": "text", "text": prompt}]
})
response = self.chat(
messages=messages,
max_tokens=max_tokens,
temperature=temperature,
top_p=top_p
)
for each in response:
yield each['choices'][0]['delta']['content']
@ModelClass.method
def chat(self,
messages: List[dict],
max_tokens: int = 512,
temperature: float = 0.7,
top_p: float = 0.8) -> Iterator[dict]:
"""
Stream back JSON dicts for assistant messages.
Example return format:
{"role": "assistant", "content": [{"type": "text", "text": "response here"}]}
"""
# Tokenize using chat template
inputs = self.tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,  # needed so inputs["input_ids"] is available below
return_tensors="pt"
).to(self.model.device)
generation_kwargs = {
"input_ids": inputs["input_ids"],
"do_sample": True,
"max_new_tokens": max_tokens,
"temperature": temperature,
"top_p": top_p,
"eos_token_id": self.tokenizer.eos_token_id,
"streamer": self.streamer
}
thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
thread.start()
# Accumulate response text
for token_text in self.streamer:
yield openai_response(token_text)
thread.join()
def test(self):
"""Test the model here."""
try:
print("Testing predict...")
# Test predict
print(self.predict(prompt="What is the capital of India?",))
except Exception as e:
print("Error in predict", e)
try:
print("Testing generate...")
# Test generate
for each in self.generate(prompt="What is the capital of India?",):
print(each, end="")
print()
except Exception as e:
print("Error in generate", e)
try:
print("Testing chat...")
messages = [
{"role": "system", "content": "You are an helpful assistant."},
{"role": "user", "content": "What is the capital of India?"},
]
for each in self.chat(messages=messages,):
print(each, end="")
print()
except Exception as e:
print("Error in generate", e)
requirements.txt
- Text
torch==2.5.1
tokenizers>=0.21.0
transformers>=4.47.0
accelerate>=1.2.0
scipy==1.10.1
optimum>=1.23.3
protobuf==5.27.3
einops>=0.8.0
requests==2.32.3
clarifai>=11.3.0
config.yaml
- YAML
model:
id: "llama_3_2_1b_instruct"
user_id: "user_id"
app_id: "app_id"
model_type_id: "text-to-text"
build_info:
python_version: "3.11"
inference_compute_info:
cpu_limit: "1"
cpu_memory: "13Gi"
num_accelerators: 1
accelerator_type: ["NVIDIA-*"]
accelerator_memory: "18Gi"
checkpoints:
type: "huggingface"
repo_id: "unsloth/Llama-3.2-1B-Instruct"
hf_token: "hf_token"
when: "runtime"
NSFW Image Classifier
model.py
- Python
import os
import tempfile
from typing import List, Iterator
from io import BytesIO
import cv2
import torch
from transformers import AutoModelForImageClassification, ViTImageProcessor
from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Image, Concept, Video
from clarifai.runners.models.model_builder import ModelBuilder
from PIL import Image as PILImage
def video_to_frames(video_bytes):
"""Convert video bytes to frames."""
frames = []
with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as temp_video_file:
temp_video_file.write(video_bytes)
temp_video_path = temp_video_file.name
video = cv2.VideoCapture(temp_video_path)
while video.isOpened():
ret, frame = video.read()
if not ret:
break
frame_bytes = cv2.imencode('.jpg', frame)[1].tobytes()
frames.append(frame_bytes)
video.release()
return frames
def preprocess_image(image_bytes):
"""Convert image bytes into RGB format suitable for model processing
Args:
image_bytes: Raw image data in bytes format
Returns:
PIL Image object in RGB format ready for model input
"""
return PILImage.open(BytesIO(image_bytes)).convert("RGB")
def process_concepts( logits, model_labels):
"""Process logits and map them to concepts."""
outputs = []
for logit in logits:
probs = torch.softmax(logit, dim=-1)
sorted_indices = torch.argsort(probs, dim=-1, descending=True)
output_concepts = []
for idx in sorted_indices:
concept = Concept(id = model_labels[idx.item()],name=model_labels[idx.item()], value=probs[idx].item())
output_concepts.append(concept)
outputs.append(output_concepts)
return outputs
class ImageClassifierModel(ModelClass):
"""A custom runner that classifies images and outputs concepts."""
def load_model(self):
"""Load the model and processor."""
model_path = os.path.dirname(os.path.dirname(__file__))
builder = ModelBuilder(model_path, download_validation_only=True)
checkpoints = builder.download_checkpoints(stage="runtime")
self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
self.model = AutoModelForImageClassification.from_pretrained(checkpoints,).to(self.device)
self.model_labels = self.model.config.id2label
self.processor = ViTImageProcessor.from_pretrained(checkpoints)
@ModelClass.method
def predict(self, image: Image) -> List[List[Concept]]:
"""Predict concepts for a list of images."""
pil_image = preprocess_image(image.bytes)
inputs = self.processor(images=pil_image, return_tensors="pt")
inputs = {name: tensor.to(self.device) for name, tensor in inputs.items()}
with torch.no_grad():
logits = self.model(**inputs).logits
return process_concepts(logits, self.model_labels)
@ModelClass.method
def generate(self, video: Video) -> Iterator[List[Concept]]:
"""Generate concepts for frames extracted from a video."""
video_bytes = video.bytes
frame_generator = video_to_frames(video_bytes)
for frame in frame_generator:
image = preprocess_image(frame)
inputs = self.processor(images=image, return_tensors="pt")
inputs = {name: tensor.to(self.device) for name, tensor in inputs.items()}
with torch.no_grad():
logits = self.model(**inputs).logits
yield process_concepts(logits, self.model_labels) # Yield concepts for each frame
@ModelClass.method
def stream_image(self, image_stream: Iterator[Image]) -> Iterator[List[Concept]]:
"""Stream process image inputs."""
for image in image_stream:
result = self.predict(image)
yield result
@ModelClass.method
def stream_video(self, video_stream: Iterator[Video]) -> Iterator[List[Concept]]:
"""Stream process video inputs."""
for video in video_stream:
for frame_result in self.generate(video):
yield frame_result
requirements.txt
- Text
torch==2.5.1
transformers>=4.47.0
pillow==10.4.0
requests==2.32.3
timm==1.0.12
opencv-python-headless==4.10.0.84
numpy
aiohttp
clarifai>=11.3.0
clarifai-protocol>=0.0.20
config.yaml
- YAML
model:
id: model_id
user_id: user_id
app_id: app_id
model_type_id: visual-classifier
build_info:
python_version: '3.11'
inference_compute_info:
cpu_limit: '2'
cpu_memory: 2Gi
num_accelerators: 1
accelerator_type:
- NVIDIA-A10G
accelerator_memory: 3Gi
checkpoints:
type: huggingface
repo_id: Falconsai/nsfw_image_detection
hf_token: hf_token
DETR Resnet Image Detector
model.py
- Python
# Standard library imports
import os
import tempfile
import time
from io import BytesIO
from typing import List, Dict, Any, Iterator
# Third-party imports
import cv2
import torch
from PIL import Image as PILImage
from transformers import DetrForObjectDetection, DetrImageProcessor
# Clarifai imports
from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.runners.utils.data_types import Concept, Image, Video, Region
from clarifai.utils.logging import logger
def preprocess_image(image_bytes: bytes) -> PILImage:
"""Convert image bytes into RGB format suitable for model processing.
Args:
image_bytes: Raw image data in bytes format
Returns:
PIL Image object in RGB format ready for model input
"""
return PILImage.open(BytesIO(image_bytes)).convert("RGB")
def detect_objects(
images: List[PILImage],
model: DetrForObjectDetection,
processor: DetrImageProcessor,
device: str
) -> Dict[str, Any]:
"""Process images through the DETR model to detect objects.
Args:
images: List of preprocessed images
model: DETR model instance
processor: Image processor for DETR
device: Computation device (CPU/GPU)
Returns:
Detection results from the model
"""
model_inputs = processor(images=images, return_tensors="pt").to(device)
model_inputs = {name: tensor.to(device) for name, tensor in model_inputs.items()}
model_output = model(**model_inputs)
results = processor.post_process_object_detection(model_output)
return results
def process_detections(
results: List[Dict[str, torch.Tensor]],
images: List[PILImage],
threshold: float,
model_labels: Dict[int, str]
) -> List[List[Region]]:
"""Convert model outputs into a structured format of detections.
Args:
results: Raw detection results from model
images: Original input images
threshold: Confidence threshold for detections
model_labels: Dictionary mapping label indices to names
Returns:
List of lists containing Region objects for each detection
"""
outputs = []
for i, result in enumerate(results):
image = images[i]
detections = []
for score, label_idx, box in zip(result["scores"], result["labels"], result["boxes"]):
if score > threshold:
label = model_labels[label_idx.item()]
detections.append(
Region(
box=box.tolist(),
concepts=[Concept(id=label, name=label, value=score.item())]
)
)
outputs.append(detections)
return outputs
def video_to_frames(video_bytes: bytes) -> Iterator[bytes]:
"""Convert video bytes to frames.
Args:
video_bytes: Raw video data in bytes
Yields:
JPEG encoded frame data as bytes
"""
with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as temp_video_file:
temp_video_file.write(video_bytes)
temp_video_path = temp_video_file.name
logger.info(f"temp_video_path: {temp_video_path}")
video = cv2.VideoCapture(temp_video_path)
logger.info(f"video opened: {video.isOpened()}")
while video.isOpened():
ret, frame = video.read()
if not ret:
break
frame_bytes = cv2.imencode('.jpg', frame)[1].tobytes()
yield frame_bytes
video.release()
os.unlink(temp_video_path)
class MyRunner(ModelClass):
"""A custom runner for DETR object detection model that processes images and videos"""
def load_model(self):
"""Load the model here."""
model_path = os.path.dirname(os.path.dirname(__file__))
builder = ModelBuilder(model_path, download_validation_only=True)
checkpoint_path = builder.download_checkpoints(stage="runtime")
self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
logger.info(f"Running on device: {self.device}")
self.model = DetrForObjectDetection.from_pretrained(checkpoint_path).to(self.device)
self.processor = DetrImageProcessor.from_pretrained(checkpoint_path)
self.model.eval()
self.threshold = 0.9
self.model_labels = self.model.config.id2label
logger.info("Done loading!")
@ModelClass.method
def predict(self, image: Image) -> List[Region]:
"""Process a single image and return detected objects."""
image_bytes = image.bytes
image = preprocess_image(image_bytes)
with torch.no_grad():
results = detect_objects([image], self.model, self.processor, self.device)
outputs = process_detections(results, [image], self.threshold, self.model_labels)
return outputs[0] # Return detections for single image
@ModelClass.method
def generate(self, video: Video) -> Iterator[List[Region]]:
"""Process video frames and yield detected objects for each frame."""
video_bytes = video.bytes
frame_generator = video_to_frames(video_bytes)
for frame in frame_generator:
image = preprocess_image(frame)
with torch.no_grad():
results = detect_objects([image], self.model, self.processor, self.device)
outputs = process_detections(results, [image], self.threshold, self.model_labels)
yield outputs[0] # Yield detections for each frame
@ModelClass.method
def stream_image(self, image_stream: Iterator[Image]) -> Iterator[List[Region]]:
"""Stream process image inputs."""
logger.info("Starting stream processing for images")
for image in image_stream:
start_time = time.time()
result = self.predict(image)
yield result
logger.info(f"Processing time: {time.time() - start_time:.3f}s")
@ModelClass.method
def stream_video(self, video_stream: Iterator[Video]) -> Iterator[List[Region]]:
"""Stream process video inputs."""
logger.info("Starting stream processing for videos")
for video in video_stream:
start_time = time.time()
for frame_result in self.generate(video):
yield frame_result
logger.info(f"Processing time: {time.time() - start_time:.3f}s")
def test(self):
"""Test the model functionality."""
import requests # Import moved here as it's only used for testing
# Test configuration
TEST_URLS = {
"images": [
"https://samples.clarifai.com/metro-north.jpg",
"https://samples.clarifai.com/dog.tiff"
],
"video": "https://samples.clarifai.com/beer.mp4"
}
def get_test_data(url):
return Image(bytes=requests.get(url).content)
def get_test_video():
return Video(bytes=requests.get(TEST_URLS["video"]).content)
def run_test(name, test_fn):
logger.info(f"\nTesting {name}...")
try:
test_fn()
logger.info(f"{name} test completed successfully")
except Exception as e:
logger.error(f"Error in {name} test: {e}")
# Test predict
def test_predict():
result = self.predict(get_test_data(TEST_URLS["images"][0]))
logger.info(f"Predict result: {result}")
# Test generate
def test_generate():
for detections in self.generate(get_test_video()):
logger.info(f"First frame detections: {detections}")
break
# Test stream
def test_stream():
# Split into two separate test functions for clarity
def test_stream_image():
images = [get_test_data(url) for url in TEST_URLS["images"]]
for result in self.stream_image(iter(images)):
logger.info(f"Image stream result: {result}")
def test_stream_video():
for result in self.stream_video(iter([get_test_video()])):
logger.info(f"Video stream result: {result}")
break # Just test first frame
logger.info("\nTesting image streaming...")
test_stream_image()
logger.info("\nTesting video streaming...")
test_stream_video()
# Run all tests
for test_name, test_fn in [
("predict", test_predict),
("generate", test_generate),
("stream", test_stream)
]:
run_test(test_name, test_fn)
requirements.txt
- Text
torch==2.5.1
transformers>=4.47.0
pillow==10.4.0
requests==2.32.3
timm==1.0.12
opencv-python-headless==4.10.0.84
clarifai>=11.3.0
config.yaml
- YAML
# This is the sample config file for the image-detection model.
model:
id: "detr-resnet-50"
user_id: "user_id"
app_id: "app_id"
model_type_id: "visual-detector"
build_info:
python_version: "3.11"
inference_compute_info:
cpu_limit: "4"
cpu_memory: "2Gi"
num_accelerators: 1
accelerator_type: ["NVIDIA-*"]
accelerator_memory: "5Gi"
checkpoints:
type: "huggingface"
repo_id: "facebook/detr-resnet-50"
hf_token: "hf_token"