Model Uploading

Import custom models, including from external sources like Hugging Face and OpenAI




The Clarifai Python SDK allows you to upload custom models easily. Whether you're working with a pre-trained model from an external source like Hugging Face or OpenAI, or one you've built from scratch, Clarifai allows seamless integration of your models, enabling you to take advantage of the platform’s powerful capabilities.

Once imported to our platform, your model can be utilized alongside Clarifai's vast suite of AI tools. It will be automatically deployed and ready to be evaluated, combined with other models and agent operators in a workflow, or used to serve inference requests as-is.

Let’s demonstrate how you can successfully upload different types of models to the Clarifai platform.

tip

You can explore this repository for examples on uploading different model types.

Prerequisites

Set up Docker or a Virtual Environment

To test, run, and upload your model, you need to set up either a Docker container or a Python virtual environment. This ensures proper dependency management and prevents conflicts in your project.

Both options allow you to work with different Python versions. For example, you can use Python 3.11 for uploading one model and Python 3.12 for another — configured via the config.yaml file.

If Docker is installed on your system, it is highly recommended to use it for running the model. Docker provides better isolation and a fully portable environment, including for Python and system libraries.

You should ensure your local environment has sufficient memory and compute resources to handle model loading and execution, especially during testing.

Install Clarifai Package

Install the latest version of the clarifai Python package. This will also install the Clarifai Command Line Interface (CLI), which we'll use for testing and uploading the model.

 pip install --upgrade clarifai 

Set a PAT Key

You need to set the CLARIFAI_PAT (Personal Access Token) as an environment variable. You can generate the PAT key in your personal settings page by navigating to the Security section.

This token is essential for authenticating your connection to the Clarifai platform.

 export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE 
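If you prefer to set the variable from within Python (for example, in a notebook or script), here is a minimal sketch; the placeholder value is an assumption and should be replaced with your actual token:

import os

# Replace the placeholder with the PAT generated in your Security settings.
os.environ["CLARIFAI_PAT"] = "YOUR_PERSONAL_ACCESS_TOKEN_HERE"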
tip

On Windows, the Clarifai Python SDK expects a HOME environment variable, which isn’t set by default. To ensure compatibility with file paths used by the SDK, set HOME to the value of your USERPROFILE. You can set it in your Command Prompt this way: set HOME=%USERPROFILE%.

Create Project Directory

Create a project directory and organize your files as indicated below to fit the requirements of uploading models to the Clarifai platform.

your_model_directory/
├── 1/
│ └── model.py
├── requirements.txt
└── config.yaml
  • your_model_directory/ – The main directory containing your model files.
    • 1/ – A subdirectory that holds the model file (note that this folder must be named 1).
      • model.py – Contains the code that defines your model, including loading the model and running inference.
    • requirements.txt – Lists the Python libraries and dependencies required to run your model.
    • config.yaml – Contains model metadata and configuration details necessary for building the Docker image, defining compute resources, and uploading the model to Clarifai.

How to Upload a Model

Let's talk about the general steps you'd follow to upload any type of model to the Clarifai platform.

You can refer to the examples below to help you configure your files correctly.

Step 1: Prepare the config.yaml File

The config.yaml file is essential for specifying the model’s metadata, compute resource requirements, and model checkpoints.

Here’s a breakdown of the key sections in the file.

Model Info

This section defines your model ID, Clarifai user ID, and Clarifai app ID, which will determine where the model is uploaded on the Clarifai platform.

model:
  id: "model_id"
  user_id: "user_id"
  app_id: "app_id"
  model_type_id: "text-to-text" # Change this based on your model type (e.g., image-classifier, text-to-text)

Build Info

This section specifies details about the environment used to build or run the model. You can include the python_version, which is useful for ensuring compatibility between the model and its runtime environment, as different Python versions may have varying dependencies, library support, and performance characteristics.

note

We currently support Python 3.11 and Python 3.12 (default).

build_info:
  python_version: "3.11"

Compute Resources

Here, you define the minimum compute resources required for running your model, including CPU, memory, and optional GPU specifications.

inference_compute_info:
  cpu_limit: "2"
  cpu_memory: "13Gi"
  num_accelerators: 1
  accelerator_type: ["NVIDIA-A10G"] # Specify the GPU type if needed
  accelerator_memory: "15Gi"
  • cpu_limit – Number of CPUs allocated for the model (follows Kubernetes notation, e.g., "1", "2").
  • cpu_memory – Minimum memory required for the CPU (uses Kubernetes notation, e.g., "1Gi", "1500Mi", "3Gi").
  • num_accelerators – Number of GPUs or TPUs to use for inference.
  • accelerator_type – Specifies the type of hardware accelerators (e.g., GPU or TPU) supported by the model (e.g., "NVIDIA-A10G"). Note that instead of specifying an exact accelerator type, you can use a wildcard (*) to automatically match all available accelerators that fit your use case. For example, using ["NVIDIA-*"] will enable the system to choose from all NVIDIA options compatible with your model.
  • accelerator_memory – Minimum memory required for the GPU or TPU.

Hugging Face Model Checkpoints

If you're using a model from Hugging Face, you can automatically download its checkpoints by specifying the appropriate configuration in this section. For private or restricted Hugging Face repositories, include an access token.

checkpoints:
  type: "huggingface"
  repo_id: "meta-llama/Meta-Llama-3-8B-Instruct"
  when: "runtime"
  hf_token: "your_hf_token" # Required for private models
note

The when parameter in the checkpoints section determines when model checkpoints should be downloaded and stored. It must be set to one of the following options:

  • runtime (default) – Downloads checkpoints when loading the model in the load_model method.
  • build – Downloads checkpoints during the image build process.
  • upload – Downloads checkpoints before uploading the model.

For larger models, we highly recommend downloading checkpoints at runtime. Keeping checkpoints out of the Docker image offers several advantages:

  • Smaller image sizes
  • Faster build times
  • Quicker uploads and inference on the Clarifai platform

Downloading checkpoints at build or upload time can significantly increase image size, resulting in longer upload times and increased cold start latency.
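For reference, here is a minimal sketch of what the runtime option looks like in practice, based on the fuller examples later on this page: load_model resolves the checkpoints section of config.yaml through ModelBuilder and downloads the files only when the model container starts.

import os

from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.runners.models.model_class import ModelClass
from transformers import AutoModelForCausalLM, AutoTokenizer


class MyModel(ModelClass):
    def load_model(self):
        # Locate the model directory (the parent of the 1/ folder containing this file).
        model_path = os.path.dirname(os.path.dirname(__file__))
        # ModelBuilder reads the checkpoints section of config.yaml.
        builder = ModelBuilder(model_path, download_validation_only=True)
        # Download the Hugging Face files now, at runtime, rather than at build or upload time.
        checkpoints = builder.download_checkpoints(stage="runtime")
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoints)
        self.model = AutoModelForCausalLM.from_pretrained(checkpoints)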

Model Concepts or Labels

If your model outputs concepts or labels and is not loaded directly from Hugging Face, you must define a concepts section in the config.yaml file.

The following model types output concepts or labels:

  • visual-classifier
  • visual-detector
  • visual-segmenter
  • text-classifier
concepts:
  - id: '0'
    name: bus
  - id: '1'
    name: person
  - id: '2'
    name: bicycle
  - id: '3'
    name: car
note

If you're using a model from Hugging Face and the checkpoints section is defined, the Clarifai platform will automatically infer concepts. In this case, you don’t need to manually specify them.

Step 2: Define Dependencies in requirements.txt

The requirements.txt file lists all the Python dependencies your model needs. If your model requires Torch, we provide optimized pre-built Torch images as the base for machine learning and inference tasks.

These images include all necessary dependencies, ensuring efficient execution. The available pre-built Torch images are:

  • 2.4.1-py3.11-cuda124 — Based on PyTorch 2.4.1, Python 3.11, and CUDA 12.4.
  • 2.5.1-py3.11-cuda124 — Based on PyTorch 2.5.1, Python 3.11, and CUDA 12.4.
  • 2.4.1-py3.12-cuda124 — Based on PyTorch 2.4.1, Python 3.12, and CUDA 12.4.
  • 2.5.1-py3.12-cuda124 — Based on PyTorch 2.5.1, Python 3.12, and CUDA 12.4.

To use a specific Torch version, define it in your requirements.txt file like this:

torch==2.5.1

This ensures the matching pre-built image is pulled from Clarifai's container registry, so the correct environment is used. It minimizes cold start times and speeds up model uploads and runtime execution by avoiding the overhead of building images from scratch or pulling and configuring them from external sources.

We recommend using either torch==2.5.1 or torch==2.4.1. If your model requires a different Torch version, you can specify it in requirements.txt, but this may slightly increase the model upload time.

Step 3: Prepare the model.py File

The model.py file contains the core logic for your model, including how the model is loaded and how predictions are made. This file must define a custom class that inherits from ModelClass and implements the required methods.

Each parameter in the class methods must be annotated with a type, and the return type must also be specified. Clarifai's model framework supports rich data typing for both inputs and outputs. Supported types include Text, Image, Audio, Video, and more.

To define a custom model, create a class that inherits from ModelClass and implements the following methods:

a. load_model Method

The load_model method is optional but recommended, as it prepares the model for inference by handling resource-heavy initializations. It is particularly useful for:

  • One-time setup of heavy resources, such as loading trained models or initializing data transformations.
  • Executing tasks during model container startup to reduce runtime latency.
  • Loading essential components like tokenizers, pipelines, and other model-related assets.

Here is an example:

def load_model(self):
    self.tokenizer = AutoTokenizer.from_pretrained("model/")
    self.pipeline = transformers.pipeline(...)

b. Prediction Methods

You need to include at least one method decorated with @ModelClass.method to define the prediction endpoints.

We support various prediction methods based on type hints:

# Unary-Unary (Standard request-response)
@ModelClass.method
def predict(self, input: Image) -> Text

# Unary-Stream (Server-side streaming)
@ModelClass.method
def generate(self, prompt: Text) -> Stream[Text]

# Stream-Stream (Bidirectional streaming)
@ModelClass.method
def analyze_video(self, frames: Stream[Image]) -> Stream[str]

Here is an example of a model.py file.

from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Stream, Text


class MyModel(ModelClass):
    """A custom runner that adds "Hello World" to the end of the text."""

    def load_model(self):
        """Load the model here."""

    @ModelClass.method
    def predict(self, text1: Text = "") -> Text:
        """This is the method that will be called when the runner is run. It takes in an input and
        returns an output.
        """
        output_text = text1.text + "Hello World"
        return Text(output_text)

    @ModelClass.method
    def generate(self, text1: Text = Text("")) -> Stream[Text]:
        """Example yielding a whole batch of streamed stuff back."""
        for i in range(10):  # fake something iterating generating 10 times.
            output_text = text1.text + f"Generate Hello World {i}"
            yield Text(output_text)

    @ModelClass.method
    def stream(self, input_iterator: Stream[Text]) -> Stream[Text]:
        """Example yielding a whole batch of streamed stuff back."""
        for i, input in enumerate(input_iterator):
            output_text = input.text + f"Stream Hello World {i}"
            yield Text(output_text)
note

The structure of prediction methods on the client side directly mirrors the method signatures defined in your model.py file. This one-to-one mapping provides flexibility in defining prediction methods with varying names and arguments.

Here are some examples of method mapping:

model.py Model Implementation → Client-Side Usage Pattern

  • @ModelClass.method def predict(...) → model.predict(...)
  • @ModelClass.method def generate(...) → model.generate(...)
  • @ModelClass.method def stream(...) → model.stream(...)

This design allows you to define any custom method with any number of parameters. For example, you could define a method like @ModelClass.method def analyze_video(...) in model.py, and then call it on the client side using model.analyze_video(...).

Here are some key characteristics of this design:

  • Method names must match exactly between model.py and client usage.

  • Parameters retain the same names and types as defined in your method.

  • Return types follow the structure defined by your model’s outputs.

You can find more details on making predictions here.
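For illustration, here is a hedged sketch of what calling the MyModel example above might look like from the client side. The Model constructor arguments shown (a model URL and a PAT read from the environment) are assumptions about typical SDK usage; the method names and parameters simply mirror model.py, as described above.

import os

from clarifai.client import Model

# Assumed model URL format; use the URL printed by `clarifai model upload`.
model = Model(
    url="https://clarifai.com/user_id/app_id/models/model_id",
    pat=os.environ["CLARIFAI_PAT"],
)

# Mirrors `def predict(self, text1: Text = "") -> Text` in model.py.
print(model.predict(text1="Hi! "))

# Mirrors `def generate(self, text1: Text = ...) -> Stream[Text]`, streaming partial outputs.
for chunk in model.generate(text1="Hi! "):
    print(chunk)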

Step 4: Test the Model Locally

Before uploading your model to the Clarifai platform, it's important to test it locally to catch any typos or misconfigurations in the code.

Learn how to test your models locally here.

Step 5: Upload the Model to Clarifai

Once your model is ready, you can upload it to the platform using Clarifai CLI.

To upload your model, run the following command in your terminal:

 clarifai model upload ./your/model/path/here 

Alternatively, navigate to the directory containing your custom model and run the command without specifying the directory path:

 clarifai model upload 

This command builds the model’s Docker image using the defined compute resources and uploads it to Clarifai, where it can be served in production. The build logs will be displayed in your terminal, which helps you troubleshoot any upload issues.

Build Logs Example
[INFO] 13:11:29.227543 Validating folder: C:\Users\Alfrick\Desktop\delete1\ |  thread=13964
[INFO] 13:11:35.221354 Skipping downloading checkpoints for stage upload since config.yaml says to download them at stage runtime | thread=13964
[INFO] 13:11:35.288554 Using Python version 3.11 from the config file to build the Dockerfile | thread=13964
[INFO] 13:11:35.307556 Using Torch version 2.5.1 base image to build the Docker image | thread=13964
[WARNING] 13:11:35.308554 clarifai version not found in requirements.txt, using the latest version 11.2.3 | thread=13964
[WARNING] 13:11:35.319543 Updated requirements.txt to have clarifai==11.2.3 | thread=13964
[INFO] 13:11:35.637387 New model will be created at https://clarifai.com/alfrick/upload-models-2/models/test34 with it's first version. | thread=13964
Press Enter to continue...
[INFO] 13:11:44.497592 Uploading file... | thread=19308
[INFO] 13:11:44.499677 Upload complete! | thread=19308
Status: Upload done, Progress: 0% - Completed upload of files, initiating model version image build.. request_id:
Status: Model image is currently being built., Progress: 0% - Model version image is being built. request_id:
[INFO] 13:11:45.601654 Created Model Version ID: 2eeea43632294240995c0e1030bc2217 | thread=13964
[INFO] 13:11:45.602677 Full url to that version is: https://clarifai.com/alfrick/upload-models-2/models/test34 | thread=13964
[INFO] 13:11:50.934179 2025-04-22 10:11:43.505688 INFO: Downloading uploaded model from storage...

2025-04-22 10:11:44.177740 INFO: Done downloading model

2025-04-22 10:11:44.180182 INFO: Extracting upload...

2025-04-22 10:11:44.183856 INFO: Done extracting upload

2025-04-22 10:11:44.185767 INFO: Parsing requirements file for model version ID ****0e1030bc2217

2025-04-22 10:11:44.207227 INFO: Dockerfile found at /shared/context/Dockerfile

2025-04-22 10:11:45.064052 INFO: Setting up credentials

amazon-ecr-credential-helper

Version: 0.8.0

Git commit: ********

2025-04-22 10:11:45.067678 INFO: Building image...

#1 \[internal] load build definition from Dockerfile

#1 transferring dockerfile: 2.61kB done

#1 DONE 0.0s



#2 resolve image config for docker-image://docker.io/docker/dockerfile:1.13-labs

#2 DONE 0.1s



#3 docker-image://docker.io/docker/dockerfile:1.13-labs@sha256:************18b8

#3 resolve docker.io/docker/dockerfile:1.13-labs@sha256:************18b8 done

#3 CACHED



#4 \[internal] load metadata for public.ecr.aws/clarifai-models/torch:2.5.1-py3.11-cu124-********

#4 DONE 0.1s



#5 \[internal] load .dockerignore

#5 transferring context: 2B done

#5 DONE 0.0s



#6 \[final 1/8] FROM public.ecr.aws/clarifai-models/torch:2.5.1-py3.11-cu124-********@sha256:************ef64

#6 resolve public.ecr.aws/clarifai-models/torch:2.5.1-py3.11-cu124-********@sha256:************ef64 done

#6 DONE 0.0s



#7 \[internal] load build context

#7 transferring context: 7.49kB done

#7 DONE 0.0s



#8 \[final 3/8] RUN ["pip", "install", "--no-cache-dir", "-r", "/home/nonroot/requirements.txt"]

#8 CACHED



#9 \[final 5/8] COPY --chown=nonroot:nonroot downloader/unused.yaml /home/nonroot/main/1/checkpoints/.cache/unused.yaml

#9 CACHED



#10 \[final 2/8] COPY --link requirements.txt /home/nonroot/requirements.txt

#10 CACHED



#11 \[final 4/8] RUN ["pip", "show", "clarifai"]

#11 CACHED



#12 \[final 6/8] RUN ["python", "-m", "clarifai.cli", "model", "download-checkpoints", "/home/nonroot/main", "--out_path", "/home/nonroot/main/1/checkpoints", "--stage", "build"]

#12 CACHED



#13 \[final 7/8] COPY --link=true 1 /home/nonroot/main/1

#13 DONE 0.0s



#14 \[final 8/8] COPY --link=true requirements.txt config.yaml /home/nonroot/main/

#14 DONE 0.0s



#15 \[auth] sharing credentials for 891377382885.dkr.ecr.us-east-1.amazonaws.com

#15 DONE 0.0s



#16 exporting to image

#16 exporting layers done

#16 exporting manifest sha256:************f4b0 done

#16 exporting config sha256:************2c6f done

#16 pushing layers

#16 pushing layers 1.2s done

#16 pushing manifest for ****/prod/pytorch:****0e1030bc2217@sha256:************f4b0

#16 pushing manifest for ****/prod/pytorch:****0e1030bc2217@sha256:************f4b0 0.4s done

#16 DONE 1.6s

2025-04-22 10:11:47.054550 INFO: Done building image!!! | thread=13964
[INFO] 13:11:52.614363 #16 pushing manifest for ****/prod/pytorch:****0e1030bc2217@sha256:************f4b0 0.4s done

#16 DONE 1.6s

2025-04-22 10:11:47.054550 INFO: Done building image!!! | thread=13964
[INFO] 13:11:54.358104 Model build complete! | thread=13964
[INFO] 13:11:54.359987 Build time elapsed 8.8s) | thread=13964
[INFO] 13:11:54.360985 Check out the model at https://clarifai.com/alfrick/upload-models-2/models/test34 version: 2eeea43632294240995c0e1030bc2217 | thread=13964

Note: If you make any changes to your model and upload it again to the Clarifai platform, a new version of the model will be created automatically.

Step 6: Predict With Model

Once the model is successfully uploaded to Clarifai, you can start making predictions with it.

Note that before making a prediction request with our Compute Orchestration capabilities, you need to first deploy it into a cluster and nodepool you've created.

Examples

tip

You can find various up-to-date model upload examples here, which demonstrate different use cases and optimizations.

Llama-3.2-1B-Instruct

model.py

from typing import List, Iterator
from threading import Thread
import os
import torch

from clarifai.runners.models.model_class import ModelClass
from clarifai.utils.logging import logger
from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.runners.utils.openai_convertor import openai_response
from transformers import (AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer)


class MyModel(ModelClass):
    """A custom runner for the llama-3.2-1b-instruct LLM that integrates with the Clarifai platform."""

    def load_model(self):
        """Load the model here."""
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        logger.info(f"Running on device: {self.device}")

        # Load checkpoints
        model_path = os.path.dirname(os.path.dirname(__file__))
        builder = ModelBuilder(model_path, download_validation_only=True)
        self.checkpoints = builder.download_checkpoints(stage="runtime")

        # Load model and tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(self.checkpoints)
        self.tokenizer.pad_token = self.tokenizer.eos_token  # Set pad token to eos token
        self.model = AutoModelForCausalLM.from_pretrained(
            self.checkpoints,
            low_cpu_mem_usage=True,
            device_map=self.device,
            torch_dtype=torch.bfloat16,
        )
        self.streamer = TextIteratorStreamer(tokenizer=self.tokenizer)
        self.chat_template = None
        logger.info("Done loading!")

    @ModelClass.method
    def predict(self,
                prompt: str = "",
                chat_history: List[dict] = None,
                max_tokens: int = 512,
                temperature: float = 0.7,
                top_p: float = 0.8) -> str:
        """Predict the response for the given prompt and chat history using the model."""
        # Construct chat-style messages
        messages = chat_history if chat_history else []
        if prompt:
            messages.append({
                "role": "user",
                "content": [{"type": "text", "text": prompt}]
            })

        # return_dict=True so the tokenized output exposes "input_ids" below
        inputs = self.tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt").to(self.model.device)

        generation_kwargs = {
            "input_ids": inputs["input_ids"],
            "do_sample": True,
            "max_new_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "eos_token_id": self.tokenizer.eos_token_id,
        }

        output = self.model.generate(**generation_kwargs)
        generated_tokens = output[0][inputs["input_ids"].shape[-1]:]
        return self.tokenizer.decode(generated_tokens, skip_special_tokens=True)

    @ModelClass.method
    def generate(self,
                 prompt: str = "",
                 chat_history: List[dict] = None,
                 max_tokens: int = 512,
                 temperature: float = 0.7,
                 top_p: float = 0.8) -> Iterator[str]:
        """Stream generated text tokens from a prompt + optional chat history."""
        # Construct chat-style messages
        messages = chat_history if chat_history else []
        if prompt:
            messages.append({
                "role": "user",
                "content": [{"type": "text", "text": prompt}]
            })

        response = self.chat(
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p
        )
        for each in response:
            yield each['choices'][0]['delta']['content']

    @ModelClass.method
    def chat(self,
             messages: List[dict],
             max_tokens: int = 512,
             temperature: float = 0.7,
             top_p: float = 0.8) -> Iterator[dict]:
        """
        Stream back JSON dicts for assistant messages.
        Example return format:
        {"role": "assistant", "content": [{"type": "text", "text": "response here"}]}
        """
        # Tokenize using chat template (return_dict=True so "input_ids" can be accessed below)
        inputs = self.tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt"
        ).to(self.model.device)

        generation_kwargs = {
            "input_ids": inputs["input_ids"],
            "do_sample": True,
            "max_new_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "eos_token_id": self.tokenizer.eos_token_id,
            "streamer": self.streamer
        }

        # Run generation in a background thread so tokens can be streamed as they arrive
        thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
        thread.start()

        # Stream the response text token by token
        for token_text in self.streamer:
            yield openai_response(token_text)

        thread.join()

    def test(self):
        """Test the model here."""
        try:
            print("Testing predict...")
            # Test predict
            print(self.predict(prompt="What is the capital of India?"))
        except Exception as e:
            print("Error in predict", e)

        try:
            print("Testing generate...")
            # Test generate
            for each in self.generate(prompt="What is the capital of India?"):
                print(each, end="")
            print()
        except Exception as e:
            print("Error in generate", e)

        try:
            print("Testing chat...")
            messages = [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of India?"},
            ]
            for each in self.chat(messages=messages):
                print(each, end="")
            print()
        except Exception as e:
            print("Error in chat", e)

requirements.txt

torch==2.5.1
tokenizers>=0.21.0
transformers>=4.47.0
accelerate>=1.2.0
scipy==1.10.1
optimum>=1.23.3
protobuf==5.27.3
einops>=0.8.0
requests==2.32.3
clarifai>=11.3.0

config.yaml

model:
  id: "llama_3_2_1b_instruct"
  user_id: "user_id"
  app_id: "app_id"
  model_type_id: "text-to-text"

build_info:
  python_version: "3.11"

inference_compute_info:
  cpu_limit: "1"
  cpu_memory: "13Gi"
  num_accelerators: 1
  accelerator_type: ["NVIDIA-*"]
  accelerator_memory: "18Gi"

checkpoints:
  type: "huggingface"
  repo_id: "unsloth/Llama-3.2-1B-Instruct"
  hf_token: "hf_token"
  when: "runtime"

NSFW Image Classifier

model.py

import os
import tempfile
from typing import List, Iterator
from io import BytesIO

import cv2
import torch
from transformers import AutoModelForImageClassification, ViTImageProcessor

from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Image, Concept, Video
from clarifai.runners.models.model_builder import ModelBuilder

from PIL import Image as PILImage


def video_to_frames(video_bytes):
    """Convert video bytes to frames."""
    frames = []
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as temp_video_file:
        temp_video_file.write(video_bytes)
        temp_video_path = temp_video_file.name

    video = cv2.VideoCapture(temp_video_path)
    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break
        frame_bytes = cv2.imencode('.jpg', frame)[1].tobytes()
        frames.append(frame_bytes)
    video.release()
    return frames


def preprocess_image(image_bytes):
    """Convert image bytes into RGB format suitable for model processing.

    Args:
        image_bytes: Raw image data in bytes format

    Returns:
        PIL Image object in RGB format ready for model input
    """
    return PILImage.open(BytesIO(image_bytes)).convert("RGB")


def process_concepts(logits, model_labels):
    """Process logits and map them to concepts."""
    outputs = []
    for logit in logits:
        probs = torch.softmax(logit, dim=-1)
        sorted_indices = torch.argsort(probs, dim=-1, descending=True)
        output_concepts = []
        for idx in sorted_indices:
            concept = Concept(id=model_labels[idx.item()], name=model_labels[idx.item()], value=probs[idx].item())
            output_concepts.append(concept)
        outputs.append(output_concepts)
    return outputs


class ImageClassifierModel(ModelClass):
    """A custom runner that classifies images and outputs concepts."""

    def load_model(self):
        """Load the model and processor."""
        model_path = os.path.dirname(os.path.dirname(__file__))
        builder = ModelBuilder(model_path, download_validation_only=True)
        checkpoints = builder.download_checkpoints(stage="runtime")

        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

        self.model = AutoModelForImageClassification.from_pretrained(checkpoints).to(self.device)
        self.model_labels = self.model.config.id2label
        self.processor = ViTImageProcessor.from_pretrained(checkpoints)

    @ModelClass.method
    def predict(self, image: Image) -> List[List[Concept]]:
        """Predict concepts for a list of images."""
        pil_image = preprocess_image(image.bytes)
        inputs = self.processor(images=pil_image, return_tensors="pt")
        inputs = {name: tensor.to(self.device) for name, tensor in inputs.items()}
        with torch.no_grad():
            logits = self.model(**inputs).logits
        return process_concepts(logits, self.model_labels)

    @ModelClass.method
    def generate(self, video: Video) -> Iterator[List[Concept]]:
        """Generate concepts for frames extracted from a video."""
        video_bytes = video.bytes
        frame_generator = video_to_frames(video_bytes)
        for frame in frame_generator:
            image = preprocess_image(frame)
            inputs = self.processor(images=image, return_tensors="pt")
            inputs = {name: tensor.to(self.device) for name, tensor in inputs.items()}
            with torch.no_grad():
                logits = self.model(**inputs).logits
            yield process_concepts(logits, self.model_labels)  # Yield concepts for each frame

    @ModelClass.method
    def stream_image(self, image_stream: Iterator[Image]) -> Iterator[List[Concept]]:
        """Stream process image inputs."""
        for image in image_stream:
            result = self.predict(image)
            yield result

    @ModelClass.method
    def stream_video(self, video_stream: Iterator[Video]) -> Iterator[List[Concept]]:
        """Stream process video inputs."""
        for video in video_stream:
            for frame_result in self.generate(video):
                yield frame_result

requirements.txt

torch==2.5.1
transformers>=4.47.0
pillow==10.4.0
requests==2.32.3
timm==1.0.12
opencv-python-headless==4.10.0.84
numpy
aiohttp
clarifai>=11.3.0
clarifai-protocol>=0.0.20

config.yaml

model:
  id: model_id
  user_id: user_id
  app_id: app_id
  model_type_id: visual-classifier

build_info:
  python_version: '3.11'

inference_compute_info:
  cpu_limit: '2'
  cpu_memory: 2Gi
  num_accelerators: 1
  accelerator_type:
    - NVIDIA-A10G
  accelerator_memory: 3Gi

checkpoints:
  type: huggingface
  repo_id: Falconsai/nsfw_image_detection
  hf_token: hf_token

DETR Resnet Image Detector

model.py

# Standard library imports
import os
import tempfile
import time
from io import BytesIO
from typing import List, Dict, Any, Iterator

# Third-party imports
import cv2
import torch
from PIL import Image as PILImage
from transformers import DetrForObjectDetection, DetrImageProcessor

# Clarifai imports
from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.runners.utils.data_types import Concept, Image, Video, Region
from clarifai.utils.logging import logger


def preprocess_image(image_bytes: bytes) -> PILImage:
    """Convert image bytes into RGB format suitable for model processing.

    Args:
        image_bytes: Raw image data in bytes format

    Returns:
        PIL Image object in RGB format ready for model input
    """
    return PILImage.open(BytesIO(image_bytes)).convert("RGB")


def detect_objects(
    images: List[PILImage],
    model: DetrForObjectDetection,
    processor: DetrImageProcessor,
    device: str
) -> Dict[str, Any]:
    """Process images through the DETR model to detect objects.

    Args:
        images: List of preprocessed images
        model: DETR model instance
        processor: Image processor for DETR
        device: Computation device (CPU/GPU)

    Returns:
        Detection results from the model
    """
    model_inputs = processor(images=images, return_tensors="pt").to(device)
    model_inputs = {name: tensor.to(device) for name, tensor in model_inputs.items()}
    model_output = model(**model_inputs)
    results = processor.post_process_object_detection(model_output)
    return results


def process_detections(
    results: List[Dict[str, torch.Tensor]],
    images: List[PILImage],
    threshold: float,
    model_labels: Dict[int, str]
) -> List[List[Region]]:
    """Convert model outputs into a structured format of detections.

    Args:
        results: Raw detection results from model
        images: Original input images
        threshold: Confidence threshold for detections
        model_labels: Dictionary mapping label indices to names

    Returns:
        List of lists containing Region objects for each detection
    """
    outputs = []
    for i, result in enumerate(results):
        image = images[i]
        detections = []
        for score, label_idx, box in zip(result["scores"], result["labels"], result["boxes"]):
            if score > threshold:
                label = model_labels[label_idx.item()]
                detections.append(
                    Region(
                        box=box.tolist(),
                        concepts=[Concept(id=label, name=label, value=score.item())]
                    )
                )
        outputs.append(detections)
    return outputs


def video_to_frames(video_bytes: bytes) -> Iterator[bytes]:
    """Convert video bytes to frames.

    Args:
        video_bytes: Raw video data in bytes

    Yields:
        JPEG encoded frame data as bytes
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as temp_video_file:
        temp_video_file.write(video_bytes)
        temp_video_path = temp_video_file.name
        logger.info(f"temp_video_path: {temp_video_path}")

    video = cv2.VideoCapture(temp_video_path)
    logger.info(f"video opened: {video.isOpened()}")

    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break
        frame_bytes = cv2.imencode('.jpg', frame)[1].tobytes()
        yield frame_bytes

    video.release()
    os.unlink(temp_video_path)


class MyRunner(ModelClass):
    """A custom runner for DETR object detection model that processes images and videos."""

    def load_model(self):
        """Load the model here."""
        model_path = os.path.dirname(os.path.dirname(__file__))
        builder = ModelBuilder(model_path, download_validation_only=True)
        checkpoint_path = builder.download_checkpoints(stage="runtime")

        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        logger.info(f"Running on device: {self.device}")

        self.model = DetrForObjectDetection.from_pretrained(checkpoint_path).to(self.device)
        self.processor = DetrImageProcessor.from_pretrained(checkpoint_path)
        self.model.eval()
        self.threshold = 0.9
        self.model_labels = self.model.config.id2label

        logger.info("Done loading!")

    @ModelClass.method
    def predict(self, image: Image) -> List[Region]:
        """Process a single image and return detected objects."""
        image_bytes = image.bytes
        image = preprocess_image(image_bytes)

        with torch.no_grad():
            results = detect_objects([image], self.model, self.processor, self.device)
            outputs = process_detections(results, [image], self.threshold, self.model_labels)
        return outputs[0]  # Return detections for single image

    @ModelClass.method
    def generate(self, video: Video) -> Iterator[List[Region]]:
        """Process video frames and yield detected objects for each frame."""
        video_bytes = video.bytes
        frame_generator = video_to_frames(video_bytes)
        for frame in frame_generator:
            image = preprocess_image(frame)
            with torch.no_grad():
                results = detect_objects([image], self.model, self.processor, self.device)
                outputs = process_detections(results, [image], self.threshold, self.model_labels)
            yield outputs[0]  # Yield detections for each frame

    @ModelClass.method
    def stream_image(self, image_stream: Iterator[Image]) -> Iterator[List[Region]]:
        """Stream process image inputs."""
        logger.info("Starting stream processing for images")
        for image in image_stream:
            start_time = time.time()
            result = self.predict(image)
            yield result
            logger.info(f"Processing time: {time.time() - start_time:.3f}s")

    @ModelClass.method
    def stream_video(self, video_stream: Iterator[Video]) -> Iterator[List[Region]]:
        """Stream process video inputs."""
        logger.info("Starting stream processing for videos")
        for video in video_stream:
            start_time = time.time()
            for frame_result in self.generate(video):
                yield frame_result
            logger.info(f"Processing time: {time.time() - start_time:.3f}s")

    def test(self):
        """Test the model functionality."""
        import requests  # Import moved here as it's only used for testing

        # Test configuration
        TEST_URLS = {
            "images": [
                "https://samples.clarifai.com/metro-north.jpg",
                "https://samples.clarifai.com/dog.tiff"
            ],
            "video": "https://samples.clarifai.com/beer.mp4"
        }

        def get_test_data(url):
            return Image(bytes=requests.get(url).content)

        def get_test_video():
            return Video(bytes=requests.get(TEST_URLS["video"]).content)

        def run_test(name, test_fn):
            logger.info(f"\nTesting {name}...")
            try:
                test_fn()
                logger.info(f"{name} test completed successfully")
            except Exception as e:
                logger.error(f"Error in {name} test: {e}")

        # Test predict
        def test_predict():
            result = self.predict(get_test_data(TEST_URLS["images"][0]))
            logger.info(f"Predict result: {result}")

        # Test generate
        def test_generate():
            for detections in self.generate(get_test_video()):
                logger.info(f"First frame detections: {detections}")
                break

        # Test stream
        def test_stream():
            # Split into two separate test functions for clarity
            def test_stream_image():
                images = [get_test_data(url) for url in TEST_URLS["images"]]
                for result in self.stream_image(iter(images)):
                    logger.info(f"Image stream result: {result}")

            def test_stream_video():
                for result in self.stream_video(iter([get_test_video()])):
                    logger.info(f"Video stream result: {result}")
                    break  # Just test first frame

            logger.info("\nTesting image streaming...")
            test_stream_image()
            logger.info("\nTesting video streaming...")
            test_stream_video()

        # Run all tests
        for test_name, test_fn in [
            ("predict", test_predict),
            ("generate", test_generate),
            ("stream", test_stream)
        ]:
            run_test(test_name, test_fn)

requirements.txt

torch==2.5.1
transformers>=4.47.0
pillow==10.4.0
requests==2.32.3
timm==1.0.12
opencv-python-headless==4.10.0.84
clarifai>=11.3.0

config.yaml

# This is the sample config file for the image-detection model.

model:
  id: "detr-resnet-50"
  user_id: "user_id"
  app_id: "app_id"
  model_type_id: "visual-detector"

build_info:
  python_version: "3.11"

inference_compute_info:
  cpu_limit: "4"
  cpu_memory: "2Gi"
  num_accelerators: 1
  accelerator_type: ["NVIDIA-*"]
  accelerator_memory: "5Gi"

checkpoints:
  type: "huggingface"
  repo_id: "facebook/detr-resnet-50"
  hf_token: "hf_token"