Model Uploading

Import custom models, including from external sources like Hugging Face and OpenAI




The Clarifai Python SDK allows you to upload custom models easily. Whether you're working with a pre-trained model from an external source like Hugging Face or OpenAI, or one you've built from scratch, Clarifai allows seamless integration of your models, enabling you to take advantage of the platform’s powerful capabilities.

Once imported to our platform, your model can be utilized alongside Clarifai's vast suite of AI tools. It will be automatically deployed and ready to be evaluated, combined with other models and agent operators in a workflow, or used to serve inference requests as is.

Objective

Let’s walk through how to build and upload a custom model to the Clarifai platform. This example model appends the phrase Hello World to any input text and also supports streaming responses. You can test the already uploaded model here.

tip

You can explore this repository for examples on uploading different model types.

Prerequisites

Set up Docker or a Virtual Environment

To test, run, and upload your model, you need to set up either a Docker container or a Python virtual environment. This ensures proper dependency management and prevents conflicts in your project.

Both options allow you to work with different Python versions. For example, you can use Python 3.11 for uploading one model and Python 3.12 for another — configured via the config.yaml file.

If Docker is installed on your system, it is highly recommended to use it for running the model. Docker provides better isolation and a fully portable environment, including for Python and system libraries.

You should ensure your local environment has sufficient memory and compute resources to handle model loading and execution, especially during testing.

Install Clarifai Package

Install the latest version of the clarifai Python package. This will also install the Clarifai Command Line Interface (CLI), which we'll use for testing and uploading the model.

 pip install --upgrade clarifai 

Set a PAT Key

You need to set the CLARIFAI_PAT (Personal Access Token) as an environment variable. You can generate the PAT key in your personal settings page by navigating to the Security section.

This token is essential for authenticating your connection to the Clarifai platform.

 export CLARIFAI_PAT=YOUR_PERSONAL_ACCESS_TOKEN_HERE 
tip

On Windows, the Clarifai Python SDK expects a HOME environment variable, which isn’t set by default. To ensure compatibility with file paths used by the SDK, set HOME to the value of your USERPROFILE. You can set it in your Command Prompt this way: set HOME=%USERPROFILE%.

Create Project Directory

Create a project directory and organize your files as indicated below to fit the requirements of uploading models to the Clarifai platform.

your_model_directory/
├── 1/
│ └── model.py
├── requirements.txt
└── config.yaml
  • your_model_directory/ – The root directory containing all files related to your custom model.
    • 1/ – A subdirectory that holds the model file (note that the folder must be named 1).
      • model.py – Contains the code that defines your model, including loading the model and running inference.
    • requirements.txt – Lists the Python dependencies required to run your model.
    • config.yaml – Contains model metadata and configuration details necessary for building the Docker image, defining compute resources, and uploading the model to Clarifai.

How to Upload a Model

Let's talk about the general steps you'd follow to upload any type of model to the Clarifai platform.

Step 1: Prepare the model.py File

The model.py file contains the core logic for your model, including how the model is loaded and how predictions are made. This file must define a custom class that inherits from ModelClass and implements the required methods.

This is the model.py file for the custom model we want to upload:

from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Text
from typing import Iterator


class MyModel(ModelClass):
    """A custom runner that adds "Hello World" to the end of the text."""

    def load_model(self):
        """Load the model here."""

    @ModelClass.method
    def predict(self, text1: Text = "") -> Text:
        """This is the method that will be called when the runner is run. It takes in an input and
        returns an output.
        """
        output_text = text1.text + " Hello World!"
        return Text(output_text)

    @ModelClass.method
    def generate(self, text1: Text = Text("")) -> Iterator[Text]:
        """Example yielding a whole batch of streamed stuff back."""
        for i in range(10):  # fake something iterating generating 10 times.
            output_text = text1.text + f"Generate Hello World {i}"
            yield Text(output_text)

    @ModelClass.method
    def stream(self, input_iterator: Iterator[Text]) -> Iterator[Text]:
        """Example yielding a whole batch of streamed stuff back."""
        for i, input in enumerate(input_iterator):
            output_text = input.text + f"Stream Hello World {i}"
            yield Text(output_text)

Let’s break down what each part of the file does.

a. load_model Method

The load_model method is optional but recommended, as it prepares the model for inference by handling resource-heavy initializations. It is particularly useful for:

  • One-time setup of heavy resources, such as loading trained models or initializing data transformations.
  • Executing tasks during model container startup to reduce runtime latency.
  • Loading essential components like tokenizers, pipelines, and other model-related assets.

Here is an example:

def load_model(self):
    self.tokenizer = AutoTokenizer.from_pretrained("model/")
    self.pipeline = transformers.pipeline(...)

b. Prediction Methods

The model.py file must include at least one method decorated with @ModelClass.method to define the prediction endpoints.

In the example model we want to upload, we defined a method that appends the phrase Hello World to any input text and added support for different types of streaming responses.

Note: The structure of prediction methods on the client side directly mirrors the method signatures defined in your model.py file. This one-to-one mapping provides flexibility in defining prediction methods with varying names and arguments.

Here are some examples of method mapping:

  • @ModelClass.method def predict(...) in model.py maps to model.predict(...) on the client.
  • @ModelClass.method def generate(...) in model.py maps to model.generate(...) on the client.
  • @ModelClass.method def stream(...) in model.py maps to model.stream(...) on the client.

You can learn more about the structure of prediction methods here.

Each parameter in the class methods must be annotated with a type, and the return type must also be specified. Clarifai's model framework supports rich data typing for both inputs and outputs. Supported types include Text, Image, Audio, Video, and more.
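For illustration, here is a minimal sketch of a typed method that accepts an Image and returns a list of Concept objects. The class name MyVisualModel, the method name classify, and its body are hypothetical and are not part of the example model above:

from typing import List

from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Image, Concept


class MyVisualModel(ModelClass):
    """Hypothetical example showing typed inputs and outputs."""

    @ModelClass.method
    def classify(self, image: Image) -> List[Concept]:
        """Takes an Image and returns a list of scored Concepts."""
        # A real model would run inference on image.bytes here;
        # a fixed concept is returned purely for illustration.
        return [Concept(id="cat", name="cat", value=0.99)]

Because the client mirrors these signatures, such a method would be invoked as model.classify(...) once the model is uploaded.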

Step 2: Prepare the config.yaml File

The config.yaml file is essential for specifying the model’s metadata, compute resource requirements, and model checkpoints.

This is the config.yaml file for the custom model we want to upload:

model:
  id: "my-uploaded-model"
  user_id: "YOUR_USER_ID_HERE"
  app_id: "YOUR_APP_ID_HERE"
  model_type_id: "text-to-text"

build_info:
  python_version: "3.11"

inference_compute_info:
  cpu_limit: "1"
  cpu_memory: "13Gi"
  num_accelerators: 1
  accelerator_type: ["NVIDIA-*"]
  accelerator_memory: "15Gi"

Let’s break down what each part of the file does.

Model Info

This section defines your model ID, Clarifai user ID, and Clarifai app ID, which will determine where the model is uploaded on the Clarifai platform.

Build Info

This section specifies details about the environment used to build or run the model. You can include the python_version, which is useful for ensuring compatibility between the model and its runtime environment, as different Python versions may have varying dependencies, library support, and performance characteristics.

note

We currently support Python 3.11 and Python 3.12 (default).

Compute Resources

You must define the minimum compute resources required for running your model, including CPU, memory, and optional GPU specifications.

These are some parameters you can define:

  • cpu_limit – Number of CPUs allocated for the model (follows Kubernetes notation, e.g., "1", "2").
  • cpu_memory – Minimum memory required for the CPU (uses Kubernetes notation, e.g., "1Gi", "1500Mi", "3Gi").
  • num_accelerators – Number of GPUs or TPUs to use for inference.
  • accelerator_type – Specifies the type of hardware accelerators (e.g., GPU or TPU) supported by the model (e.g., "NVIDIA-A10G"). Note that instead of specifying an exact accelerator type, you can use a wildcard (*) to automatically match all available accelerators that fit your use case. For example, using ["NVIDIA-*"] will enable the system to choose from all NVIDIA options compatible with your model.
  • accelerator_memory – Minimum memory required for the GPU or TPU.

Hugging Face Model Checkpoints

If you're using a model from Hugging Face, you can automatically download its checkpoints by specifying the appropriate configuration in this section. For private or restricted Hugging Face repositories, include an access token.

See the example below for how to define Hugging Face checkpoints.

checkpoints:
  type: "huggingface"
  repo_id: "meta-llama/Meta-Llama-3-8B-Instruct"
  when: "runtime"
  hf_token: "your_hf_token" # Required for private models
note

The when parameter in the checkpoints section determines when model checkpoints should be downloaded and stored. It must be set to one of the following options:

  • runtime (default) – Downloads checkpoints when loading the model in the load_model method.
  • build – Downloads checkpoints during the image build process.
  • upload – Downloads checkpoints before uploading the model.

For larger models, we highly recommend downloading checkpoints at runtime. Doing so keeps the Docker image small, which offers several advantages:

  • Smaller image sizes
  • Faster build times
  • Quicker uploads and inference on the Clarifai platform

Downloading checkpoints at build or upload time can significantly increase image size, resulting in longer upload times and increased cold start latency.

Model Concepts or Labels

If your model outputs concepts or labels and is not loaded directly from Hugging Face, you must define a concepts section in the config.yaml file.

The following model types output concepts or labels:

  • visual-classifier
  • visual-detector
  • visual-segmenter
  • text-classifier
concepts:
  - id: '0'
    name: bus
  - id: '1'
    name: person
  - id: '2'
    name: bicycle
  - id: '3'
    name: car
note

If you're using a model from Hugging Face and the checkpoints section is defined, the Clarifai platform will automatically infer concepts. In this case, you don’t need to manually specify them.

Step 3: Define Dependencies in requirements.txt

The requirements.txt file lists all the Python dependencies your model needs.

This is the requirements.txt file for the custom model we want to upload:

clarifai>=11.3.0

If your model requires Torch, we provide optimized pre-built Torch images as the base for machine learning and inference tasks.

These images include all necessary dependencies, ensuring efficient execution. The available pre-built Torch images are:

  • 2.4.1-py3.11-cuda124 — Based on PyTorch 2.4.1, Python 3.11, and CUDA 12.4.
  • 2.5.1-py3.11-cuda124 — Based on PyTorch 2.5.1, Python 3.11, and CUDA 12.4.
  • 2.4.1-py3.12-cuda124 — Based on PyTorch 2.4.1, Python 3.12, and CUDA 12.4.
  • 2.5.1-py3.12-cuda124 — Based on PyTorch 2.5.1, Python 3.12, and CUDA 12.4.

To use a specific Torch version, define it in your requirements.txt file like this:

torch==2.5.1

This ensures the correct pre-built image is pulled from Clarifai's container registry, providing the right environment without the overhead of building images from scratch or pulling and configuring them from external sources. As a result, cold start times are minimized and both model uploads and runtime execution are faster.

We recommend using either torch==2.5.1 or torch==2.4.1. If your model requires a different Torch version, you can specify it in requirements.txt, but this may slightly increase the model upload time.

Step 4: Test the Model Locally

Before uploading your model to the Clarifai platform, it's important to test it locally to catch any typos or misconfigurations in the code.

Learn how to test your models locally here.

Step 5: Upload the Model to Clarifai

Once your model is ready, you can upload it to the platform using Clarifai CLI.

To upload your model, run the following command in your terminal:

 clarifai model upload ./your/model/path/here 

Alternatively, navigate to the directory containing your custom model and run the command without specifying the directory path:

 clarifai model upload 

This command builds the model’s Docker image using the defined compute resources and uploads it to Clarifai, where it can be served in production. The build logs will be displayed in your terminal, which helps you troubleshoot any upload issues.

Build Logs Example
[INFO] 13:21:18.571215 Validating folder: """" |  thread=15892
[INFO] 13:21:19.635009 No checkpoints specified in the config file | thread=15892
[INFO] 13:21:19.644012 Using Python version 3.11 from the config file to build the Dockerfile | thread=15892
[INFO] 13:21:19.977325 New model will be created at https://clarifai.com/alfrick/docs-demos/models/my-uploaded-model with it's first version. | thread=15892
Press Enter to continue...
[INFO] 13:21:24.984514 Uploading file... | thread=10284
[INFO] 13:21:24.985517 Upload complete! | thread=10284
Status: Upload done, Progress: 0% - Completed upload of files, initiating model version image build.. request_id:
Status: Model image is currently being built., Progress: 0% - Model version image is being built. request_id:
[INFO] 13:21:25.791835 Created Model Version ID: 959b32947f0f4061b598f56b8ffc152f | thread=15892
[INFO] 13:21:25.791835 Full url to that version is: https://clarifai.com/alfrick/docs-demos/models/my-uploaded-model | thread=15892
[INFO] 13:21:31.140198 2025-05-07 10:21:26.135234 INFO: Downloading uploaded model from storage... | thread=15892
[INFO] 13:21:37.939478 2025-05-07 10:21:31.839495 INFO: Done downloading model

2025-05-07 10:21:31.842088 INFO: Extracting upload...

2025-05-07 10:21:31.846218 INFO: Done extracting upload

2025-05-07 10:21:31.848309 INFO: Parsing requirements file for model version ID ****f56b8ffc152f

2025-05-07 10:21:31.869731 INFO: Dockerfile found at /shared/context/Dockerfile

cat: /shared/context/downloader/hf_token: No such file or directory

2025-05-07 10:21:32.520135 INFO: Setting up credentials

amazon-ecr-credential-helper

Version: 0.8.0

Git commit: ********

2025-05-07 10:21:32.523522 INFO: Building image...

#1 \[internal] load build definition from Dockerfile

#1 DONE 0.0s



#1 \[internal] load build definition from Dockerfile

#1 transferring dockerfile: 2.61kB done

#1 WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 2)

#1 DONE 0.0s



#2 resolve image config for docker-image://docker.io/docker/dockerfile:1.13-labs

#2 DONE 0.1s



#3 docker-image://docker.io/docker/dockerfile:1.13-labs@sha256:************18b8

#3 resolve docker.io/docker/dockerfile:1.13-labs@sha256:************18b8 done

#3 CACHED



#4 \[internal] load metadata for public.ecr.aws/clarifai-models/python-base:3.11-********

#4 DONE 0.1s



#5 \[internal] load .dockerignore

#5 transferring context: 2B done

#5 DONE 0.0s



#6 \[internal] load build context

#6 transferring context: 2.66kB done

#6 DONE 0.0s



#7 \[final 1/8] FROM public.ecr.aws/clarifai-models/python-base:3.11-********@sha256:************6ab0

#7 resolve public.ecr.aws/clarifai-models/python-base:3.11-********@sha256:************6ab0 done

#7 DONE 0.0s



#8 \[final 5/8] COPY --chown=nonroot:nonroot downloader/unused.yaml /home/nonroot/main/1/checkpoints/.cache/unused.yaml

#8 CACHED



#9 \[final 4/8] RUN ["pip", "show", "clarifai"]

#9 CACHED



#10 \[final 2/8] COPY --link requirements.txt /home/nonroot/requirements.txt

#10 CACHED



#11 \[final 3/8] RUN ["pip", "install", "--no-cache-dir", "-r", "/home/nonroot/requirements.txt"]

#11 CACHED



#12 \[final 6/8] RUN ["python", "-m", "clarifai.cli", "model", "download-checkpoints", "/home/nonroot/main", "--out_path", "/home/nonroot/main/1/checkpoints", "--stage", "build"]

#12 CACHED



#13 \[final 7/8] COPY --link=true 1 /home/nonroot/main/1

#13 DONE 0.0s



#14 \[final 8/8] COPY --link=true requirements.txt config.yaml /home/nonroot/main/

#14 DONE 0.0s



#15 \[auth] sharing credentials for 891377382885.dkr.ecr.us-east-1.amazonaws.com

#15 DONE 0.0s



#16 exporting to image

#16 exporting layers done

#16 exporting manifest sha256:************4cc5 done

#16 exporting config sha256:************bd5a done

#16 pushing layers

#16 pushing layers 1.0s done

#16 pushing manifest for ****/prod/python:****f56b8ffc152f@sha256:************4cc5

#16 pushing manifest for ****/prod/python:****f56b8ffc152f@sha256:************4cc5 0.4s done

#16 DONE 1.4s

2025-05-07 10:21:34.241532 INFO: Done building image!!! | thread=15892
[INFO] 13:21:39.758911 Model build complete! | thread=15892
[INFO] 13:21:39.760236 Build time elapsed 14.0s) | thread=15892
[INFO] 13:21:39.760236 Check out the model at https://clarifai.com/alfrick/docs-demos/models/my-uploaded-model version: 959b32947f0f4061b598f56b8ffc152f | thread=15892

Note: If you make any changes to your model and upload it again to the Clarifai platform, a new version of the model will be created automatically.

Step 6: Predict With Model

Once the model is successfully uploaded to Clarifai, you can start making predictions with it.

Note: If you want to make a prediction request with our Compute Orchestration capabilities, you need to first deploy it into a cluster and nodepool you've created.
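For reference, here is a minimal sketch of routing a prediction through a dedicated deployment. It assumes you have already created a deployment via Compute Orchestration; YOUR_DEPLOYMENT_ID_HERE is a placeholder for its ID.

import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the model with its URL and the deployment that should serve the request
model = Model(
    url="https://clarifai.com/alfrick/docs-demos/models/my-uploaded-model",
    deployment_id="YOUR_DEPLOYMENT_ID_HERE",  # placeholder for your deployment ID
)

response = model.predict("Yes, I uploaded it!")
print(response)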

Unary-Unary Predict Call

You can make a unary-unary predict call using the uploaded model.

import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="https://clarifai.com/alfrick/docs-demos/models/my-uploaded-model")

response = model.predict("Yes, I uploaded it!")

print(response)
Example Output
Text(text='Yes, I uploaded it! Hello World!', url=None)

Unary-Stream Predict Call

You can make a unary-stream predict call using the uploaded model.

import os
from clarifai.client import Model

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="https://clarifai.com/alfrick/docs-demos/models/my-uploaded-model")

for response in model.generate("Yes, I uploaded it! "):
    print(response.text)
Example Output
Yes, I uploaded it! Generate Hello World 0
Yes, I uploaded it! Generate Hello World 1
Yes, I uploaded it! Generate Hello World 2
Yes, I uploaded it! Generate Hello World 3
Yes, I uploaded it! Generate Hello World 4
Yes, I uploaded it! Generate Hello World 5
Yes, I uploaded it! Generate Hello World 6
Yes, I uploaded it! Generate Hello World 7
Yes, I uploaded it! Generate Hello World 8
Yes, I uploaded it! Generate Hello World 9

Stream-Stream Predict Call

You can make a stream-stream predict call using the uploaded model.

import os
from clarifai.client import Model
from clarifai.runners.utils.data_types import Text

# Set your Personal Access Token (PAT)
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize with model URL
model = Model(url="https://clarifai.com/alfrick/docs-demos/models/my-uploaded-model", deployment_id="YOUR_DEPLOYMENT_ID_HERE")

# Create a list of input Texts to simulate a stream
input_texts = iter([
    Text(text="First input."),
    Text(text="Second input."),
    Text(text="Third input.")
])

# Call the stream method and process outputs
response_iterator = model.stream(input_texts)

# Print streamed results
print("Streaming output:\n")
for response in response_iterator:
    print(response.text)
Example Output
Streaming output:

First input.Stream Hello World 0
Second input.Stream Hello World 1
Third input.Stream Hello World 2

Additional Examples

tip

You can find various up-to-date model upload examples here, which demonstrate different use cases and optimizations.

Llama-3.2-1B-Instruct

model.py
from typing import List, Iterator
from threading import Thread
import os
import torch

from clarifai.runners.models.model_class import ModelClass
from clarifai.utils.logging import logger
from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.runners.utils.openai_convertor import openai_response
from transformers import (AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer)


class MyModel(ModelClass):
    """A custom runner for the llama-3.2-1b-instruct LLM that integrates with the Clarifai platform."""

    def load_model(self):
        """Load the model here."""
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        logger.info(f"Running on device: {self.device}")

        # Load checkpoints
        model_path = os.path.dirname(os.path.dirname(__file__))
        builder = ModelBuilder(model_path, download_validation_only=True)
        self.checkpoints = builder.download_checkpoints(stage="runtime")

        # Load model and tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(self.checkpoints)
        self.tokenizer.pad_token = self.tokenizer.eos_token  # Set pad token to eos token
        self.model = AutoModelForCausalLM.from_pretrained(
            self.checkpoints,
            low_cpu_mem_usage=True,
            device_map=self.device,
            torch_dtype=torch.bfloat16,
        )
        self.streamer = TextIteratorStreamer(tokenizer=self.tokenizer)
        self.chat_template = None
        logger.info("Done loading!")

    @ModelClass.method
    def predict(self,
                prompt: str = "",
                chat_history: List[dict] = None,
                max_tokens: int = 512,
                temperature: float = 0.7,
                top_p: float = 0.8) -> str:
        """Predict the response for the given prompt and chat history using the model."""
        # Construct chat-style messages
        messages = chat_history if chat_history else []
        if prompt:
            messages.append({
                "role": "user",
                "content": [{"type": "text", "text": prompt}]
            })

        # return_dict=True so the tokenized inputs can be indexed as a dict (e.g. inputs["input_ids"])
        inputs = self.tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt"
        ).to(self.model.device)

        generation_kwargs = {
            "input_ids": inputs["input_ids"],
            "do_sample": True,
            "max_new_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "eos_token_id": self.tokenizer.eos_token_id,
        }

        output = self.model.generate(**generation_kwargs)
        generated_tokens = output[0][inputs["input_ids"].shape[-1]:]
        return self.tokenizer.decode(generated_tokens, skip_special_tokens=True)

    @ModelClass.method
    def generate(self,
                 prompt: str = "",
                 chat_history: List[dict] = None,
                 max_tokens: int = 512,
                 temperature: float = 0.7,
                 top_p: float = 0.8) -> Iterator[str]:
        """Stream generated text tokens from a prompt + optional chat history."""
        # Construct chat-style messages
        messages = chat_history if chat_history else []
        if prompt:
            messages.append({
                "role": "user",
                "content": [{"type": "text", "text": prompt}]
            })

        response = self.chat(
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p
        )
        for each in response:
            yield each['choices'][0]['delta']['content']

    @ModelClass.method
    def chat(self,
             messages: List[dict],
             max_tokens: int = 512,
             temperature: float = 0.7,
             top_p: float = 0.8) -> Iterator[dict]:
        """
        Stream back JSON dicts for assistant messages.
        Example return format:
        {"role": "assistant", "content": [{"type": "text", "text": "response here"}]}
        """
        # Tokenize using chat template (return_dict=True so "input_ids" can be accessed below)
        inputs = self.tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt"
        ).to(self.model.device)

        generation_kwargs = {
            "input_ids": inputs["input_ids"],
            "do_sample": True,
            "max_new_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "eos_token_id": self.tokenizer.eos_token_id,
            "streamer": self.streamer
        }

        # Run generation in a background thread so tokens can be streamed as they arrive
        thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
        thread.start()

        # Stream each generated token as an OpenAI-style response chunk
        for token_text in self.streamer:
            yield openai_response(token_text)

        thread.join()

    def test(self):
        """Test the model here."""
        try:
            print("Testing predict...")
            # Test predict
            print(self.predict(prompt="What is the capital of India?"))
        except Exception as e:
            print("Error in predict", e)

        try:
            print("Testing generate...")
            # Test generate
            for each in self.generate(prompt="What is the capital of India?"):
                print(each, end="")
            print()
        except Exception as e:
            print("Error in generate", e)

        try:
            print("Testing chat...")
            messages = [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of India?"},
            ]
            for each in self.chat(messages=messages):
                print(each, end="")
            print()
        except Exception as e:
            print("Error in chat", e)
config.yaml
model:
  id: "llama_3_2_1b_instruct"
  user_id: "user_id"
  app_id: "app_id"
  model_type_id: "text-to-text"

build_info:
  python_version: "3.11"

inference_compute_info:
  cpu_limit: "1"
  cpu_memory: "13Gi"
  num_accelerators: 1
  accelerator_type: ["NVIDIA-*"]
  accelerator_memory: "18Gi"

checkpoints:
  type: "huggingface"
  repo_id: "unsloth/Llama-3.2-1B-Instruct"
  hf_token: "hf_token"
  when: "runtime"
requirements.txt
torch==2.5.1
tokenizers>=0.21.0
transformers>=4.47.0
accelerate>=1.2.0
scipy==1.10.1
optimum>=1.23.3
protobuf==5.27.3
einops>=0.8.0
requests==2.32.3
clarifai>=11.3.0

NSFW Image Classifier

model.py
import os
import tempfile
from typing import List, Iterator
from io import BytesIO
import cv2
import torch
from transformers import AutoModelForImageClassification, ViTImageProcessor

from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Image, Concept, Video
from clarifai.runners.models.model_builder import ModelBuilder

from PIL import Image as PILImage


def video_to_frames(video_bytes):
    """Convert video bytes to frames."""
    frames = []
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as temp_video_file:
        temp_video_file.write(video_bytes)
        temp_video_path = temp_video_file.name

    video = cv2.VideoCapture(temp_video_path)
    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break
        frame_bytes = cv2.imencode('.jpg', frame)[1].tobytes()
        frames.append(frame_bytes)
    video.release()
    return frames


def preprocess_image(image_bytes):
    """Convert image bytes into RGB format suitable for model processing.

    Args:
        image_bytes: Raw image data in bytes format

    Returns:
        PIL Image object in RGB format ready for model input
    """
    return PILImage.open(BytesIO(image_bytes)).convert("RGB")


def process_concepts(logits, model_labels):
    """Process logits and map them to concepts."""
    outputs = []
    for logit in logits:
        probs = torch.softmax(logit, dim=-1)
        sorted_indices = torch.argsort(probs, dim=-1, descending=True)
        output_concepts = []
        for idx in sorted_indices:
            concept = Concept(id=model_labels[idx.item()], name=model_labels[idx.item()], value=probs[idx].item())
            output_concepts.append(concept)
        outputs.append(output_concepts)
    return outputs


class ImageClassifierModel(ModelClass):
    """A custom runner that classifies images and outputs concepts."""

    def load_model(self):
        """Load the model and processor."""
        model_path = os.path.dirname(os.path.dirname(__file__))
        builder = ModelBuilder(model_path, download_validation_only=True)
        checkpoints = builder.download_checkpoints(stage="runtime")

        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

        self.model = AutoModelForImageClassification.from_pretrained(checkpoints).to(self.device)
        self.model_labels = self.model.config.id2label
        self.processor = ViTImageProcessor.from_pretrained(checkpoints)

    @ModelClass.method
    def predict(self, image: Image) -> List[List[Concept]]:
        """Predict concepts for a list of images."""
        pil_image = preprocess_image(image.bytes)
        inputs = self.processor(images=pil_image, return_tensors="pt")
        inputs = {name: tensor.to(self.device) for name, tensor in inputs.items()}
        with torch.no_grad():
            logits = self.model(**inputs).logits
        return process_concepts(logits, self.model_labels)

    @ModelClass.method
    def generate(self, video: Video) -> Iterator[List[Concept]]:
        """Generate concepts for frames extracted from a video."""
        video_bytes = video.bytes
        frame_generator = video_to_frames(video_bytes)
        for frame in frame_generator:
            image = preprocess_image(frame)
            inputs = self.processor(images=image, return_tensors="pt")
            inputs = {name: tensor.to(self.device) for name, tensor in inputs.items()}
            with torch.no_grad():
                logits = self.model(**inputs).logits
            yield process_concepts(logits, self.model_labels)  # Yield concepts for each frame

    @ModelClass.method
    def stream_image(self, image_stream: Iterator[Image]) -> Iterator[List[Concept]]:
        """Stream process image inputs."""
        for image in image_stream:
            result = self.predict(image)
            yield result

    @ModelClass.method
    def stream_video(self, video_stream: Iterator[Video]) -> Iterator[List[Concept]]:
        """Stream process video inputs."""
        for video in video_stream:
            for frame_result in self.generate(video):
                yield frame_result
config.yaml
model:
  id: model_id
  user_id: user_id
  app_id: app_id
  model_type_id: visual-classifier
build_info:
  python_version: '3.11'
inference_compute_info:
  cpu_limit: '2'
  cpu_memory: 2Gi
  num_accelerators: 1
  accelerator_type:
    - NVIDIA-A10G
  accelerator_memory: 3Gi
checkpoints:
  type: huggingface
  repo_id: Falconsai/nsfw_image_detection
  hf_token: hf_token
requirements.txt
torch==2.5.1
transformers>=4.47.0
pillow==10.4.0
requests==2.32.3
timm==1.0.12
opencv-python-headless==4.10.0.84
numpy
aiohttp
clarifai>=11.3.0
clarifai-protocol>=0.0.20

DETR Resnet Image Detector

model.py
# Standard library imports
import os
import tempfile
import time
from io import BytesIO
from typing import List, Dict, Any, Iterator

# Third-party imports
import cv2
import torch
from PIL import Image as PILImage
from transformers import DetrForObjectDetection, DetrImageProcessor

# Clarifai imports
from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.models.model_builder import ModelBuilder
from clarifai.runners.utils.data_types import Concept, Image, Video, Region
from clarifai.utils.logging import logger


def preprocess_image(image_bytes: bytes) -> PILImage:
    """Convert image bytes into RGB format suitable for model processing.

    Args:
        image_bytes: Raw image data in bytes format

    Returns:
        PIL Image object in RGB format ready for model input
    """
    return PILImage.open(BytesIO(image_bytes)).convert("RGB")


def detect_objects(
    images: List[PILImage],
    model: DetrForObjectDetection,
    processor: DetrImageProcessor,
    device: str
) -> Dict[str, Any]:
    """Process images through the DETR model to detect objects.

    Args:
        images: List of preprocessed images
        model: DETR model instance
        processor: Image processor for DETR
        device: Computation device (CPU/GPU)

    Returns:
        Detection results from the model
    """
    model_inputs = processor(images=images, return_tensors="pt").to(device)
    model_inputs = {name: tensor.to(device) for name, tensor in model_inputs.items()}
    model_output = model(**model_inputs)
    results = processor.post_process_object_detection(model_output)
    return results


def process_detections(
    results: List[Dict[str, torch.Tensor]],
    images: List[PILImage],
    threshold: float,
    model_labels: Dict[int, str]
) -> List[List[Region]]:
    """Convert model outputs into a structured format of detections.

    Args:
        results: Raw detection results from model
        images: Original input images
        threshold: Confidence threshold for detections
        model_labels: Dictionary mapping label indices to names

    Returns:
        List of lists containing Region objects for each detection
    """
    outputs = []
    for i, result in enumerate(results):
        image = images[i]
        detections = []
        for score, label_idx, box in zip(result["scores"], result["labels"], result["boxes"]):
            if score > threshold:
                label = model_labels[label_idx.item()]
                detections.append(
                    Region(
                        box=box.tolist(),
                        concepts=[Concept(id=label, name=label, value=score.item())]
                    )
                )
        outputs.append(detections)
    return outputs


def video_to_frames(video_bytes: bytes) -> Iterator[bytes]:
    """Convert video bytes to frames.

    Args:
        video_bytes: Raw video data in bytes

    Yields:
        JPEG encoded frame data as bytes
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as temp_video_file:
        temp_video_file.write(video_bytes)
        temp_video_path = temp_video_file.name
        logger.info(f"temp_video_path: {temp_video_path}")

    video = cv2.VideoCapture(temp_video_path)
    logger.info(f"video opened: {video.isOpened()}")

    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break
        frame_bytes = cv2.imencode('.jpg', frame)[1].tobytes()
        yield frame_bytes

    video.release()
    os.unlink(temp_video_path)


class MyRunner(ModelClass):
    """A custom runner for the DETR object detection model that processes images and videos."""

    def load_model(self):
        """Load the model here."""
        model_path = os.path.dirname(os.path.dirname(__file__))
        builder = ModelBuilder(model_path, download_validation_only=True)
        checkpoint_path = builder.download_checkpoints(stage="runtime")

        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        logger.info(f"Running on device: {self.device}")

        self.model = DetrForObjectDetection.from_pretrained(checkpoint_path).to(self.device)
        self.processor = DetrImageProcessor.from_pretrained(checkpoint_path)
        self.model.eval()
        self.threshold = 0.9
        self.model_labels = self.model.config.id2label

        logger.info("Done loading!")

    @ModelClass.method
    def predict(self, image: Image) -> List[Region]:
        """Process a single image and return detected objects."""
        image_bytes = image.bytes
        image = preprocess_image(image_bytes)

        with torch.no_grad():
            results = detect_objects([image], self.model, self.processor, self.device)
            outputs = process_detections(results, [image], self.threshold, self.model_labels)
        return outputs[0]  # Return detections for single image

    @ModelClass.method
    def generate(self, video: Video) -> Iterator[List[Region]]:
        """Process video frames and yield detected objects for each frame."""
        video_bytes = video.bytes
        frame_generator = video_to_frames(video_bytes)
        for frame in frame_generator:
            image = preprocess_image(frame)
            with torch.no_grad():
                results = detect_objects([image], self.model, self.processor, self.device)
                outputs = process_detections(results, [image], self.threshold, self.model_labels)
            yield outputs[0]  # Yield detections for each frame

    @ModelClass.method
    def stream_image(self, image_stream: Iterator[Image]) -> Iterator[List[Region]]:
        """Stream process image inputs."""
        logger.info("Starting stream processing for images")
        for image in image_stream:
            start_time = time.time()
            result = self.predict(image)
            yield result
            logger.info(f"Processing time: {time.time() - start_time:.3f}s")

    @ModelClass.method
    def stream_video(self, video_stream: Iterator[Video]) -> Iterator[List[Region]]:
        """Stream process video inputs."""
        logger.info("Starting stream processing for videos")
        for video in video_stream:
            start_time = time.time()
            for frame_result in self.generate(video):
                yield frame_result
            logger.info(f"Processing time: {time.time() - start_time:.3f}s")

    def test(self):
        """Test the model functionality."""
        import requests  # Import moved here as it's only used for testing

        # Test configuration
        TEST_URLS = {
            "images": [
                "https://samples.clarifai.com/metro-north.jpg",
                "https://samples.clarifai.com/dog.tiff"
            ],
            "video": "https://samples.clarifai.com/beer.mp4"
        }

        def get_test_data(url):
            return Image(bytes=requests.get(url).content)

        def get_test_video():
            return Video(bytes=requests.get(TEST_URLS["video"]).content)

        def run_test(name, test_fn):
            logger.info(f"\nTesting {name}...")
            try:
                test_fn()
                logger.info(f"{name} test completed successfully")
            except Exception as e:
                logger.error(f"Error in {name} test: {e}")

        # Test predict
        def test_predict():
            result = self.predict(get_test_data(TEST_URLS["images"][0]))
            logger.info(f"Predict result: {result}")

        # Test generate
        def test_generate():
            for detections in self.generate(get_test_video()):
                logger.info(f"First frame detections: {detections}")
                break

        # Test stream
        def test_stream():
            # Split into two separate test functions for clarity
            def test_stream_image():
                images = [get_test_data(url) for url in TEST_URLS["images"]]
                for result in self.stream_image(iter(images)):
                    logger.info(f"Image stream result: {result}")

            def test_stream_video():
                for result in self.stream_video(iter([get_test_video()])):
                    logger.info(f"Video stream result: {result}")
                    break  # Just test first frame

            logger.info("\nTesting image streaming...")
            test_stream_image()
            logger.info("\nTesting video streaming...")
            test_stream_video()

        # Run all tests
        for test_name, test_fn in [
            ("predict", test_predict),
            ("generate", test_generate),
            ("stream", test_stream)
        ]:
            run_test(test_name, test_fn)
config.yaml
# This is the sample config file for the image-detection model.

model:
  id: "detr-resnet-50"
  user_id: "user_id"
  app_id: "app_id"
  model_type_id: "visual-detector"

build_info:
  python_version: "3.11"

inference_compute_info:
  cpu_limit: "4"
  cpu_memory: "2Gi"
  num_accelerators: 1
  accelerator_type: ["NVIDIA-*"]
  accelerator_memory: "5Gi"

checkpoints:
  type: "huggingface"
  repo_id: "facebook/detr-resnet-50"
  hf_token: "hf_token"
requirements.txt
torch==2.5.1
transformers>=4.47.0
pillow==10.4.0
requests==2.32.3
timm==1.0.12
opencv-python-headless==4.10.0.84
clarifai>=11.3.0