
Model Upload

Learn how to upload custom models using the Clarifai SDKs


Users can now upload their custom-built models to production using the Clarifai SDKs. The Clarifai SDKs offer a command-line interface, straightforward implementation, and Python-based testing to make deploying a model easier.

Additionally, for serving configurations, the serving_backend section contains custom settings, including options for NVIDIA Triton. With Triton, users can leverage high-performance GPU computation for inference. NVIDIA Triton Inference Server streamlines the deployment and execution of machine learning models, and its framework flexibility, performance optimization, scalability, and ease of integration make it a compelling choice for bridging the gap between model development and real-world applications.

Prerequisites

  • Set up the Clarifai SDKs along with your PAT. Refer to the installation and configuration instructions with the PAT token here.
note

Guide to get your PAT

  • Create a project directory named your_model_dir and run the following commands,
clarifai create model --type text-to-text --working-dir your_model_dir
cd your_model_dir

The Clarifai SDKs will then create all the necessary files required for the deployment process inside your_model_dir.

your_model_dir
├── clarifai_config.yaml
├── inference.py
├── test.py
└── requirements.txt
info
  • inference.py: The crucial file where users need to implement their Python code.
  • clarifai_config.yaml: Contains all the necessary configurations for model testing, building, and uploading.
  • test.py: Predefined test cases to evaluate inference.py.
  • requirements.txt: Equivalent to a normal Python project's requirements.txt.

Implementation

The next step involves implementing an inference class for your custom model inside inference.py. There are two methods inside the class that you must implement:

  • __init__: a method to load the model; it is called once.
  • predict: a function designed to generate predictions based on the provided inputs and inference parameters. This method includes a docstring inherited from its parent, providing information on input, parameters, and output types. Refer to the docstring to confirm that the outputs of this method adhere to the correct Clarifai Output Type, as errors may occur otherwise.

Below is an example template of inference.py for a text-to-text model,

import os
from pathlib import Path
from typing import Dict, Union

from clarifai.models.model_serving.model_config import *


class InferenceModel(TextToText):
    """User model inference class."""

    def __init__(self) -> None:
        """
        Load inference time artifacts that are called frequently, e.g. models, tokenizers, etc.
        in this method so they are loaded only once for faster inference.
        """
        # current directory
        self.base_path: Path = os.path.dirname(__file__)

    def predict(self, input_data: list,
                inference_parameters: Dict[str, Union[str, float, int, bool]]) -> list:
        """Custom prediction function for `text-to-text` (also called `text generation`) models.

        Args:
            input_data (List[str]): List of text
            inference_parameters (Dict[str, Union[str, float, int, bool]]): your inference parameters

        Returns:
            list of TextOutput

        """

        raise NotImplementedError()

Consider a scenario where we are going to use a Hugging Face text-to-text model; the inference class would look like this:

import os
from typing import Dict, Union

from clarifai.models.model_serving.model_config import *

import torch
from transformers import AutoTokenizer
import transformers


class InferenceModel(TextToText):
    """User model inference class."""

    def __init__(self) -> None:
        """
        Load inference time artifacts that are called frequently, e.g. models, tokenizers, etc.
        in this method so they are loaded only once for faster inference.
        """
        # current directory
        self.base_path = os.path.dirname(__file__)
        # where you saved the HF checkpoint in your working dir, i.e. `your_model_dir`
        model_path = os.path.join(self.base_path, "checkpoint")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.pipeline = transformers.pipeline(
            "text-generation",
            model=model_path,
            torch_dtype=torch.float16,
            device_map="auto",
        )

    def predict(self, input_data: list,
                inference_parameters: Dict[str, Union[str, float, int]]) -> list:
        """Custom prediction function for `text-to-text` (also called `text generation`) models.

        Args:
            input_data (List[str]): List of text
            inference_parameters (Dict[str, Union[str, float, int]]): your inference parameters

        Returns:
            list of TextOutput

        """
        output_sequences = self.pipeline(
            input_data,
            eos_token_id=self.tokenizer.eos_token_id,
            **inference_parameters)

        # wrap outputs in the Clarifai-defined output type;
        # each pipeline result is a list of dicts with a "generated_text" key
        return [TextOutput(each[0]["generated_text"]) for each in output_sequences]

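As a quick local sanity check (outside the official test flow described below), you can instantiate the class directly and call predict. A minimal hedged sketch, assuming you run it from inside your_model_dir and the checkpoint directory above exists; max_new_tokens is just an illustrative Hugging Face generation parameter:

from inference import InferenceModel

model = InferenceModel()  # loads the tokenizer and pipeline once
outputs = model.predict(
    ["Tell me about Clarifai"],
    inference_parameters={"max_new_tokens": 64},  # illustrative generation parameter
)
print(outputs[0])
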
Setup Configuration File

The clarifai_config.yaml file contains all the configuration required for testing, building, and uploading a model.

The config file will have the following structure,

clarifai_model:
  clarifai_model_id:
  clarifai_user_app_id:
  description:
  inference_parameters: (*)
  labels: (*)
  type: (**)
serving_backend:
  triton: (***)
    max_batch_size:
    image_shape:

The clarifai_model section holds the configuration needed for building, testing, and uploading:

  • clarifai_model_id: the model ID on the platform.
  • clarifai_user_app_id: the user ID and app ID on the platform, separated by '/'.
  • description: a brief description of the model.
  • inference_parameters (optional): inference parameters used to customize the model's prediction method for testing and uploading.
  • labels (optional): concept names, inserted manually, that are essential for specific model types.
  • type: generated when the working directory is initialized; it must not be modified.

clarifai_model_id, clarifai_user_app_id, and description are crucial for the upload process; if they are not set in the file, they can be passed during the upload command instead.

The serving_backend section contains custom serving settings, currently the triton options:

  • max_batch_size: the maximum number of inputs sent to a single prediction call. The default is 1; if the model supports batch inference, it can be increased for better GPU utilization.
  • image_shape: the width and height of input images, applicable only to models with image input. The default is [-1, -1], which accepts any image size.

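Since predict receives up to max_batch_size inputs in a single call, the implementation should return one output per input, in order. Below is a minimal hedged sketch illustrating that contract for a text-to-text model; it is illustrative only (in a real inference.py the class must still be named InferenceModel), and echo_generate is a stand-in for actual generation logic:

from typing import Dict, List, Union

from clarifai.models.model_serving.model_config import TextOutput, TextToText


def echo_generate(text: str) -> str:
    # stand-in for real generation logic
    return text.upper()


class BatchAwareModel(TextToText):

    def __init__(self) -> None:
        pass  # nothing to load in this sketch

    def predict(self, input_data: List[str],
                inference_parameters: Dict[str, Union[str, float, int, bool]]) -> list:
        # input_data may hold up to `max_batch_size` items; keep outputs aligned with inputs
        return [TextOutput(echo_generate(text)) for text in input_data]
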
Testing

This test serves two primary purposes:

  • Implementation Validation: Before proceeding with the build or upload operations, users can run the test to assess their implementation and confirm that the model is ready for deployment. The test also validates the custom configuration settings outlined in the clarifai_config.yaml file.

  • Inference Parameter Management: Users can add or update inference parameters directly within the clarifai_config.yaml file, and the test automatically validates them to ensure that only defined inference parameters with appropriate values can be used.

Example test case for text input,

def test_text_input(self):
    text: list = ["Tell me about Clarifai", "How deploy model to Clarifai"]
    # `temperature` is defined as an inference parameter, so this call will PASS
    outputs = self.model.predict(text, temperature=0.9)
    # `top_k` is not defined when initializing self.model, so this call will FAIL
    outputs = self.model.predict(text, top_k=10)

Click here to learn more about the test files and the clarifai_config.yaml file.

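If you need more coverage, additional cases can be appended inside the generated test class in test.py. A hedged sketch (the method name is arbitrary, and the assertion assumes the test wrapper returns one output per input):

def test_batch_alignment(self):
    text = ["Hello", "What is Clarifai?"]
    outputs = self.model.predict(text)
    # illustrative check: one output per input
    assert len(outputs) == len(text)
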
Each model built for inference with Triton requires certain dependencies and dependency versions to be installed for inference to run successfully. Therefore, the next step is to add the required dependencies to the requirements.txt file.

clarifai
torch==2.1.1
transformers==4.36.2
accelerate==0.26.1

Deployment

To prepare for deployment, we have to build the files. This process generates a *.clarifai zip file that contains everything needed to run your model on the Clarifai platform.

clarifai build model
note

You need to upload your built file to a cloud storage service in order to obtain a direct download URL for the next step.

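Any storage that yields a direct download link works. As one hedged example, the sketch below uses boto3 to push the built file to an S3 bucket and prints the resulting URL; the bucket name and file path are placeholders, and it assumes your AWS credentials are configured and the object is publicly readable:

import boto3

bucket = "my-model-artifacts"                 # placeholder bucket name
local_file = "your_model_dir/model.clarifai"  # placeholder path to the built file
key = "models/model.clarifai"                 # object key inside the bucket

s3 = boto3.client("s3")
s3.upload_file(local_file, bucket, key)       # upload the built archive

# direct download URL (valid only if the bucket/object is publicly readable)
print(f"https://{bucket}.s3.amazonaws.com/{key}")
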
Since we have all the files ready, let's proceed to deploy the model using the following commands,

clarifai login
Output
Get your PAT from https://clarifai.com/settings/security and pass it here: <insert your pat here>
clarifai upload model --url <url> --user-app <your_user_id>/<your_app_id> --id <your_model_id>

Example

For the demo, we are going to upload an open-source visual segmentation model from Hugging Face. Run the following commands step by step on a Google Colab instance or your local machine.

The first step is to install the required libraries,

pip install clarifai

Using the Clarifai CLI, you can initialize a model from the Clarifai Examples repository into your working directory by executing the following,

clarifai create model --from-example --working-dir my_dir
note

The --working-dir parameter will create a directory.

From the list of available models, let's choose a visual segmenter as an example,

Image Output

The CLI will then clone all the required files for visual_segmenter directly into the working directory.

Image Output

Next, we can download the model checkpoint from Hugging Face into a checkpoint directory inside my_dir.

huggingface-cli download mattmdjaga/segformer_b2_clothes --local-dir my_dir/checkpoint --local-dir-use-symlinks False --exclude *.safetensors optimizer.pt
Output
Consider using `hf_transfer` for faster downloads. This solution comes with some limitations. See https://huggingface.co/docs/huggingface_hub/hf_transfer for more details.
Fetching 14 files: 0% 0/14 [00:00<?, ?it/s]
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/config.json to /root/.cache/huggingface/hub/tmpzkygsvw4
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/preprocessor_config.json to /root/.cache/huggingface/hub/tmpiltxv0t_
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/pytorch_model.bin to /root/.cache/huggingface/hub/tmpx_viyrvb
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/onnx/model.onnx to /root/.cache/huggingface/hub/tmp5jpha5yd
...
pytorch_model.bin: 100% 110M/110M [00:00<00:00, 139MB/s]
model.onnx: 100% 110M/110M [00:01<00:00, 75.2MB/s]
Fetching 14 files: 100% 14/14 [00:02<00:00, 6.75it/s]
/content/examples/model_upload/visual_segmenter/segformer-b2/checkpoint

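If you prefer doing the download from Python rather than the CLI, a roughly equivalent hedged sketch using huggingface_hub is shown below; the exclude patterns mirror the command above:

from huggingface_hub import snapshot_download

# download the checkpoint into my_dir/checkpoint, skipping safetensors and optimizer state
snapshot_download(
    repo_id="mattmdjaga/segformer_b2_clothes",
    local_dir="my_dir/checkpoint",
    ignore_patterns=["*.safetensors", "optimizer.pt"],
)
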
Next, install the dependencies from the requirements.txt file,

pip install -r my_dir/requirements.txt

Before moving on to the build process, we have to make some changes in the clarifai_config.yaml file. You will have to add clarifai_model_id and clarifai_user_app_id with their respective values. Example changes made to the clarifai_config.yaml file are given below,

clarifai_model:
  clarifai_model_id: 'segmentation_model'
  clarifai_user_app_id: '8tzpjy1a841y/transfer_learn_3'
  description: ''
  inference_parameters: []
  labels:
  - background
  - hat
  - hair
  - sunglass
  - upper-clothes
  - skirt
  - pants
  - dress
  - belt
  - left-shoe
  - right-shoe
  - face
  - left-leg
  - right-leg
  - left-arm
  - right-arm
  - bag
  - scarf
  type: visual-segmenter
serving_backend:
  triton:
    max_batch_size: 4

After installing the dependencies and modifying the config file, we have to build the model and upload the model.clarifai file to cloud storage.

clarifai build model ./my_dir
Output
======================================= test session starts ========================================

platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.4.0

rootdir: /content/examples/model_upload/visual_segmenter

plugins: anyio-3.7.1

collected 2 items

segformer-b2/test.py The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.

0it [00:00, ?it/s]

2024-04-09 09:57:46.703098: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered

2024-04-09 09:57:46.703174: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered

2024-04-09 09:57:46.805602: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

2024-04-09 09:57:49.466995: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

..

========================================= warnings summary =========================================

segformer-b2/test.py::CustomTest::test_default_cases

segformer-b2/test.py::CustomTest::test_specific_case1

/usr/local/lib/python3.10/dist-packages/transformers/models/segformer/image_processing_segformer.py:99: FutureWarning: The `reduce_labels` parameter is deprecated and will be removed in a future version. Please use `do_reduce_labels` instead.

warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

================================== 2 passed, 2 warnings in 20.81s ==================================

Start building...

0% 0/7 [00:00<?, ?it/s]NOTE: skipping ['requirements.txt', '.cache', '__pycache__']

copying inference.py...: 100% 7/7 [00:00<00:00, 8.16it/s]

Model building in progress; the duration may vary depending on the size of checkpoints/assets...

Finished. Your model is located at ./segformer-b2/model.clarifai
note

You can use the model from this URL for the model upload demo: https://s3.amazonaws.com/samples.clarifai.com/model.clarifai.

Now let's log in to Clarifai using the CLI,

clarifai login
Output
Get your PAT from https://clarifai.com/settings/security and pass it here: <insert your pat here>

The last step is to upload the model to the Clarifai platform,

clarifai upload model my_dir --url https://s3.amazonaws.com/samples.clarifai.com/model.clarifai
Output
======================================= test session starts ========================================

platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.4.0

rootdir: /content/examples/model_upload/visual_segmenter

plugins: anyio-3.7.1

collected 2 items

segformer-b2/test.py 2024-04-09 10:25:02.328582: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered

2024-04-09 10:25:02.328646: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered

2024-04-09 10:25:02.330329: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

2024-04-09 10:25:03.910020: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

..

========================================= warnings summary =========================================

segformer-b2/test.py::CustomTest::test_default_cases

segformer-b2/test.py::CustomTest::test_specific_case1

/usr/local/lib/python3.10/dist-packages/transformers/models/segformer/image_processing_segformer.py:99: FutureWarning: The `reduce_labels` parameter is deprecated and will be removed in a future version. Please use `do_reduce_labels` instead.

warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

================================== 2 passed, 2 warnings in 16.90s ==================================

Success!

Model version: fac1b8a204554f7196871f106be75d8d
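
Once the upload succeeds, you can sanity-check the deployed model with the Clarifai Python SDK. A hedged sketch is shown below; it reuses the IDs from this example's clarifai_config.yaml, assumes CLARIFAI_PAT is set in your environment, and uses a placeholder image URL:

from clarifai.client.model import Model

# IDs from the example clarifai_config.yaml; replace with your own
model = Model(model_id="segmentation_model",
              user_id="8tzpjy1a841y",
              app_id="transfer_learn_3")

# any publicly reachable image URL works here
prediction = model.predict_by_url("https://samples.clarifai.com/metro-north.jpg",
                                  input_type="image")
print(prediction.outputs[0].data)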