Model Upload
Learn how to perform model upload using Clarifai SDKs
Users can now upload their custom-built models into production using Clarifai SDKs. The Clarifai SDKs offer features such as a command-line interface, easy implementation, and testing in Python to make deploying a model easier.
For serving, the serving_backend section of the configuration contains custom settings, including options for NVIDIA Triton. With Triton, users can leverage high-performance GPU computation for inference tasks. NVIDIA Triton Inference Server streamlines the deployment and execution of machine learning models for inference, with an emphasis on framework flexibility, performance optimization, scalability, and ease of integration.
Prerequisites
- Set up the Clarifai SDKs along with a Personal Access Token (PAT). Refer to the installation and configuration instructions for the PAT here.
Guide to get your PAT
- Create a project directory named your_model_dir and run the following commands,
clarifai create model --type text-to-text --working-dir your_model_dir
cd your_model_dir
The Clarifai SDKs will then create all the files required for the deployment process inside your_model_dir.
your_model_dir
├── clarifai_config.yaml
├── inference.py
├── test.py
└── requirements.txt
- inference.py: The crucial file where users need to implement their Python code.
- clarifai_config.yaml: Contains all the configurations necessary to test, build, and upload the model.
- test.py: Predefined test cases to evaluate inference.py.
- requirements.txt: Equivalent to a normal Python project's requirements.txt.
Implementation
The next step is to implement an inference class inside inference.py for your custom model. There are two methods inside the class that you must implement:
- __init__: a method to load the model, called once.
- predict: a function designed to generate predictions based on the provided inputs and inference parameters. This method includes a docstring inherited from its parent, providing information on input, parameters, and output types. Refer to the docstring to confirm that the outputs of this method adhere to the correct Clarifai Output Type, as errors may occur otherwise.
Below is an example template of inference.py for a text-to-text model,
import os
from typing import Dict, Union

from clarifai.models.model_serving.model_config import *


class InferenceModel(TextToText):
    """User model inference class."""

    def __init__(self) -> None:
        """
        Load inference-time artifacts that are used frequently, e.g. models, tokenizers, etc.,
        in this method so they are loaded only once for faster inference.
        """
        # Directory containing this file
        self.base_path: str = os.path.dirname(__file__)

    def predict(self, input_data: list,
                inference_parameters: Dict[str, Union[str, float, int, bool]]) -> list:
        """Custom prediction function for a `text-to-text` (also called `text generation`) model.

        Args:
            input_data (List[str]): List of text
            inference_parameters (Dict[str, Union[str, float, int, bool]]): your inference parameters

        Returns:
            list of TextOutput
        """
        raise NotImplementedError()
Suppose we want to use a HuggingFace text-to-text model. The inference class would look like this:
import os
from typing import Dict, Union

import torch
import transformers
from transformers import AutoTokenizer

from clarifai.models.model_serving.model_config import *


class InferenceModel(TextToText):
    """User model inference class."""

    def __init__(self) -> None:
        """
        Load inference-time artifacts that are used frequently, e.g. models, tokenizers, etc.,
        in this method so they are loaded only once for faster inference.
        """
        # Directory containing this file
        self.base_path = os.path.dirname(__file__)
        # Where the HF checkpoint is saved in your working dir, i.e. `your_model_dir`
        model_path = os.path.join(self.base_path, "checkpoint")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.pipeline = transformers.pipeline(
            "text-generation",
            model=model_path,
            torch_dtype=torch.float16,
            device_map="auto",
        )

    def predict(self, input_data: list,
                inference_parameters: Dict[str, Union[str, float, int]]) -> list:
        """Custom prediction function for a `text-to-text` (also called `text generation`) model.

        Args:
            input_data (List[str]): List of text
            inference_parameters (Dict[str, Union[str, float, int]]): your inference parameters

        Returns:
            list of TextOutput
        """
        output_sequences = self.pipeline(
            input_data,
            eos_token_id=self.tokenizer.eos_token_id,
            **inference_parameters)

        # For a list input, the pipeline returns one list of candidate dicts per prompt;
        # wrap the first candidate's generated text in the Clarifai-defined output.
        return [TextOutput(each[0]["generated_text"]) for each in output_sequences]
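To make the input/output contract of predict concrete, here is a minimal, dependency-free sketch. FakeTextOutput, EchoModel, and the max_chars parameter are hypothetical stand-ins (they are not part of the Clarifai SDK) used only so the snippet runs without the SDK or a checkpoint. The point it illustrates is that predict receives a batch of strings and returns one output per input, in the same order.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class FakeTextOutput:
    """Hypothetical stand-in for the SDK's TextOutput wrapper."""
    predicted_text: str


class EchoModel:
    """Hypothetical model that mimics the predict() contract without a real checkpoint."""

    def predict(
        self,
        input_data: List[str],
        inference_parameters: Dict[str, Union[str, float, int, bool]],
    ) -> List[FakeTextOutput]:
        # Illustrative parameter only; real models define their own parameters.
        max_chars = int(inference_parameters.get("max_chars", 20))
        # One output per input, in the same order as the inputs.
        return [FakeTextOutput(text[:max_chars].upper()) for text in input_data]


outputs = EchoModel().predict(["hello clarifai", "batch of two"], {"max_chars": 5})
print([o.predicted_text for o in outputs])
```

A real implementation would replace the string slicing with a call to the loaded model, but the shape of the inputs and outputs stays the same.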
Setup Configuration File
The clarifai_config.yaml file contains all the configuration required for testing, building, and uploading a model.
The config file has the following structure,
clarifai_model:
  clarifai_model_id:
  clarifai_user_app_id:
  description:
  inference_parameters: (*)
  labels: (*)
  type: (**)
serving_backend:
  triton: (***)
    max_batch_size:
    image_shape:
The clarifai_model section includes the configuration necessary for building, testing, and uploading. It comprises several attributes: clarifai_model_id, which specifies the model ID on the platform; clarifai_user_app_id, which denotes the user ID and app ID on the platform, separated by '/'; and description, which provides a brief model description. These attributes are required for the model upload, but if they are not set in the file, they can be passed to the upload command instead. There are also optional attributes: inference_parameters, which allows customizing the parameters of the model's predict method for testing and uploading; and labels, the concept names that must be entered manually for the model types that need them. The type attribute is generated when the working directory is initialized and must not be modified.
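As an illustration of the clarifai_user_app_id format described above, the following sketch splits a value into its user ID and app ID. The helper name split_user_app_id and the sample value are assumptions made for this example; they are not part of the Clarifai SDK.

```python
from typing import Tuple


def split_user_app_id(user_app_id: str) -> Tuple[str, str]:
    """Split a '<user_id>/<app_id>' string into its two parts."""
    parts = user_app_id.split("/")
    # Exactly two non-empty parts are expected.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<user_id>/<app_id>', got {user_app_id!r}")
    return parts[0], parts[1]


user_id, app_id = split_user_app_id("my_user/my_app")
print(user_id, app_id)
```

A check like this can catch a missing slash or an empty ID before the upload step.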
For serving configurations, the serving_backend section contains custom settings, including triton options such as max_batch_size, which determines the maximum number of inputs per prediction, and image_shape, which applies only to image input models and specifies the width and height of input images. The default max_batch_size is 1; if the model supports batch inference, it can be increased for better GPU performance. The default image_shape is [-1, -1], which means that images of any size are accepted.
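The sketch below illustrates what these two settings mean in practice. It is not how Triton implements batching or shape checks; chunk_inputs and shape_is_accepted are hypothetical helpers, and treating a non-negative image_shape entry as an exact required size is an assumption made for this example.

```python
from typing import Iterator, List, Sequence, Tuple


def chunk_inputs(inputs: Sequence, max_batch_size: int = 1) -> Iterator[List]:
    """Yield consecutive batches holding at most max_batch_size inputs."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(inputs), max_batch_size):
        yield list(inputs[start:start + max_batch_size])


def shape_is_accepted(image_size: Tuple[int, int],
                      image_shape: Sequence[int] = (-1, -1)) -> bool:
    """Return True if each dimension matches image_shape, where -1 accepts any value."""
    return all(limit == -1 or actual == limit
               for actual, limit in zip(image_size, image_shape))


# Five inputs with max_batch_size=4 are served as a batch of 4 and a batch of 1.
print(list(chunk_inputs(["a", "b", "c", "d", "e"], max_batch_size=4)))
# The default [-1, -1] accepts any image size.
print(shape_is_accepted((640, 480)))
print(shape_is_accepted((640, 480), image_shape=(224, 224)))
```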
Testing
This test serves two primary purposes:
- Implementation Validation: Before proceeding with the build or upload operations, users can run the test to assess their implementation and confirm that the model is ready for deployment. The test also validates the custom configuration settings in the clarifai_config.yaml file.
- Inference Parameter Management: Users can add or update inference parameters directly in the clarifai_config.yaml file. The parameters are validated automatically during inference, so only defined inference parameters with appropriate values can be used.
Example test case for text input,
def test_text_input(self):
    text: list = ["Tell me about Clarifai", "How deploy model to Clarifai"]
    # With the inference parameters of the example above, this call passes
    outputs = self.model.predict(text, temperature=0.9)
    # This call fails because `top_k` was not defined when self.model was initialized
    outputs = self.model.predict(text, top_k=10)
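The pass/fail behavior in the test case above can be pictured with a small stand-alone sketch. This is not the SDK's validation code; check_inference_parameters is a hypothetical helper, and the set of defined names is an assumption chosen to match the example.

```python
from typing import Any, Dict, Set


def check_inference_parameters(defined: Set[str], supplied: Dict[str, Any]) -> None:
    """Raise ValueError if any supplied parameter name was not defined up front."""
    unknown = sorted(set(supplied) - defined)
    if unknown:
        raise ValueError(f"undefined inference parameters: {unknown}")


defined_params = {"temperature"}  # illustrative: names declared in the config

# Accepted: `temperature` is a defined parameter.
check_inference_parameters(defined_params, {"temperature": 0.9})

# Rejected: `top_k` was never defined.
try:
    check_inference_parameters(defined_params, {"top_k": 10})
except ValueError as err:
    print(err)
```

This sketch checks names only; per the description above, the actual test also checks that the values are appropriate.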
Click here to know more about test files and clarifai_config.yaml file.
Each model built for inference with Triton requires specific dependencies and dependency versions to be installed for inference to run successfully. Therefore, the next step is to add the required dependencies to the requirements.txt file.
clarifai
torch==2.1.1
transformers==4.36.2
accelerate==0.26.1
Deployment
To prepare for deployment, we have to build the model. This process generates a *.clarifai zip file that contains all the files needed to get your model working on the Clarifai platform.
clarifai build model
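Because the build artifact is described as a zip file, its contents can be inspected with Python's standard zipfile module before uploading. The sketch below assumes the archive is a regular zip; list_archive_files is a hypothetical helper, and the demo archive is a throwaway file with placeholder entries, not the layout of a real build.

```python
import os
import tempfile
import zipfile
from typing import List


def list_archive_files(archive_path: str) -> List[str]:
    """Return the sorted names of the files stored in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(archive.namelist())


# Demo on a throwaway archive so the snippet runs without a real build.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "model.clarifai")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("inference.py", "# placeholder")
        archive.writestr("clarifai_config.yaml", "# placeholder")
    demo_files = list_archive_files(demo_path)

print(demo_files)
```

For a real build, pass the path of the generated *.clarifai file to list_archive_files.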
You need to upload your built file to a cloud storage service in order to obtain a direct download URL for the next step.
Since we have all the files ready, let’s proceed to deploy the model using the following commands,
clarifai login
Output
Get your PAT from https://clarifai.com/settings/security and pass it here: <insert your pat here>
clarifai upload model --url <url> --user-app <your_user_id>/<your_app_id> --id <your_model_id>
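If you script the upload step, the command above can be assembled as an argument list for subprocess. The sketch uses only the flags shown above (--url, --user-app, --id); build_upload_command is a hypothetical helper, and the URL and IDs are placeholders.

```python
import shlex
from typing import List


def build_upload_command(url: str, user_id: str, app_id: str, model_id: str) -> List[str]:
    """Assemble the argument list for the `clarifai upload model` command shown above."""
    return [
        "clarifai", "upload", "model",
        "--url", url,
        "--user-app", f"{user_id}/{app_id}",
        "--id", model_id,
    ]


command = build_upload_command(
    url="https://example.com/model.clarifai",  # placeholder URL
    user_id="my_user",
    app_id="my_app",
    model_id="my_model",
)
print(shlex.join(command))
# To run it for real: subprocess.run(command, check=True)
```

Passing a list rather than a single string avoids shell quoting problems if the URL contains special characters.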
Example
For the demo, we are going to upload an open-source visual segmentation model from Huggingface. Run the following commands step by step on a Google Colab instance or your local machine.
The first step is to install the required libraries,
pip install clarifai
Using the Clarifai CLI, you can initialize a model from the Clarifai Examples repository into your working directory by executing the following,
clarifai create model --from-example --working-dir my_dir
The --working-dir parameter will create a directory.
From the list of available models let’s choose a visual segmenter as an example,
Image Output
![](/img/python-sdk/model_upload1.png)
The CLI will then clone all the required files for visual_segmenter directly into the working directory.
Image Output
![](/img/python-sdk/model_upload2.png)
Once we are inside the my_dir directory, we can download the model checkpoint from HuggingFace into a checkpoint directory.
huggingface-cli download mattmdjaga/segformer_b2_clothes --local-dir my_dir/checkpoint --local-dir-use-symlinks False --exclude *.safetensors optimizer.pt
Output
Consider using `hf_transfer` for faster downloads. This solution comes with some limitations. See https://huggingface.co/docs/huggingface_hub/hf_transfer for more details.
Fetching 14 files: 0% 0/14 [00:00<?, ?it/s]downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/README.md to /root/.cache/huggingface/hub/tmpks451wck
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/onnx/model.onnx to /root/.cache/huggingface/hub/tmp5jpha5yd
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/.gitignore to /root/.cache/huggingface/hub/tmp2hpe7iu0
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/onnx/config.json to /root/.cache/huggingface/hub/tmpufkrlq2_
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/onnx/preprocessor_config.json to /root/.cache/huggingface/hub/tmp8ts928m7
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/handler.py to /root/.cache/huggingface/hub/tmpu8jtbelz
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/config.json to /root/.cache/huggingface/hub/tmpzkygsvw4
README.md: 100% 4.54k/4.54k [00:00<00:00, 13.5MB/s]
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/.gitattributes to /root/.cache/huggingface/hub/tmp6fd8ko4x
.gitignore: 100% 29.0/29.0 [00:00<00:00, 94.1kB/s]
onnx/config.json: 100% 1.72k/1.72k [00:00<00:00, 8.88MB/s]
onnx/preprocessor_config.json: 100% 431/431 [00:00<00:00, 2.32MB/s]
model.onnx: 0% 0.00/110M [00:00<?, ?B/s]
config.json: 100% 1.73k/1.73k [00:00<00:00, 7.93MB/s]
handler.py: 100% 1.54k/1.54k [00:00<00:00, 8.31MB/s]
.gitattributes: 100% 1.48k/1.48k [00:00<00:00, 6.78MB/s]
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/pytorch_model.bin to /root/.cache/huggingface/hub/tmpx_viyrvb
Fetching 14 files: 7% 1/14 [00:00<00:08, 1.55it/s]downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/preprocessor_config.json to /root/.cache/huggingface/hub/tmpiltxv0t_
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/scheduler.pt to /root/.cache/huggingface/hub/tmppjzdy4m0
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/training_args.bin to /root/.cache/huggingface/hub/tmph_tb7wt3
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/trainer_state.json to /root/.cache/huggingface/hub/tmpl41baift
downloading https://huggingface.co/mattmdjaga/segformer_b2_clothes/resolve/f6ac72992f938a1d0073fb5e5a06fd781f19f9a2/rng_state.pth to /root/.cache/huggingface/hub/tmp8w6kv5ay
preprocessor_config.json: 100% 271/271 [00:00<00:00, 864kB/s]
trainer_state.json: 0% 0.00/291k [00:00<?, ?B/s]
pytorch_model.bin: 0% 0.00/110M [00:00<?, ?B/s]
trainer_state.json: 100% 291k/291k [00:00<00:00, 1.99MB/s]
pytorch_model.bin: 10% 10.5M/110M [00:00<00:01, 70.5MB/s]
training_args.bin: 100% 3.32k/3.32k [00:00<00:00, 9.70MB/s]
model.onnx: 10% 10.5M/110M [00:00<00:04, 24.4MB/s]
rng_state.pth: 100% 14.6k/14.6k [00:00<00:00, 32.1MB/s]
scheduler.pt: 100% 627/627 [00:00<00:00, 2.31MB/s]
pytorch_model.bin: 29% 31.5M/110M [00:00<00:00, 111MB/s]
pytorch_model.bin: 48% 52.4M/110M [00:00<00:00, 146MB/s]
model.onnx: 19% 21.0M/110M [00:00<00:02, 34.0MB/s]
pytorch_model.bin: 67% 73.4M/110M [00:00<00:00, 156MB/s]
model.onnx: 29% 31.5M/110M [00:00<00:01, 43.8MB/s]
pytorch_model.bin: 86% 94.4M/110M [00:00<00:00, 172MB/s]
pytorch_model.bin: 100% 110M/110M [00:00<00:00, 139MB/s]
model.onnx: 48% 52.4M/110M [00:01<00:00, 65.3MB/s]
model.onnx: 67% 73.4M/110M [00:01<00:00, 86.4MB/s]
model.onnx: 86% 94.4M/110M [00:01<00:00, 107MB/s]
model.onnx: 100% 110M/110M [00:01<00:00, 75.2MB/s]
Fetching 14 files: 100% 14/14 [00:02<00:00, 6.75it/s]
/content/examples/model_upload/visual_segmenter/segformer-b2/checkpoint
Next, install the dependencies from the requirements.txt file,
pip install -r my_dir/requirements.txt
Before moving on to the build process, we have to make some changes in the clarifai_config.yaml file. You will have to add clarifai_model_id and clarifai_user_app_id with the respective values.
Example changes made to the clarifai_config.yaml file are given below,
clarifai_model:
  clarifai_model_id: 'segmentation_model'
  clarifai_user_app_id: '8tzpjy1a841y/transfer_learn_3'
  description: ''
  inference_parameters: []
  labels:
  - background
  - hat
  - hair
  - sunglass
  - upper-clothes
  - skirt
  - pants
  - dress
  - belt
  - left-shoe
  - right-shoe
  - face
  - left-leg
  - right-leg
  - left-arm
  - right-arm
  - bag
  - scarf
  type: visual-segmenter
serving_backend:
  triton:
    max_batch_size: 4
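Before building, it can help to confirm that the two fields mentioned above are filled in. The sketch below works on a plain dict that mirrors the YAML structure, so it needs no YAML parser; missing_model_fields is a hypothetical helper, and the sample values are placeholders.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("clarifai_model_id", "clarifai_user_app_id")


def missing_model_fields(config: Dict[str, Any]) -> List[str]:
    """Return the required clarifai_model fields that are absent or empty."""
    model_section = config.get("clarifai_model") or {}
    return [name for name in REQUIRED_FIELDS if not model_section.get(name)]


# The dicts below mirror the YAML structure; the values are placeholders.
filled = {"clarifai_model": {"clarifai_model_id": "my_model",
                             "clarifai_user_app_id": "my_user/my_app"}}
unfilled = {"clarifai_model": {"clarifai_model_id": "", "clarifai_user_app_id": None}}

print(missing_model_fields(filled))
print(missing_model_fields(unfilled))
```

An empty list means both fields are set; any names returned still need values, either in the file or on the upload command line.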
After installing the dependencies and modifying the config file, we have to build the model and upload the model.clarifai file to cloud storage.
clarifai build model ./my_dir
Output
======================================= test session starts ========================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.4.0
rootdir: /content/examples/model_upload/visual_segmenter
plugins: anyio-3.7.1
collected 2 items
segformer-b2/test.py The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
2024-04-09 09:57:46.703098: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-09 09:57:46.703174: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-09 09:57:46.805602: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-09 09:57:49.466995: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
..
========================================= warnings summary =========================================
segformer-b2/test.py::CustomTest::test_default_cases
segformer-b2/test.py::CustomTest::test_specific_case1
/usr/local/lib/python3.10/dist-packages/transformers/models/segformer/image_processing_segformer.py:99: FutureWarning: The `reduce_labels` parameter is deprecated and will be removed in a future version. Please use `do_reduce_labels` instead.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================== 2 passed, 2 warnings in 20.81s ==================================
Start building...
0% 0/7 [00:00<?, ?it/s]NOTE: skipping ['requirements.txt', '.cache', '__pycache__']
copying inference.py...: 100% 7/7 [00:00<00:00, 8.16it/s]
Model building in progress; the duration may vary depending on the size of checkpoints/assets...
Finished. Your model is located at ./segformer-b2/model.clarifai
You can use the model from this URL for the model upload demo: https://s3.amazonaws.com/samples.clarifai.com/model.clarifai
Now let's log in to Clarifai using the CLI,
clarifai login
Output
Get your PAT from https://clarifai.com/settings/security and pass it here: <insert your pat here>
The final step is to upload the model onto Clarifai’s platform,
clarifai upload model my_dir --url https://s3.amazonaws.com/samples.clarifai.com/model.clarifai
Output
======================================= test session starts ========================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.4.0
rootdir: /content/examples/model_upload/visual_segmenter
plugins: anyio-3.7.1
collected 2 items
segformer-b2/test.py 2024-04-09 10:25:02.328582: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-09 10:25:02.328646: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-09 10:25:02.330329: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-09 10:25:03.910020: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
..
========================================= warnings summary =========================================
segformer-b2/test.py::CustomTest::test_default_cases
segformer-b2/test.py::CustomTest::test_specific_case1
/usr/local/lib/python3.10/dist-packages/transformers/models/segformer/image_processing_segformer.py:99: FutureWarning: The `reduce_labels` parameter is deprecated and will be removed in a future version. Please use `do_reduce_labels` instead.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================== 2 passed, 2 warnings in 16.90s ==================================
Success!
Model version: fac1b8a204554f7196871f106be75d8d