Model Export
Learn how to perform model export using Clarifai SDKs
Using the Clarifai SDKs, you can export models trained on the Clarifai Portal into a .tar file by specifying the model URL. This feature enables version control and facilitates seamless integration into various environments.
The exported .tar file contains the model architecture, weights, and relevant training artifacts, providing a portable and deployment-ready package. Overall, model export via the Clarifai SDKs offers users greater flexibility and control over their machine learning workflows.
Note that the model export functionality is only supported for specific model types on our platform.
Before using the Python SDK, Node.js SDK, or any of our gRPC clients, ensure they are properly installed on your machine. Refer to their respective installation guides for instructions on how to install and initialize them.
- Python SDK
# Import necessary libraries
import os
from clarifai.client.model import Model
# Set the Clarifai API key as an environment variable (replace with your actual key)
os.environ['CLARIFAI_PAT'] = "YOUR_PAT"
# Create a Model object using the model URL from the Clarifai portal
model = Model("Model URL From Portal")
# Set the model version ID (replace with the ID from the Clarifai portal)
model.model_version.id = "Model Version ID From Portal"
# Export the model to the specified path
model.export("path to save .tar file")
Output
Exporting model: 100%|██████████████████████████████████| 912M/912M [00:22<00:00, 39.8MB/s]
2024-04-18 11:44:30 INFO clarifai.client.model: Model ID model_classifier model.py:991
with version 78a14f11871a4d5fa9dfa462fc81c1aa
exported successfully to
/home/adithyansukumar/work/output/model.tar
Before moving on to deployment, unpack the model.tar file to get the required files.
The unpacked model.tar folder structure will look like this:
├── model
│   ├── 1
│   │   ├── lib
│   │   ├── model.onnx
│   │   └── model.py
│   ├── config.pbtxt
│   ├── labels.txt
│   ├── requirements.txt
│   ├── triton_conda-cp3.8-72f240d2.tar.gz
│   └── triton_server_info.proto
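The archive can be unpacked with any standard tool. As a minimal sketch in Python (the path output/model.tar is an assumption based on the export output above; adjust it to wherever you saved the archive):
import tarfile

# Path to the exported archive; adjust to wherever you saved it
tar_path = "output/model.tar"

# Extract the archive into the current directory,
# producing the model/ folder shown above
with tarfile.open(tar_path) as tar:
    tar.extractall(path=".")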
Model Inference Using ONNX
ONNX inference provides developers and data scientists with a standardized, efficient method for deploying machine learning models in production. By promoting interoperability across platforms and frameworks, ONNX simplifies deployment, enhances flexibility, and can improve performance.
Acting as a universal bridge, ONNX enables seamless model execution without the need for repeated retraining or framework-specific conversions. This results in significant time and resource savings, making ONNX a powerful tool for scaling machine learning solutions across diverse environments.
Click here to learn more about ONNX.
Install the dependencies listed in the requirements.txt file with pip install -r requirements.txt.
Below is an example of running predictions on a model using ONNX Runtime. We are going to use the model.onnx file we received after unpacking the model.tar file.
- Python SDK
import onnx # Library for working with ONNX models
import onnxruntime as ort # Library for running ONNX models
import numpy as np # Library for numerical operations
import cv2 # Library for image processing
# Commented-out code for model verification (uncomment if needed)
# onnx_model = onnx.load("model/1/model.onnx")
# onnx.checker.check_model(onnx_model)
# Load the input image using OpenCV
input_image = cv2.imread('ramen.png')
# Expand the image dimension to match model input requirements
input_array = np.expand_dims(input_image, axis=0)
# Swap axes and convert to float32 for potential model input requirements
input_array = np.swapaxes(input_array, 1, 3).astype(np.float32)
# Create an inference session using the ONNX model
ort_sess = ort.InferenceSession('model/1/model.onnx') # replace with correct path to onnx model file
# Run inference on the model with the preprocessed input
outputs = ort_sess.run(None, {'input': input_array})
# Extract the predicted class index from the output
predicted = outputs[0][0].argmax(0)
# Read class labels from a text file (replace filename if different)
with open('model/labels.txt') as f:
    labels = f.readlines()
# Print the predicted class label based on the index and labels list
print(labels[predicted])
Output
id-ramen
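Note that this example feeds the image to the model at its original resolution. The model's expected input dimensions are declared in config.pbtxt; if your image does not match them, resize it before building the input array. A minimal sketch, assuming (for illustration only) a 256x256 input:
import cv2
import numpy as np

# Load the image and resize it to the height/width declared in config.pbtxt
# (256x256 is an illustrative assumption; check your own config.pbtxt)
input_image = cv2.imread('ramen.png')
resized = cv2.resize(input_image, (256, 256))

# Add a batch dimension and swap axes to channels-first layout,
# matching the preprocessing in the example above
input_array = np.swapaxes(np.expand_dims(resized, axis=0), 1, 3).astype(np.float32)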
Deployment Using Nvidia Triton
Once you've trained powerful machine learning models, deploying them efficiently for real-world applications becomes essential. NVIDIA Triton Inference Server serves as a robust bridge between your trained models and production environments.
As an open-source platform, Triton is purpose-built to optimize and streamline the deployment and execution of machine learning models for inference, enabling high-performance, scalable, and flexible model serving across diverse use cases.
Click here to learn more about Nvidia Triton.
Before deploying our model, we first have to set up Triton on our local machine.
Make sure that you have Docker installed on your system. Follow the steps on this page to install Docker.
Execute the following command to run the Triton Inference Server container on your machine:
docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 9000:9000 -p 9001:9001 -p 9002:9002 -v $(pwd):/run -ti nvcr.io/nvidia/tritonserver:23.03-py3
Output
root@c1ed01adc0c6:/run#
Run the container from the same directory into which you extracted the model.tar file, so that the mounted /run volume contains the model.
Once you are inside the container, execute the following to start the Triton server:
cd /run
tritonserver --model-repository=/run --model-control-mode=explicit --disable-auto-complete-config --backend-config=python3,python-runtime=/usr/bin/python3 --backend-directory=/opt/tritonserver/backends --http-port=9000 --grpc-port=9001 --metrics-port=9002 --log-verbose=5 --load-model=model
If you have followed the steps correctly, you should see output similar to the one below:
Output
I0418 10:22:19.527473 16089 server.cc:610]
+---------+---------------------------------------+---------------------------------------+
| Backend | Path | Config |
+---------+---------------------------------------+---------------------------------------+
| python | /opt/tritonserver/backends/python/lib | {"cmdline":{"auto-complete-config":"f |
| | triton_python.so | alse","min-compute-capability":"6.000 |
| | | 000","backend-directory":"/opt/triton |
| | | server/backends","default-max-batch-s |
| | | ize":"4"}} |
| | | |
+---------+---------------------------------------+---------------------------------------+
I0418 10:22:19.527498 16089 model_lifecycle.cc:264] ModelStates()
I0418 10:22:19.527539 16089 server.cc:653]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
| model | 1 | READY |
+-------+---------+--------+
I0418 10:22:19.652489 16089 metrics.cc:747] Collecting metrics for GPU 0: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652537 16089 metrics.cc:747] Collecting metrics for GPU 1: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652547 16089 metrics.cc:747] Collecting metrics for GPU 2: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652557 16089 metrics.cc:747] Collecting metrics for GPU 3: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652564 16089 metrics.cc:747] Collecting metrics for GPU 4: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652573 16089 metrics.cc:747] Collecting metrics for GPU 5: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652581 16089 metrics.cc:747] Collecting metrics for GPU 6: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652588 16089 metrics.cc:747] Collecting metrics for GPU 7: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.653107 16089 metrics.cc:640] Collecting CPU metrics
I0418 10:22:19.653341 16089 tritonserver.cc:2364]
+----------------------------------+------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------+
| server_id | triton |
| server_version | 2.32.0 |
| server_extensions | classification sequence model_repository model_repos |
| | itory(unload_dependents) schedule_policy model_confi |
| | guration system_shared_memory cuda_shared_memory bin |
| | ary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /run |
| model_control_mode | MODE_EXPLICIT |
| startup_models_0 | model |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| cuda_memory_pool_byte_size{4} | 67108864 |
| cuda_memory_pool_byte_size{5} | 67108864 |
| cuda_memory_pool_byte_size{6} | 67108864 |
| cuda_memory_pool_byte_size{7} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+------------------------------------------------------+
I0418 10:22:19.654269 16089 grpc_server.cc:4888] === GRPC KeepAlive Options ===
I0418 10:22:19.654283 16089 grpc_server.cc:4889] keepalive_time_ms: 7200000
I0418 10:22:19.654288 16089 grpc_server.cc:4891] keepalive_timeout_ms: 20000
I0418 10:22:19.654293 16089 grpc_server.cc:4893] keepalive_permit_without_calls: 0
I0418 10:22:19.654299 16089 grpc_server.cc:4895] http2_max_pings_without_data: 2
I0418 10:22:19.654305 16089 grpc_server.cc:4897] http2_min_recv_ping_interval_without_data_ms: 300000
I0418 10:22:19.654312 16089 grpc_server.cc:4900] http2_max_ping_strikes: 2
I0418 10:22:19.654320 16089 grpc_server.cc:4902] ==============================
I0418 10:22:19.655139 16089 grpc_server.cc:227] Ready for RPC 'Check', 0
I0418 10:22:19.655203 16089 grpc_server.cc:227] Ready for RPC 'ServerLive', 0
I0418 10:22:19.655219 16089 grpc_server.cc:227] Ready for RPC 'ServerReady', 0
I0418 10:22:19.655228 16089 grpc_server.cc:227] Ready for RPC 'ModelReady', 0
I0418 10:22:19.655237 16089 grpc_server.cc:227] Ready for RPC 'ServerMetadata', 0
I0418 10:22:19.655246 16089 grpc_server.cc:227] Ready for RPC 'ModelMetadata', 0
I0418 10:22:19.655257 16089 grpc_server.cc:227] Ready for RPC 'ModelConfig', 0
I0418 10:22:19.655270 16089 grpc_server.cc:227] Ready for RPC 'SystemSharedMemoryStatus', 0
I0418 10:22:19.655277 16089 grpc_server.cc:227] Ready for RPC 'SystemSharedMemoryRegister', 0
I0418 10:22:19.655286 16089 grpc_server.cc:227] Ready for RPC 'SystemSharedMemoryUnregister', 0
I0418 10:22:19.655293 16089 grpc_server.cc:227] Ready for RPC 'CudaSharedMemoryStatus', 0
I0418 10:22:19.655299 16089 grpc_server.cc:227] Ready for RPC 'CudaSharedMemoryRegister', 0
I0418 10:22:19.655307 16089 grpc_server.cc:227] Ready for RPC 'CudaSharedMemoryUnregister', 0
I0418 10:22:19.655316 16089 grpc_server.cc:227] Ready for RPC 'RepositoryIndex', 0
I0418 10:22:19.655322 16089 grpc_server.cc:227] Ready for RPC 'RepositoryModelLoad', 0
I0418 10:22:19.655330 16089 grpc_server.cc:227] Ready for RPC 'RepositoryModelUnload', 0
I0418 10:22:19.655340 16089 grpc_server.cc:227] Ready for RPC 'ModelStatistics', 0
I0418 10:22:19.655348 16089 grpc_server.cc:227] Ready for RPC 'Trace', 0
I0418 10:22:19.655355 16089 grpc_server.cc:227] Ready for RPC 'Logging', 0
I0418 10:22:19.655371 16089 grpc_server.cc:445] Thread started for CommonHandler
I0418 10:22:19.655525 16089 grpc_server.cc:3952] New request handler for ModelInferHandler, 0
I0418 10:22:19.655567 16089 grpc_server.cc:2844] Thread started for ModelInferHandler
I0418 10:22:19.655706 16089 grpc_server.cc:3952] New request handler for ModelInferHandler, 0
I0418 10:22:19.655748 16089 grpc_server.cc:2844] Thread started for ModelInferHandler
I0418 10:22:19.655870 16089 grpc_server.cc:4348] New request handler for ModelStreamInferHandler, 0
I0418 10:22:19.655901 16089 grpc_server.cc:2844] Thread started for ModelStreamInferHandler
I0418 10:22:19.655909 16089 grpc_server.cc:4977] Started GRPCInferenceService at 0.0.0.0:9001
I0418 10:22:19.656156 16089 http_server.cc:3518] Started HTTPService at 0.0.0.0:9000
I0418 10:22:19.726777 16089 http_server.cc:186] Started Metrics Service at 0.0.0.0:9002
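Before sending inference requests from the host, you can optionally verify that the server is reachable. A minimal sketch using the tritonclient gRPC client against the ports mapped above:
from tritonclient.grpc import InferenceServerClient

# Connect to the gRPC endpoint exposed by the container (-p 9001:9001)
client = InferenceServerClient('127.0.0.1:9001')

# Both calls return True once the server is fully up
print(client.is_server_live())
print(client.is_server_ready())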
Now that the inference server is up and running, let's create an inference script that communicates with the server and returns a prediction. Below is an example inference script that performs image classification using the exported model:
- Python SDK
# Import necessary libraries
import numpy as np
import cv2
from tritonclient.grpc import (
    InferenceServerClient,   # Client for interacting with the Triton server
    InferInput,              # Represents an input to the model
    InferRequestedOutput,    # Represents an output requested for inference
)
# Define the model name; in our case the name is 'model'
model_name = 'model'
# Connect to Triton server running on localhost
triton_client = InferenceServerClient('127.0.0.1:9001')
# Load the specified model onto the Triton server
triton_client.load_model(model_name)
# Load the input image using OpenCV
input_image = cv2.imread('ramen.png') # replace with any image file
# Expand the image dimension to match model input requirements
input_array = np.expand_dims(input_image, axis=0)
# Define the model input object with its name, shape, and data type
model_input = InferInput('input', input_array.shape, 'UINT8')
# Set the data for the model input from the NumPy array
model_input.set_data_from_numpy(input_array)
# Run inference on the model with the provided input
res = triton_client.infer(
    model_name=model_name,                    # Specify the model to use
    inputs=[model_input],                     # List of input objects
    outputs=[InferRequestedOutput('probs')],  # List of requested outputs
)
# Read class labels from a text file
with open(f'{model_name}/labels.txt') as f:
    labels = f.readlines()
# Get the index of the class with the highest probability from the output
predicted_class_index = np.argmax(res.as_numpy('probs'))
# Print the predicted class label based on the index and labels list
print(labels[predicted_class_index])
Output
id-ramen
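Because the server was started with --model-control-mode=explicit, the model stays loaded until it is explicitly unloaded. If you want to free server resources after running predictions, a short optional sketch:
from tritonclient.grpc import InferenceServerClient

# Unload the model from the Triton server (explicit model-control mode)
triton_client = InferenceServerClient('127.0.0.1:9001')
triton_client.unload_model('model')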