Model Export
Learn how to perform model export using Clarifai SDKs
Using the Clarifai SDKs, you can export the model you have trained on the Clarifai portal into a .tar file by specifying the model URL. This feature allows users to version control their trained models and seamlessly integrate them into different environments. The exported .tar file encapsulates the model architecture, weights, and any additional training artifacts, making it a portable archive for deployment. Overall, the ability to export models via the Clarifai SDKs empowers users with greater flexibility and control over their machine-learning workflows.
- Python
# Import necessary libraries
import os
from clarifai.client.model import Model
# Set the Clarifai API key as an environment variable (replace with your actual key)
os.environ['CLARIFAI_PAT'] = "YOUR_PAT"
# Create a Model object using the model URL from the Clarifai portal
model = Model("Model URL From Portal")
# Set the model version ID (replace with the ID from the Clarifai portal)
model.model_version.id = "Model Version ID From Portal"
# Export the model to the specified path
model.export("path to save .tar file")
Output
Exporting model: 100%|██████████████████████████████████| 912M/912M [00:22<00:00, 39.8MB/s]
2024-04-18 11:44:30 INFO clarifai.client.model: Model ID model_classifier with version 78a14f11871a4d5fa9dfa462fc81c1aa exported successfully to /home/adithyansukumar/work/output/model.tar  model.py:991
Before moving on to deployment, unpack the model.tar file to get the required files.
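For example, the archive can be unpacked with Python's built-in tarfile module; the archive path below is a placeholder for wherever you saved the export:
- Python
import tarfile

# Unpack the exported archive into the current working directory
# (replace 'output/model.tar' with the path you passed to model.export above)
with tarfile.open('output/model.tar') as tar:
    tar.extractall('.')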
The unpacked model.tar folder structure will look like this:
├── model
│   ├── 1
│   │   ├── lib
│   │   ├── model.onnx
│   │   └── model.py
│   ├── config.pbtxt
│   ├── labels.txt
│   ├── requirements.txt
│   ├── triton_conda-cp3.8-72f240d2.tar.gz
│   └── triton_server_info.proto
Model Inference Using ONNX
ONNX inference equips developers and data scientists with a standardized and efficient approach to deploying machine learning models in production environments. It fosters flexibility, simplifies deployment, and offers the potential for performance improvements, making it a valuable tool for unlocking the power of machine-learning models across a wide range of applications. ONNX inference acts as a bridge, enabling the seamless execution of this model across diverse platforms and frameworks that support ONNX. This eliminates the need for repetitive model retraining or framework-specific conversions, resulting in significant time and resource savings.
Visit this page to learn more about ONNX.
Install the dependencies listed in the requirements.txt file with pip install -r requirements.txt.
Below is an example of running predictions on a model using the ONNX Runtime. We are going to use the model.onnx file obtained after unpacking the model.tar file.
- Python
import onnx # Library for working with ONNX models
import onnxruntime as ort # Library for running ONNX models
import numpy as np # Library for numerical operations
import cv2 # Library for image processing
# Commented-out code for model verification (uncomment if needed)
# onnx_model = onnx.load("model/1/model.onnx")
# onnx.checker.check_model(onnx_model)
# Load the input image using OpenCV
input_image = cv2.imread('ramen.png')
# Expand the image dimension to match model input requirements
input_array = np.expand_dims(input_image, axis=0)
# Swap axes and convert to float32 for potential model input requirements
input_array = np.swapaxes(input_array, 1, 3).astype(np.float32)
# Create an inference session using the ONNX model
ort_sess = ort.InferenceSession('model/1/model.onnx') # replace with correct path to onnx model file
# Run inference on the model with the preprocessed input
outputs = ort_sess.run(None, {'input': input_array})
# Extract the predicted class index from the output
predicted = outputs[0][0].argmax(0)
# Read class labels from a text file (replace filename if different)
with open('model/labels.txt') as f:
    labels = f.readlines()
# Print the predicted class label based on the index and labels list
print(labels[predicted])
Output
id-ramen
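The preprocessing above passes the image at its original resolution; depending on the exported model, a specific input size or layout may be expected. If you are unsure, you can inspect the input and output names, shapes, and types that the ONNX model declares before building the input array:
- Python
import onnxruntime as ort

# Open a session on the exported model and print its declared inputs and outputs
sess = ort.InferenceSession('model/1/model.onnx')
for inp in sess.get_inputs():
    print('input :', inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print('output:', out.name, out.shape, out.type)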
Deployment Using Nvidia Triton
Imagine you've trained powerful machine learning models and want to deploy them efficiently for real-world applications. NVIDIA Triton Inference Server acts as the bridge between your trained models and the real world. It's an open-source software specifically designed to optimize and streamline the process of deploying and running machine learning models for inference tasks.
Click here to learn more about Nvidia Triton.
Before we deploy our model, we first have to set up the Triton inference server on our local machine.
Make sure that you have Docker installed on your system. Follow the steps on this page to install Docker.
Execute the following command to run the Triton inference container on your machine:
docker run --shm-size=1g --ulimit memlock=-1 -p 9000:9000 -p 9001:9001 -p 9002:9002 --ulimit stack=67108864 -v $(pwd):/run -ti nvcr.io/nvidia/tritonserver:23.03-py3
Output
root@c1ed01adc0c6:/run#
Use a common directory to extract the model.tar file and to run the container, so that the model folder is available at /run inside the container.
Once you are inside the container, execute the following to start the Triton server:
cd /run
tritonserver --model-repository=/run --model-control-mode=explicit --disable-auto-complete-config --backend-config=python3,python-runtime=/usr/bin/python3 --backend-directory=/opt/tritonserver/backends --http-port=9000 --grpc-port=9001 --metrics-port=9002 --log-verbose=5 --load-model=model
If you have followed the steps correctly, you should see output similar to the one shown here:
Output
I0418 10:22:19.527473 16089 server.cc:610]
+---------+---------------------------------------+---------------------------------------+
| Backend | Path | Config |
+---------+---------------------------------------+---------------------------------------+
| python | /opt/tritonserver/backends/python/lib | {"cmdline":{"auto-complete-config":"f |
| | triton_python.so | alse","min-compute-capability":"6.000 |
| | | 000","backend-directory":"/opt/triton |
| | | server/backends","default-max-batch-s |
| | | ize":"4"}} |
| | | |
+---------+---------------------------------------+---------------------------------------+
I0418 10:22:19.527498 16089 model_lifecycle.cc:264] ModelStates()
I0418 10:22:19.527539 16089 server.cc:653]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
| model | 1 | READY |
+-------+---------+--------+
I0418 10:22:19.652489 16089 metrics.cc:747] Collecting metrics for GPU 0: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652537 16089 metrics.cc:747] Collecting metrics for GPU 1: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652547 16089 metrics.cc:747] Collecting metrics for GPU 2: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652557 16089 metrics.cc:747] Collecting metrics for GPU 3: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652564 16089 metrics.cc:747] Collecting metrics for GPU 4: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652573 16089 metrics.cc:747] Collecting metrics for GPU 5: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652581 16089 metrics.cc:747] Collecting metrics for GPU 6: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.652588 16089 metrics.cc:747] Collecting metrics for GPU 7: NVIDIA RTX 6000 Ada Generation
I0418 10:22:19.653107 16089 metrics.cc:640] Collecting CPU metrics
I0418 10:22:19.653341 16089 tritonserver.cc:2364]
+----------------------------------+------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------+
| server_id | triton |
| server_version | 2.32.0 |
| server_extensions | classification sequence model_repository model_repos |
| | itory(unload_dependents) schedule_policy model_confi |
| | guration system_shared_memory cuda_shared_memory bin |
| | ary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /run |
| model_control_mode | MODE_EXPLICIT |
| startup_models_0 | model |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| cuda_memory_pool_byte_size{4} | 67108864 |
| cuda_memory_pool_byte_size{5} | 67108864 |
| cuda_memory_pool_byte_size{6} | 67108864 |
| cuda_memory_pool_byte_size{7} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+------------------------------------------------------+
I0418 10:22:19.654269 16089 grpc_server.cc:4888] === GRPC KeepAlive Options ===
I0418 10:22:19.654283 16089 grpc_server.cc:4889] keepalive_time_ms: 7200000
I0418 10:22:19.654288 16089 grpc_server.cc:4891] keepalive_timeout_ms: 20000
I0418 10:22:19.654293 16089 grpc_server.cc:4893] keepalive_permit_without_calls: 0
I0418 10:22:19.654299 16089 grpc_server.cc:4895] http2_max_pings_without_data: 2
I0418 10:22:19.654305 16089 grpc_server.cc:4897] http2_min_recv_ping_interval_without_data_ms: 300000
I0418 10:22:19.654312 16089 grpc_server.cc:4900] http2_max_ping_strikes: 2
I0418 10:22:19.654320 16089 grpc_server.cc:4902] ==============================
I0418 10:22:19.655139 16089 grpc_server.cc:227] Ready for RPC 'Check', 0
I0418 10:22:19.655203 16089 grpc_server.cc:227] Ready for RPC 'ServerLive', 0
I0418 10:22:19.655219 16089 grpc_server.cc:227] Ready for RPC 'ServerReady', 0
I0418 10:22:19.655228 16089 grpc_server.cc:227] Ready for RPC 'ModelReady', 0
I0418 10:22:19.655237 16089 grpc_server.cc:227] Ready for RPC 'ServerMetadata', 0
I0418 10:22:19.655246 16089 grpc_server.cc:227] Ready for RPC 'ModelMetadata', 0
I0418 10:22:19.655257 16089 grpc_server.cc:227] Ready for RPC 'ModelConfig', 0
I0418 10:22:19.655270 16089 grpc_server.cc:227] Ready for RPC 'SystemSharedMemoryStatus', 0
I0418 10:22:19.655277 16089 grpc_server.cc:227] Ready for RPC 'SystemSharedMemoryRegister', 0
I0418 10:22:19.655286 16089 grpc_server.cc:227] Ready for RPC 'SystemSharedMemoryUnregister', 0
I0418 10:22:19.655293 16089 grpc_server.cc:227] Ready for RPC 'CudaSharedMemoryStatus', 0
I0418 10:22:19.655299 16089 grpc_server.cc:227] Ready for RPC 'CudaSharedMemoryRegister', 0
I0418 10:22:19.655307 16089 grpc_server.cc:227] Ready for RPC 'CudaSharedMemoryUnregister', 0
I0418 10:22:19.655316 16089 grpc_server.cc:227] Ready for RPC 'RepositoryIndex', 0
I0418 10:22:19.655322 16089 grpc_server.cc:227] Ready for RPC 'RepositoryModelLoad', 0
I0418 10:22:19.655330 16089 grpc_server.cc:227] Ready for RPC 'RepositoryModelUnload', 0
I0418 10:22:19.655340 16089 grpc_server.cc:227] Ready for RPC 'ModelStatistics', 0
I0418 10:22:19.655348 16089 grpc_server.cc:227] Ready for RPC 'Trace', 0
I0418 10:22:19.655355 16089 grpc_server.cc:227] Ready for RPC 'Logging', 0
I0418 10:22:19.655371 16089 grpc_server.cc:445] Thread started for CommonHandler
I0418 10:22:19.655525 16089 grpc_server.cc:3952] New request handler for ModelInferHandler, 0
I0418 10:22:19.655567 16089 grpc_server.cc:2844] Thread started for ModelInferHandler
I0418 10:22:19.655706 16089 grpc_server.cc:3952] New request handler for ModelInferHandler, 0
I0418 10:22:19.655748 16089 grpc_server.cc:2844] Thread started for ModelInferHandler
I0418 10:22:19.655870 16089 grpc_server.cc:4348] New request handler for ModelStreamInferHandler, 0
I0418 10:22:19.655901 16089 grpc_server.cc:2844] Thread started for ModelStreamInferHandler
I0418 10:22:19.655909 16089 grpc_server.cc:4977] Started GRPCInferenceService at 0.0.0.0:9001
I0418 10:22:19.656156 16089 http_server.cc:3518] Started HTTPService at 0.0.0.0:9000
I0418 10:22:19.726777 16089 http_server.cc:186] Started Metrics Service at 0.0.0.0:9002
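From another terminal on the host, you can optionally confirm that the server is reachable and that the model was loaded before sending inference requests. A minimal check with the Triton gRPC client (this requires the tritonclient package, e.g. pip install tritonclient[grpc], and assumes the default ports used above):
- Python
from tritonclient.grpc import InferenceServerClient

# Connect to the Triton gRPC endpoint exposed by the container
triton_client = InferenceServerClient('127.0.0.1:9001')

# Confirm the server is live and ready, and that our model is loaded
print(triton_client.is_server_live())       # True
print(triton_client.is_server_ready())      # True
print(triton_client.is_model_ready('model'))  # True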
Now that the inference server is up and running, let's create an inference script that communicates with the server and returns the prediction. Below is an example inference script that performs image classification using the exported model:
- Python
# Import necessary libraries
import numpy as np
import cv2
from tritonclient.grpc import (
InferenceServerClient, # Client for interacting with Triton server
InferInput, # Represents an input to the model
InferRequestedOutput, # Represents an output requested for inference
)
# Define the model name; in our case, the name is 'model'
model_name = 'model'
# Connect to Triton server running on localhost
triton_client = InferenceServerClient('127.0.0.1:9001')
# Load the specified model onto the Triton server
triton_client.load_model(model_name)
# Load the input image using OpenCV
input_image = cv2.imread('ramen.png') # replace with any image file
# Expand the image dimension to match model input requirements
input_array = np.expand_dims(input_image, axis=0)
# Define the model input object with its name, shape, and data type
model_input = InferInput('input', input_array.shape, 'UINT8')
# Set the data for the model input from the NumPy array
model_input.set_data_from_numpy(input_array)
# Run inference on the model with the provided input
res = triton_client.infer(
model_name=model_name, # Specify the model to use
inputs=[model_input], # List of input objects
outputs=[InferRequestedOutput('probs')], # List of requested outputs
)
# Read class labels from a text file
with open(f'{model_name}/labels.txt') as f:
    labels = f.readlines()
# Get the index of the class with the highest probability from the output
predicted_class_index = np.argmax(res.as_numpy('probs'))
# Print the predicted class label based on the index and labels list
print(labels[predicted_class_index])
Output
id-ramen
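Because the server was started with --model-control-mode=explicit, the client can also unload the model once you are finished running predictions. This optional clean-up step might look like this:
- Python
from tritonclient.grpc import InferenceServerClient

# Unload the model from the Triton server after running predictions (optional)
triton_client = InferenceServerClient('127.0.0.1:9001')
triton_client.unload_model('model')  # same model name used for inference above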