
Deploy a Model

Deploy a model into a cluster and nodepool you've created


Clarifai’s Compute Orchestration lets you deploy any model on any compute infrastructure, at any scale.

You can configure your compute environment and deploy your models into nodepools with your preferred settings, optimizing for both cost and scalability.

With model deployment, you can quickly take a trained model and set it up for inference.

Via the UI

Create a Deployment

note

Each model or workflow can only have one deployment per nodepool.

To deploy a model, navigate to your cluster or nodepool and click the Deploy model button on the page.

Alternatively, navigate to a model's page, go to the Deployments tab, and click the Deploy model or Deploy this model button.

You’ll be redirected to a page where you can customize the compute configurations for deploying your model.

  • Deployment details — Create a deployment ID and description that help identify your model version and selected compute combination.

  • Model and version — Select an already trained model and the version you want to deploy.

  • Cluster — Select or create a cluster.

  • Nodepool — Select or create a nodepool to deploy your model into, taking your performance goals into account. The details of the dedicated cluster and nodepool you’ve selected will be displayed.

  • Advanced Settings — Optionally, you can click the collapsible section to configure the following settings:

    • Model Replicas — This specifies the minimum and maximum range of model replicas to deploy, adjusting based on your performance needs and anticipated workload. Adding replicas enables horizontal scaling, where the workload is distributed across several instances of the model rather than relying on a single one. However, increasing them consumes more resources and may lead to higher costs. Each node in your nodepool can host multiple replicas, depending on model size and available resources.
    note

    See the node autoscaling documentation to learn how to set up node autoscaling ranges that automatically adjust your infrastructure based on traffic demand.

    • Scale Up Delay — This sets the waiting period (in seconds) before adding resources in response to rising demand.
    • Scale Down Delay — This sets the waiting period (in seconds) before reducing resources after a demand decrease. Note that your nodepool will only scale down to the minimum number of replica(s) configured.
    • Traffic History Timeframe — This defines the traffic history period (in seconds) that your deployment will review before making scaling decisions.
    • Scale To Zero Delay — This sets the idle time (in seconds) before scaling down to zero replicas after inactivity.
    • Disable Nodepool Packing — Enabling this option restricts deployments to a single model replica per node. While this can be useful for specific performance needs, it may lead to underutilized nodes and increased costs due to reduced resource efficiency.
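The advanced settings above also surface in the deployment configuration file used by the API section below. As a rough sketch only: the field names here are assumptions that mirror the UI settings, not the authoritative schema, so consult the configuration guide linked in the API section for the actual format.

# Hypothetical autoscale section of deployment_config.yaml; field names
# are illustrative assumptions mirroring the UI settings above.
autoscale_config:
  min_replicas: 1                    # lower bound of the Model Replicas range
  max_replicas: 5                    # upper bound of the Model Replicas range
  scale_up_delay_seconds: 300        # Scale Up Delay
  scale_down_delay_seconds: 300      # Scale Down Delay
  traffic_history_seconds: 600       # Traffic History Timeframe
  scale_to_zero_delay_seconds: 3600  # Scale To Zero Delay
  disable_packing: false             # Disable Nodepool Packing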

After completing the setup, click the Deploy Model button at the bottom of the page to create the deployment.

You’ll then be redirected to the nodepool page, where your deployed model will be listed.

You can find the deployment listed in the Deployment dropdown menu in the model's playground, where you can select it for inferencing.

Via the API

Create a Deployment

To deploy a model within a nodepool you've created, provide the deployment_id and config_filepath parameters to the create_deployment method of the Nodepool class.

You can learn how to create the deployment_config.yaml file, which contains the deployment configuration details, here.
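As an illustration, a deployment_config.yaml might be structured along the following lines. Treat this as a hedged sketch: the keys shown are assumptions based on the settings described on this page, and the linked guide is the authoritative reference for the actual schema.

# Hypothetical deployment_config.yaml sketch; keys are illustrative assumptions.
deployment:
  id: test-deployment
  description: "Example deployment"
  worker:
    model:
      id: YOUR_MODEL_ID_HERE
      model_version:
        id: YOUR_MODEL_VERSION_ID_HERE
      user_id: YOUR_USER_ID_HERE
      app_id: YOUR_APP_ID_HERE
  nodepools:
    - id: test-nodepool
      compute_cluster:
        id: test-cluster
        user_id: YOUR_USER_ID_HERE
  autoscale_config:
    min_replicas: 1
    max_replicas: 5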

note

Each model or workflow can only have one deployment per nodepool.

from clarifai.client.nodepool import Nodepool
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool",
    base_url="https://api.clarifai.com",
)

# Create a new deployment
deployment = nodepool.create_deployment(
    deployment_id="test-deployment",
    config_filepath="./configs/deployment_config.yaml",
)
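To verify that the deployment was created, you can list the deployments in the nodepool. This sketch assumes the Nodepool class exposes a list_deployments method, as in recent versions of the SDK:

# Verify the new deployment appears in the nodepool
for dep in nodepool.list_deployments():
    print(dep.id)  # assumes each returned deployment object exposes an `id` attribute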

After creating the deployment, initialize the Deployment class by providing the user_id and deployment_id parameters.

from clarifai.client.deployment import Deployment

# Initialize the deployment
deployment = Deployment(
    user_id="YOUR_USER_ID_HERE",
    deployment_id="test-deployment",
    base_url="https://api.clarifai.com",
)

Model Inferencing

Once your model is deployed, you can use it for inferencing by calling the appropriate prediction methods. Note that you need to specify the deployment_id parameter to ensure proper routing and execution of your prediction call.
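As a sketch of a prediction call routed to the deployment created above (this assumes the Model class accepts a deployment_id argument; check the inference documentation for the exact parameters supported by your SDK version):

from clarifai.client.model import Model
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Route prediction traffic to the dedicated deployment created earlier
model = Model(
    url="https://clarifai.com/YOUR_USER_ID_HERE/YOUR_APP_ID_HERE/models/YOUR_MODEL_ID_HERE",
    deployment_id="test-deployment",
)

# Run a prediction against the deployed model version
prediction = model.predict_by_url(
    url="https://samples.clarifai.com/metro-north.jpg",
    input_type="image",
)
print(prediction.outputs[0].data)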