Deploy a Model

Deploy a model into your created cluster and nodepool


Clarifai’s Compute Orchestration lets you deploy any model on any compute infrastructure, at any scale.

It brings the convenience of serverless autoscaling to any compute, regardless of where it’s deployed or what hardware it runs on, and scales automatically to meet workload demands.

With Compute Orchestration, you upload a model, configure your SaaS or self-managed compute, and then deploy the model into your nodepools with your preferred settings, cost-efficiently and at scale.

tip

Learn how to quickly make your first deployment in the UI, using a pre-configured cluster and nodepool, here.

Via the API

Create a Deployment

To deploy a model within a nodepool you've created, provide the deployment_id and config_filepath parameters to the create_deployment method of the Nodepool class.

You can learn how to create the deployment_config.yaml file, which contains the deployment configuration details, here.
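For reference, here is a minimal sketch of what a deployment_config.yaml might contain. The field names and values below are illustrative assumptions, not the authoritative schema; confirm them against the linked guide:

deployment:
  id: test-deployment
  description: "Example deployment"     # optional, for your own reference
  worker:
    model:
      id: YOUR_MODEL_ID_HERE            # model to deploy
      model_version:
        id: YOUR_MODEL_VERSION_ID_HERE  # specific trained version
      user_id: YOUR_USER_ID_HERE
      app_id: YOUR_APP_ID_HERE
  nodepools:                            # nodepool(s) to deploy into
    - id: test-nodepool
      compute_cluster:
        id: test-compute-cluster
        user_id: YOUR_USER_ID_HERE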

note

Each model or workflow can only have one deployment per nodepool.

import os

from clarifai.client.nodepool import Nodepool

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool",
    base_url="https://api.clarifai.com"
)

# Create a new deployment
deployment = nodepool.create_deployment(
    deployment_id="test-deployment",
    config_filepath="./configs/deployment_config.yaml"
)
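
To confirm the deployment was created, you can list the deployments in your nodepool. A quick check, assuming the list_deployments method is available on the Nodepool class in your SDK version:

# Reuse the `nodepool` object initialized above
for dep in nodepool.list_deployments():
    print(dep.id)  # each item is a Deployment object; the .id attribute is assumed here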

After creating the deployment, initialize the Deployment class by providing the user_id and deployment_id parameters.

from clarifai.client.deployment import Deployment

# Initialize the deployment
deployment = Deployment(
    user_id="YOUR_USER_ID_HERE",
    deployment_id="test-deployment",
    base_url="https://api.clarifai.com"
)

Model Inferencing

Once your model is deployed, you can use it for inferencing by calling the appropriate prediction methods. Note that you need to specify the deployment_id parameter to ensure proper routing and execution of your prediction call.
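
For example, a text prediction routed through the deployment might look like the following. The model URL below is a placeholder, and passing deployment_id to the Model constructor follows the pattern used in recent versions of the Python SDK; verify both against the SDK version you use:

import os

from clarifai.client.model import Model

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Route prediction traffic through the dedicated deployment created above
model = Model(
    url="https://clarifai.com/YOUR_USER_ID_HERE/YOUR_APP_ID_HERE/models/YOUR_MODEL_ID_HERE",  # placeholder URL
    deployment_id="test-deployment"  # assumption: constructor accepts deployment_id
)

prediction = model.predict_by_bytes(b"What is Compute Orchestration?", input_type="text")
print(prediction.outputs[0].data.text.raw)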

Via the UI

Create a Deployment

note

Each model or workflow can only have one deployment per nodepool.

To deploy a model, navigate to your cluster or nodepool and click the Deploy model button on the page.

Alternatively, navigate to a model's page, go to the Deployments tab, and click the Deploy model or Deploy this model button.

You’ll be redirected to a page where you can customize the compute configurations for deploying your model.

  • Deployment details — Create a deployment ID and description that helps identify your model version and selected compute combination.

  • Model and version — Select an already trained model and the version you want to deploy.

  • Cluster — Select or create a cluster.

  • Nodepool — Select or create a nodepool to deploy your model considering your performance goals. The details of the dedicated cluster and nodepool you’ve selected will be displayed.

  • Advanced Settings — Optionally, you can click the collapsible section to configure the following settings (see the configuration sketch after this list):

    • Model Replicas — This specifies the minimum and maximum range of model replicas to deploy, adjusting based on your performance needs and anticipated workload. Adding replicas enables horizontal scaling, where the workload is distributed across several instances of the model rather than relying on a single one. However, increasing them consumes more resources and may lead to higher costs. Each node in your nodepool can host multiple replicas, depending on model size and available resources.
    • Scale Up Delay — This sets the waiting period (in seconds) before adding resources in response to rising demand.
    • Scale Down Delay — This sets the waiting period (in seconds) before reducing resources after a demand decrease. Note that your nodepool will only scale down to the minimum number of replica(s) configured.
    • Traffic History Timeframe — This defines the traffic history period (in seconds) that your deployment will review before making scaling decisions.
    • Scale To Zero Delay — This sets the idle time (in seconds) before scaling down to zero replicas after inactivity.
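
If you manage deployments through the API instead, these autoscaling settings correspond to fields in the deployment configuration file. A minimal sketch, assuming the illustrative field names below (verify them against the deployment configuration reference):

autoscale_config:
  min_replicas: 1                     # lower bound of model replicas
  max_replicas: 5                     # upper bound for horizontal scaling
  scale_up_delay_seconds: 300         # wait before adding replicas under rising demand
  scale_down_delay_seconds: 300       # wait before removing replicas after demand drops
  traffic_history_seconds: 600        # traffic window reviewed for scaling decisions
  scale_to_zero_delay_seconds: 1800   # idle time before scaling to zero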

After completing the setup, click the Deploy Model button at the bottom of the page to create the deployment.

You’ll then be redirected to the nodepool page, where your deployed model will be listed.

You can find the deployment listed in the Deployment dropdown menu in the model's playground, where you can select it for inferencing.