
Deploy a Model

Deploy a model into a cluster and nodepool you've created


Clarifai’s Compute Orchestration lets you deploy any model on any compute infrastructure, at any scale.

You can configure your compute environment and deploy your models into nodepools with your preferred settings, optimizing for both cost and scalability.

With model deployment, you can quickly take a trained model and set it up for inference.

tip

Learn how deployment works when making predictions with our Compute Orchestration capabilities here.

Via the UI

note

Each model or workflow can only have one deployment per nodepool.

Step 1: Start Creating a Deployment

To create a deployment, navigate to the model’s page and click the Deploy Model button.

You can also open the Activity tab to check whether the model is already running in any compute environment. This tab displays the compute requirements for successfully deploying the model, allowing you to choose a nodepool that meets them.

Alternatively, to create a deployment, go to the specific cluster or nodepool where you want the deployment to run, then click the Deploy Model button on that page.

Step 2: Select a Model

You’ll be redirected to a page where you can configure the compute settings for your deployment.

If you haven’t already selected a trained model, you can do so here. By default, the latest version of the model will be used, unless you switch the version toggle off to manually select a different version.

The model’s compute requirements will also be displayed, helping you select a compatible cluster and nodepool that meet those specifications.

Step 3: Select Cluster and Nodepool

Choose an existing cluster and nodepool — or create new ones — based on your model’s compute requirements and performance goals.

Once selected, the details of the chosen cluster and nodepool will be displayed for your review.

Step 4: Provide Deployment ID

Provide a deployment ID to uniquely identify your deployment.

You can also add an optional description to provide additional context and make it easier to recognize later.

Step 5: Configure Advanced Settings

You can also configure advanced deployment settings if needed. If you choose not to, the default values are applied automatically. (A sketch of how these settings map to the deployment configuration file follows this list.)

  • Model Replicas — This specifies the minimum and maximum range of model replicas to deploy, adjusting based on your performance needs and anticipated workload. Adding replicas enables horizontal scaling, where the workload is distributed across several instances of the model rather than relying on a single one. However, increasing them consumes more resources and may lead to higher costs. Each node in your nodepool can host multiple replicas, depending on model size and available resources.
node autoscaling range

Click here to find out how to set up node autoscaling ranges to automatically adjust your infrastructure based on traffic demand.

  • Scale Up Delay — This sets the waiting period (in seconds) before adding resources in response to rising demand.
  • Scale Down Delay — This sets the waiting period (in seconds) before reducing resources after a demand decrease. Note that your nodepool will only scale down to the minimum number of replica(s) configured.
  • Scale To Zero Delay — This sets the idle time (in seconds) before scaling down to zero replicas after inactivity.
  • Traffic History Timeframe — This defines the traffic history period (in seconds) that your deployment will review before making scaling decisions.
  • Disable Nodepool Packing — Packing refers to placing multiple replicas on the same node to improve resource utilization and reduce costs. When set to false (default), replicas may be packed together for efficiency. When set to true, deployments are restricted to a single model replica per node, which can improve isolation or meet specific performance needs, but may result in underutilized nodes and higher costs.
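For reference, when you deploy through the API (covered below), these settings are written into the autoscale_config section of your deployment configuration file. The following is a minimal sketch; the field names are assumptions mapped from the settings above, so confirm them against the deployment_config.yaml guide linked in the API section:

autoscale_config:
  min_replicas: 1                   # Model Replicas (minimum)
  max_replicas: 5                   # Model Replicas (maximum)
  scale_up_delay_seconds: 300       # Scale Up Delay
  scale_down_delay_seconds: 300     # Scale Down Delay
  scale_to_zero_delay_seconds: 1800 # Scale To Zero Delay
  traffic_history_seconds: 600      # Traffic History Timeframe
  disable_packing: false            # Disable Nodepool Packing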

Step 6: Finalize and Create the Deployment

After completing the setup, click the Deploy Model button to create the deployment. You’ll be redirected to the nodepool page, where your deployed model will be listed.

You can also find the deployment listed in the Activity tab within the model's page. From there, you can select it to run inferences.

Via the API

Create a Deployment

To deploy a model within a nodepool you've created, provide the deployment_id and config_filepath parameters to the create_deployment method of the Nodepool class.

You can learn how to create the deployment_config.yaml file, which contains the deployment configuration details, here.
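For illustration, here is a minimal deployment_config.yaml sketch. The overall shape (a deployment block with autoscale_config, worker, and nodepools sections) follows the linked guide; treat the field names and values as placeholders to adapt:

deployment:
  id: "test-deployment"
  description: "Optional description of the deployment"
  autoscale_config:
    # See the sketch under Step 5 for the full set of scaling fields
    min_replicas: 1
    max_replicas: 5
  worker:
    model:
      id: "YOUR_MODEL_ID_HERE"
      user_id: "YOUR_USER_ID_HERE"
      app_id: "YOUR_APP_ID_HERE"
  nodepools:
    - id: "test-nodepool"
      compute_cluster:
        id: "test-cluster"
        user_id: "YOUR_USER_ID_HERE"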

note

Each model or workflow can only have one deployment per nodepool.

from clarifai.client.nodepool import Nodepool

# Set PAT as an environment variable
# export CLARIFAI_PAT=YOUR_PAT_HERE # Unix-Like Systems
# set CLARIFAI_PAT=YOUR_PAT_HERE # Windows

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool"
)

# Create a new deployment
deployment = nodepool.create_deployment(
    deployment_id="test-deployment",
    config_filepath="./configs/deployment_config.yaml"
)
Example Output
[INFO] 14:45:29.871319 Deployment with ID 'test-deployment' is created:
code: SUCCESS
description: "Ok"
req_id: "sdk-python-11.7.5-1eb407b9e125478287d552fb76bc37dd"

After creating it, initialize the Deployment class by providing the user_id and deployment_id parameters.

from clarifai.client.deployment import Deployment

# Initialize the deployment
deployment = Deployment(
    user_id="YOUR_USER_ID_HERE",
    deployment_id="test-deployment"
)
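
To confirm the deployment exists in the nodepool, you can enumerate the nodepool's deployments. A short sketch, assuming the Nodepool class's list_deployments method from the same SDK:

from clarifai.client.nodepool import Nodepool

# Initialize the nodepool that hosts the deployment
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool"
)

# list_deployments yields the deployments in this nodepool;
# the deployment created above should appear among them
all_deployments = list(nodepool.list_deployments())
print(all_deployments)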

Restrict Deployments

You can restrict the type of compute an existing model you own can be deployed to. By patching the model's deploy_restriction value, you define whether it runs on shared or dedicated resources.

These are the values you can set:

  • 0 (USAGE_RESTRICTION_NOT_SET) — The default where no explicit restriction is set.
  • 1 (NO_LIMITS) — The model can be deployed on any kind of compute (shared or dedicated). There are no policy constraints.
  • 2 (SHARED_COMPUTE_ONLY) — The model can only run on shared compute resources. This is typically cheaper but may have lower isolation or performance guarantees.
  • 3 (DEDICATED_COMPUTE_ONLY) — The model can only run on dedicated compute resources. This is used when you need guaranteed performance, security isolation, or compliance.
curl -X PATCH "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/models" \
  -H "Authorization: Key YOUR_PAT_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {
        "id": "YOUR_MODEL_ID_HERE",
        "deploy_restriction": 2
      }
    ],
    "action": "merge"
  }'
Example Output
{
  "status": {
    "code": 10000,
    "description": "Ok",
    "req_id": "b6af331eac444e76b88abea88d2d4579"
  },
  "models": [{
    "id": "upload55",
    "name": "upload55",
    "created_at": "2025-08-21T17:05:33.491470Z",
    "modified_at": "2025-09-09T18:36:48.844230Z",
    "app_id": "uploaded-models",
    "model_version": {
      "id": "991d5569b152462aad563cfc24faf477",
      "created_at": "2025-08-21T17:05:34.086694Z",
      "status": {
        "code": 21100,
        "description": "Model is trained and ready for deployment"
      },
      "completed_at": "2025-08-21T17:05:41.982881Z",
      "visibility": {
        "gettable": 10
      },
      "app_id": "uploaded-models",
      "user_id": "alfrick",
      "metadata": {},
      "output_info": {
        "output_config": {
          "max_concepts": 0,
          "min_value": 0
        },
        "message": "Show output_info with: GET /models/{model_id}/output_info",
        "fields_map": {},
        "params": {
          "max_tokens": 512,
          "secrets": [],
          "temperature": 1
        }
      },
      "input_info": {
        "fields_map": {}
      },
      "train_info": {},
      "import_info": {},
      "inference_compute_info": {
        "cpu_limit": "1",
        "cpu_memory": "13Gi",
        "cpu_requests": "1",
        "cpu_memory_requests": "2Gi",
        "num_accelerators": 1,
        "accelerator_memory": "15Gi",
        "accelerator_type": ["NVIDIA-*"]
      },
      "method_signatures": [{
        "name": "generate",
        "method_type": 2,
        "description": "This method streams multiple outputs instead of returning just one.\nIt takes an input string and yields a sequence of outputs.",
        "input_fields": [{
          "name": "text1",
          "type": 1
        }],
        "output_fields": [{
          "name": "return",
          "type": 1,
          "iterator": true
        }]
      }]
    },
    "user_id": "alfrick",
    "model_type_id": "text-to-text",
    "visibility": {
      "gettable": 10
    },
    "metadata": {},
    "presets": {},
    "toolkits": [],
    "use_cases": [],
    "languages": [],
    "languages_full": [],
    "check_consents": [],
    "workflow_recommended": false,
    "featured_order": 0,
    "deploy_restriction": 2,
    "open_router_info": {
      "params": {}
    }
  }]
}
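
If you prefer Python over curl for this call, a minimal sketch using the third-party requests library (not part of the Clarifai SDK) mirrors the PATCH request above:

import os
import requests

# Same endpoint as the curl example above
url = "https://api.clarifai.com/v2/users/YOUR_USER_ID_HERE/apps/YOUR_APP_ID_HERE/models"

headers = {
    "Authorization": f"Key {os.environ['CLARIFAI_PAT']}",  # PAT read from the environment
    "Content-Type": "application/json",
}

payload = {
    "models": [
        {
            "id": "YOUR_MODEL_ID_HERE",
            "deploy_restriction": 2,  # 2 = SHARED_COMPUTE_ONLY
        }
    ],
    "action": "merge",
}

response = requests.patch(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["status"])  # expect code 10000 / "Ok"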