Create Clusters and Nodepools

Set up capabilities that match your computational needs


A compute cluster serves as the main environment where models are deployed, whether for training or inference. Each cluster can contain multiple nodepools, which are groups of virtual machine instances with similar configurations (such as CPU/GPU type, memory).

After creating a custom cluster, you can configure nodepools within it to optimize resource usage. These nodepools help tailor the infrastructure to the specific hardware, performance, cost, and regulatory compliance requirements of your machine learning workloads.

For example, you may create a nodepool for GPU-intensive tasks and another for lighter workloads running on CPUs.

With clusters and nodepools, you can organize and manage (orchestrate) the compute resources necessary for running your models and workflows.

Shared Compute

By default, Clarifai offers a Shared SaaS (Serverless) compute cluster with a nodepool. This allows you to quickly get started without configuring any underlying compute instances (such as clusters, nodepools, and accelerators) — the pre-configured nodepool dynamically scales resources based on your model's needs.

If you opt for the shared cluster, you do not need to make any setup configurations on the Compute Orchestration pane — your models will automatically be deployed using it.

Note that the default setup is ideal for general use cases but may not suit workloads with more specific or demanding performance requirements.

Via the API

Prerequisites

Installation

To begin, install the latest version of the clarifai Python package.

pip install --upgrade clarifai

Note that if you want to use the Clarifai CLI, you'll need to authenticate your CLI session with Clarifai. Learn how to do that here.

Get a PAT

You need a PAT (Personal Access Token) key to authenticate your connection to the Clarifai platform. You can generate it in the Security section of your Personal Settings page.

Then, set it as an environment variable in your script.

import os
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key
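
Hard-coding the token is convenient for a quick test, but for scripts you share or commit you may prefer to export CLARIFAI_PAT in your shell and have the script check for it. The sketch below shows one way to do that; get_pat is a hypothetical helper written for this example and is not part of the clarifai package.

```python
import os


def get_pat(env_var: str = "CLARIFAI_PAT") -> str:
    """Return the PAT stored in the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the clarifai package.
    """
    pat = os.environ.get(env_var, "").strip()
    # Treat the documentation placeholder as "not set" as well.
    if not pat or pat == "YOUR_PAT_HERE":
        raise RuntimeError(
            f"{env_var} is not set; export it before running this script."
        )
    return pat
```

Calling get_pat() at the top of your script fails early with a readable message instead of a later authentication error.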

Set up Project Directory

  • Create an overarching directory to store your project files.
  • Inside this directory, create a Python file for your Compute Orchestration code.
  • Create a configs folder to store your YAML configuration files for clusters, nodepools, and deployments.

Then, create the following files in the configs folder:

1. compute_cluster_config.yaml:

compute_cluster:
  id: "test-compute-cluster"
  description: "My AWS compute cluster"
  cloud_provider:
    id: "aws"
  region: "us-east-1"
  managed_by: "clarifai"
  cluster_type: "dedicated"
  visibility:
    gettable: 10

2. nodepool_config.yaml:

nodepool:
  id: "test-nodepool"
  compute_cluster:
    id: "test-compute-cluster"
  description: "First nodepool in AWS in a proper compute cluster"
  instance_types:
    - id: "g5.xlarge"
      compute_info:
        cpu_limit: "8"
        cpu_memory: "16Gi"
        accelerator_type:
          - "a10"
        num_accelerators: 1
        accelerator_memory: "40Gi"
  node_capacity_type:
    capacity_types:
      - 1
      - 2
  max_instances: 1

3. deployment_config.yaml:

We'll use this later to deploy the model.

deployment:
  id: "test-deployment"
  description: "some random deployment"
  autoscale_config:
    min_replicas: 0
    max_replicas: 1
    traffic_history_seconds: 100
    scale_down_delay_seconds: 30
    scale_up_delay_seconds: 30
    disable_packing: false
  worker:
    model:
      id: "apparel-clusterering"
      model_version:
        id: "cc911f6b0ed748efb89e3d1359c146c4"
      user_id: "clarifai"
      app_id: "main"
  scheduling_choice: 4
  nodepools:
    - id: "test-nodepool"
      compute_cluster:
        id: "test-compute-cluster"
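
The three files refer to each other by ID: the nodepool names its compute cluster, and the deployment names both. A mismatch is easy to introduce when renaming things, so a quick consistency check before calling the API can help. The following is a minimal sketch, assuming the configs have already been loaded into Python dictionaries (for example with a YAML parser of your choice); check_config_references is a hypothetical helper, and the dictionaries below mirror only the keys relevant to the check.

```python
def check_config_references(cluster_cfg: dict, nodepool_cfg: dict, deployment_cfg: dict) -> list:
    """Return a list of problems; an empty list means the IDs line up.

    Hypothetical helper for illustration; not part of the clarifai package.
    """
    problems = []
    cluster_id = cluster_cfg["compute_cluster"]["id"]
    nodepool = nodepool_cfg["nodepool"]

    # The nodepool must point at the cluster defined in the cluster config.
    if nodepool["compute_cluster"]["id"] != cluster_id:
        problems.append("nodepool references an unknown compute cluster")

    deployment = deployment_cfg["deployment"]
    for ref in deployment["nodepools"]:
        # Each nodepool listed in the deployment must match the nodepool config.
        if ref["id"] != nodepool["id"]:
            problems.append(f"deployment references unknown nodepool {ref['id']!r}")
        if ref["compute_cluster"]["id"] != cluster_id:
            problems.append(f"deployment nodepool {ref['id']!r} names a different cluster")

    # A replica range only makes sense if min does not exceed max.
    autoscale = deployment["autoscale_config"]
    if autoscale["min_replicas"] > autoscale["max_replicas"]:
        problems.append("min_replicas is greater than max_replicas")
    return problems


# Dictionaries mirroring the relevant keys of the three example files.
cluster_cfg = {"compute_cluster": {"id": "test-compute-cluster"}}
nodepool_cfg = {
    "nodepool": {
        "id": "test-nodepool",
        "compute_cluster": {"id": "test-compute-cluster"},
    }
}
deployment_cfg = {
    "deployment": {
        "autoscale_config": {"min_replicas": 0, "max_replicas": 1},
        "nodepools": [
            {"id": "test-nodepool", "compute_cluster": {"id": "test-compute-cluster"}}
        ],
    }
}

print(check_config_references(cluster_cfg, nodepool_cfg, deployment_cfg))  # prints []
```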

Create a Cluster

To create a new compute cluster, pass the compute_cluster_id and config_filepath as arguments to the create_compute_cluster method of the User class.

from clarifai.client.user import User
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the client
client = User(
    user_id="YOUR_USER_ID_HERE",
    base_url="https://api.clarifai.com"
)

# Create a new compute cluster
compute_cluster = client.create_compute_cluster(
    compute_cluster_id="test-compute-cluster",
    config_filepath="./configs/compute_cluster_config.yaml"
)

After creating it, initialize the ComputeCluster class by providing the user_id and compute_cluster_id parameters.

Initialization is essential because it establishes the specific user and compute cluster context, which allows the subsequent operations to accurately target and manage the intended resources.

from clarifai.client.compute_cluster import ComputeCluster

# Initialize the ComputeCluster instance
compute_cluster = ComputeCluster(
    user_id="YOUR_USER_ID_HERE",
    compute_cluster_id="test-compute-cluster",
    base_url="https://api.clarifai.com"
)

Create a Nodepool

To create a new nodepool, use the create_nodepool method with the nodepool_id and config_filepath parameters.

from clarifai.client.compute_cluster import ComputeCluster
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the ComputeCluster instance
compute_cluster = ComputeCluster(
    user_id="YOUR_USER_ID_HERE",
    compute_cluster_id="test-compute-cluster",
    base_url="https://api.clarifai.com"
)

# Create a new nodepool
nodepool = compute_cluster.create_nodepool(
    nodepool_id="test-nodepool",
    config_filepath="./configs/nodepool_config.yaml"
)

After creating it, initialize the Nodepool class by providing the user_id and nodepool_id parameters.

from clarifai.client.nodepool import Nodepool

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool",
    base_url="https://api.clarifai.com"
)
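
The cluster and nodepool steps above can also be chained in one function. The sketch below is an assumption-laden convenience wrapper, not an official pattern: create_cluster_and_nodepool is a hypothetical name, and it assumes that the object returned by create_compute_cluster exposes create_nodepool, as the compute_cluster variable in the earlier example suggests. It checks that both config files exist before making any call.

```python
from pathlib import Path


def create_cluster_and_nodepool(user, cluster_id, cluster_config, nodepool_id, nodepool_config):
    """Create a compute cluster, then a nodepool inside it.

    Hypothetical wrapper; `user` is expected to be a clarifai User instance.
    """
    # Fail before any API call if a config path is wrong.
    for path in (cluster_config, nodepool_config):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Config file not found: {path}")

    compute_cluster = user.create_compute_cluster(
        compute_cluster_id=cluster_id,
        config_filepath=cluster_config,
    )
    # Assumption: the returned object provides create_nodepool.
    nodepool = compute_cluster.create_nodepool(
        nodepool_id=nodepool_id,
        config_filepath=nodepool_config,
    )
    return compute_cluster, nodepool


# Example call (requires the clarifai package and a valid PAT):
# from clarifai.client.user import User
# user = User(user_id="YOUR_USER_ID_HERE", base_url="https://api.clarifai.com")
# create_cluster_and_nodepool(
#     user,
#     "test-compute-cluster", "./configs/compute_cluster_config.yaml",
#     "test-nodepool", "./configs/nodepool_config.yaml",
# )
```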

Via the UI

Create a Cluster

Log in to the Clarifai platform and select the Compute option in the top navigation bar.


Alternatively, you can click the Compute settings button found in the Deployments tab on a model's viewer page or anywhere this button appears on the platform.


You’ll be redirected to the Compute Orchestration pane. Then, click the Create a new cluster button.

You’ll be redirected to a page where you can specify the configurations for your new cluster.

  • Cluster ID — Provide an ID that helps identify the cluster to use when deploying your models. We recommend an easy-to-remember ID that’s related to the cluster’s use case.

  • Cluster Description — Optionally, provide a short description that summarizes the details related to the cluster.

  • Cluster Type — Choose the type of cluster you want to use:

    • Dedicated Clarifai-managed cloud compute — Run workloads in dedicated compute instances in Clarifai's cloud.
    • Dedicated self-managed compute (coming soon) — Bring your own existing dedicated compute resources, either from cloud or on-premise instances. If you're interested in using your own cloud or on-prem compute, let us know by sending feedback or contacting our support department.
  • Instance Settings — Select your preferred cloud provider and geographic region for deploying your models (more options are coming soon). The right choice depends on several factors, including performance needs, costs, and regulatory compliance. For example, selecting a region closer to your users reduces network latency, leading to faster response times.

  • Personal Access Token (PAT) — Select a PAT that is used to verify your identity when connecting to the cluster. Note that if the selected PAT is deleted, the associated compute resources will no longer function. You can generate a new PAT by clicking the "Create new Personal Access Token" link at the bottom of the corresponding dropdown list, or from the Security section of your Personal Settings page.

After configuring the settings, click the Continue button in the upper-right corner. You’ll be redirected to a page where you can create a nodepool related to the cluster.

Create a Nodepool

After clicking the Continue button upon creating a cluster, you’ll be redirected to a page where you can specify the configurations for your new nodepool.

Alternatively, you can create a new nodepool from an existing cluster by clicking the Create a Nodepool button in the upper-right corner of the cluster's page.

These are the configuration options you can set for your new nodepool:

  • Instance Configuration — Provide an ID that helps identify the nodepool to use when deploying your models. We recommend an easy-to-remember ID that’s related to the nodepool’s use case. Optionally, provide a short description that summarizes the details related to the nodepool.

  • Node Autoscaling Range — Specify the minimum and maximum number of nodes the nodepool can scale between, based on workload demand. The system spins up more nodes to handle increased traffic and scales down when demand decreases to optimize costs. For instance, you can set your nodepool to scale between 1 and 5 nodes, depending on how many requests your model is processing. A minimum of 1 (rather than 0) avoids cold start delays after inactivity, which matters when you must meet latency requirements, but it also means at least one node is always running and incurring compute costs. Setting the minimum to 0 eliminates costs during idle periods but may introduce cold start delays when traffic resumes.

  • Instance Type — Select the instance type you would like the deployment to run on. You can find an explanation of the available instance types here.

  • Spot Instances (default is off) — Enable this option if you want to rent spare, unused compute capacity at significantly lower prices compared to regular on-demand instances. If no spot instances are available, Clarifai will automatically fall back to on-demand instances. Note that spot instances can be terminated if capacity is needed elsewhere, making your node temporarily unavailable. For greater reliability, leave this option unchecked to use only on-demand instances.
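
To make the autoscaling trade-off concrete, the sketch below models the two ideas from the Node Autoscaling Range option: the node count always stays within the configured range, and the minimum determines what you pay while idle. It is only an illustration and does not describe how Clarifai's autoscaler is implemented; both function names and the hourly rate of 1.0 are made-up placeholders.

```python
def clamp_node_count(desired: int, min_nodes: int, max_nodes: int) -> int:
    """Keep the desired node count within the configured autoscaling range."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return max(min_nodes, min(desired, max_nodes))


def idle_cost(min_nodes: int, hourly_rate: float, idle_hours: float) -> float:
    """Cost of the nodes that keep running while there is no traffic."""
    return min_nodes * hourly_rate * idle_hours


# Range of 1 to 5 nodes, as in the example above.
print(clamp_node_count(0, 1, 5))  # 1: never drops below the minimum
print(clamp_node_count(8, 1, 5))  # 5: never exceeds the maximum

# Placeholder rate of 1.0 per node-hour over 12 idle hours.
print(idle_cost(1, 1.0, 12))  # 12.0: warm node, no cold start
print(idle_cost(0, 1.0, 12))  # 0.0: no idle cost, cold start on first request
```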

After configuring the settings, click the Create button in the upper-right corner. You'll then be redirected to your cluster's page, where the newly created nodepool will be listed in a table.

If you click on a nodepool listed in the table, you'll be taken to its individual page where you can view its detailed information, such as the cluster type, instance type, and any resource deployments associated with it.