
Compute Orchestration

Train and deploy any model on any compute infrastructure, at any scale


note

Compute Orchestration is currently in Public Preview. To request access, please contact us here.

Clarifai’s Compute Orchestration offers a streamlined solution for managing the infrastructure required for training, deploying, and scaling machine learning models and workflows.

This flexible system supports any compute instance — across various hardware providers and deployment methods — and provides automatic scaling to match workload demands.

Click here to learn more about our Compute Orchestration system.

Tips
  • Run the following command to clone the repository containing various Compute Orchestration examples: git clone https://github.com/Clarifai/examples.git. After cloning, navigate to the ComputeOrchestration folder to follow along with the tutorial.

  • For a step-by-step tutorial, check the CRUD operations notebook.

Clarifai CLI

Clarifai provides a user-friendly command line interface (CLI) that simplifies Compute Orchestration tasks. With the CLI, you can manage the infrastructure required for deploying and scaling machine learning models, even without extensive MLOps expertise: set up clusters, configure nodepools, and deploy models directly from the command line. A step-by-step tutorial is provided here.

Prerequisites

Installation

To begin, install the latest version of the clarifai Python package.

pip install --upgrade clarifai

Get a PAT

You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can generate one on your Personal Settings page by navigating to the Security section.

Then, set it as an environment variable in your script.

import os
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key

Set up Project Directory

  • Create a directory to store your project files.
  • Inside this directory, create a Python file for your Compute Orchestration code.
  • Create a configs folder to store your YAML configuration files for clusters, nodepools, and deployments.
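The steps above can be sketched as the following shell commands. The project and file names are placeholders chosen for illustration; only the configs folder name matters for the paths used later in this tutorial.

```shell
# Create the project directory with a configs folder for the YAML files
mkdir -p my-compute-project/configs

# Add a Python file for your Compute Orchestration code
# (the file name is arbitrary)
touch my-compute-project/compute_orchestration.py
```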

In the configs folder, create the following files:

1. compute_cluster_config.yaml:

compute_cluster:
  id: "test-compute-cluster"
  description: "My AWS compute cluster"
  cloud_provider:
    id: "aws"
  region: "us-east-1"
  managed_by: "clarifai"
  cluster_type: "dedicated"
  visibility:
    gettable: 10

2. nodepool_config.yaml:

nodepool:
  id: "test-nodepool"
  compute_cluster:
    id: "test-compute-cluster"
  description: "First nodepool in AWS in a proper compute cluster"
  instance_types:
    - id: "g5.xlarge"
      compute_info:
        cpu_limit: "8"
        cpu_memory: "16Gi"
        accelerator_type:
          - "a10"
        num_accelerators: 1
        accelerator_memory: "40Gi"
  node_capacity_type:
    capacity_types:
      - 1
      - 2
  max_instances: 1

3. deployment_config.yaml:

deployment:
  id: "test-deployment"
  description: "some random deployment"
  autoscale_config:
    min_replicas: 0
    max_replicas: 1
    traffic_history_seconds: 100
    scale_down_delay_seconds: 30
    scale_up_delay_seconds: 30
    enable_packing: true
  worker:
    model:
      id: "got-ocr-2_0"
      model_version:
        id: "5d92321db5d341b5b4cf407ab34f618f"
      user_id: "stepfun-ai"
      app_id: "ocr"
  scheduling_choice: 4
  nodepools:
    - id: "test-nodepool"
      compute_cluster:
        id: "test-compute-cluster"
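The autoscale_config fields above control how many replicas of your model are running at any time. The sketch below is a deliberately simplified, hypothetical illustration of how such parameters interact; it is not Clarifai's actual scheduling logic. Note that min_replicas: 0 allows the deployment to scale to zero when idle, while the delay settings prevent thrashing on brief traffic lulls.

```python
def desired_replicas(pending_requests, min_replicas=0, max_replicas=1):
    # Toy model: one replica per pending request, clamped to the
    # configured bounds. min_replicas=0 allows scale-to-zero when
    # there is no traffic; max_replicas caps bursts.
    return max(min_replicas, min(max_replicas, pending_requests))

def should_scale_down(idle_seconds, scale_down_delay_seconds=30):
    # Only remove replicas after traffic has been absent for the
    # configured delay, avoiding thrashing on brief lulls.
    return idle_seconds >= scale_down_delay_seconds

print(desired_replicas(0))   # 0 -> no traffic, deployment scales to zero
print(desired_replicas(5))   # 1 -> burst capped by max_replicas
print(should_scale_down(45)) # True -> idle longer than the 30s delay
```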

Optionally, if you want to use the Clarifai CLI, create a login configuration file for storing your account credentials:

user_id: "YOUR_USER_ID_HERE"
pat: "YOUR_PAT_HERE"

Then, authenticate your CLI session with Clarifai using the stored credentials in the configuration file:

$ clarifai login --config <config-filepath>

Cluster Operations

Create a Compute Cluster

To create a new compute cluster, pass the compute_cluster_id and config_filepath as arguments to the create_compute_cluster method of the User class.

from clarifai.client.user import User
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the client
client = User(
    user_id="YOUR_USER_ID_HERE",
    base_url="https://api.clarifai.com"
)

# Create a new compute cluster
compute_cluster = client.create_compute_cluster(
    compute_cluster_id="test-compute-cluster",
    config_filepath="./configs/compute_cluster_config.yaml"
)

Get a Cluster

To get a specific compute cluster, pass the compute_cluster_id to the compute_cluster method of the User class.

from clarifai.client.user import User
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the client
client = User(
    user_id="YOUR_USER_ID_HERE",
    base_url="https://api.clarifai.com"
)

# Get and print the compute cluster
compute_cluster = client.compute_cluster(
    compute_cluster_id="test-compute-cluster"
)
print(compute_cluster)

List All Clusters

To list your existing compute clusters, call the list_compute_clusters method of the User class.

from clarifai.client.user import User
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the client
client = User(
    user_id="YOUR_USER_ID_HERE",
    base_url="https://api.clarifai.com"
)

# Get and print all the compute clusters
all_compute_clusters = list(
    client.list_compute_clusters()
)
print(all_compute_clusters)

Initialize the ComputeCluster Class

To initialize the ComputeCluster class, provide the user_id and compute_cluster_id as parameters.

Initialization is essential because it establishes the specific user and compute cluster context, which allows the subsequent operations to accurately target and manage the intended resources.

from clarifai.client.compute_cluster import ComputeCluster

# Initialize the ComputeCluster instance
compute_cluster = ComputeCluster(
    user_id="YOUR_USER_ID_HERE",
    compute_cluster_id="test-compute-cluster",
    base_url="https://api.clarifai.com"
)

Nodepool Operations

Create a Nodepool

To create a new nodepool, use the create_nodepool method with the nodepool_id and config_filepath as parameters.

from clarifai.client.compute_cluster import ComputeCluster
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the ComputeCluster instance
compute_cluster = ComputeCluster(
    user_id="YOUR_USER_ID_HERE",
    compute_cluster_id="test-compute-cluster",
    base_url="https://api.clarifai.com"
)

# Create a new nodepool
nodepool = compute_cluster.create_nodepool(
    nodepool_id="test-nodepool",
    config_filepath="./configs/nodepool_config.yaml"
)

Get a Nodepool

To get a specific nodepool, provide the nodepool_id to the nodepool method of the ComputeCluster class.

from clarifai.client.compute_cluster import ComputeCluster
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the ComputeCluster instance
compute_cluster = ComputeCluster(
    user_id="YOUR_USER_ID_HERE",
    compute_cluster_id="test-compute-cluster",
    base_url="https://api.clarifai.com"
)

# Get and print the nodepool
nodepool = compute_cluster.nodepool(
    nodepool_id="test-nodepool"
)
print(nodepool)

List All Nodepools

To list the existing nodepools, call the list_nodepools method of the ComputeCluster class.

from clarifai.client.compute_cluster import ComputeCluster
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the ComputeCluster instance
compute_cluster = ComputeCluster(
    user_id="YOUR_USER_ID_HERE",
    compute_cluster_id="test-compute-cluster",
    base_url="https://api.clarifai.com"
)

# Get and print all the nodepools
all_nodepools = list(
    compute_cluster.list_nodepools()
)
print(all_nodepools)

Initialize the Nodepool Class

To initialize the Nodepool class, provide the user_id and nodepool_id parameters.

from clarifai.client.nodepool import Nodepool

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool",
    base_url="https://api.clarifai.com"
)

Deployment Operations

Create a Deployment

To deploy a model within a nodepool, provide the deployment_id and config_filepath parameters to the create_deployment method of the Nodepool class.

note

Each model or workflow can only have one deployment per nodepool.

from clarifai.client.nodepool import Nodepool
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool",
    base_url="https://api.clarifai.com"
)

# Create a new deployment
deployment = nodepool.create_deployment(
    deployment_id="test-deployment",
    config_filepath="./configs/deployment_config.yaml"
)

Get a Deployment

To get a specific deployment, provide the deployment_id to the deployment method of the Nodepool class.

from clarifai.client.nodepool import Nodepool
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool",
    base_url="https://api.clarifai.com"
)

# Get and print the deployment
deployment = nodepool.deployment(
    deployment_id="test-deployment"
)
print(deployment)

List All Deployments

To list existing deployments, call the list_deployments method of the Nodepool class.

from clarifai.client.nodepool import Nodepool
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool",
    base_url="https://api.clarifai.com"
)

# Get and print all the deployments
all_deployments = list(
    nodepool.list_deployments()
)
print(all_deployments)

Initialize the Deployment Class

To initialize the Deployment class, provide the user_id and deployment_id parameters.

from clarifai.client.deployment import Deployment

# Initialize the deployment
deployment = Deployment(
    user_id="YOUR_USER_ID_HERE",
    deployment_id="test-deployment",
    base_url="https://api.clarifai.com"
)

Predict With Deployed Model

Once your model is deployed, it can be used to make predictions by calling the appropriate prediction methods. Clarifai's Compute Orchestration system offers different types of prediction calls to suit various use cases.

To ensure requests are routed to your deployed model, you must specify the deployment_id parameter. This parameter tells Clarifai's Compute Orchestration system which deployment should serve each prediction request.

tip

The following examples illustrate how to make predictions with inputs provided as publicly accessible URLs. Click here to learn how to make predictions using other types of inputs and models.

Unary-Unary Predict Call

This is the simplest type of prediction. In this method, a single input is sent to the model, and it returns a single response. This is ideal for tasks where a quick, non-streaming prediction is required, such as classifying an image.

It supports the following prediction methods:

  • predict_by_url — Use a publicly accessible URL for the input.
  • predict_by_bytes — Pass raw input data directly.
  • predict_by_filepath — Provide the local file path for the input.

from clarifai.client.model import Model

model_url = "https://clarifai.com/stepfun-ai/ocr/models/got-ocr-2_0"

# URL of the image to analyze
image_url = "https://samples.clarifai.com/featured-models/model-ocr-scene-text-las-vegas-sign.png"

# Initialize the model
model = Model(
    url=model_url,
    pat="YOUR_PAT_HERE"
)

# Make a prediction using the model with the specified deployment
model_prediction = model.predict_by_url(
    image_url,
    input_type="image",
    deployment_id="test-deployment"
)

# Print the output
print(model_prediction.outputs[0].data.text.raw)

Unary-Stream Predict Call

The Unary-Stream predict call processes a single input, but returns a stream of responses. It is particularly useful for tasks where multiple outputs are generated from a single input, such as generating text completions from a prompt.

It supports the following prediction methods:

  • generate_by_url — Provide a publicly accessible URL and handle the streamed responses iteratively.
  • generate_by_bytes — Use raw input data.
  • generate_by_filepath — Use a local file path for the input.

from clarifai.client.model import Model

model_url = "https://clarifai.com/meta/Llama-3/models/llama-3_2-3b-instruct"

# URL of the prompt text
text_url = "https://samples.clarifai.com/featured-models/falcon-instruction-guidance.txt"

# Initialize the model
model = Model(
    url=model_url,
    pat="YOUR_PAT_HERE"
)

# Perform unary-stream prediction with the specified deployment
stream_response = model.generate_by_url(
    text_url,
    input_type="text",
    deployment_id="test-deployment"
)

# Handle the stream of responses
list_stream_response = [response for response in stream_response]

Stream-Stream Predict Call

The stream-stream predict call enables streaming of both inputs and outputs, making it highly effective for processing large datasets or real-time applications.

In this setup, multiple inputs can be continuously sent to the model, and the corresponding predictions are streamed back in real time. This is ideal for tasks like real-time video processing or live sensor data analysis.

It supports the following prediction methods:

  • stream_by_url — Stream a list of publicly accessible URLs and receive a stream of predictions. It takes an iterator of inputs and returns a stream of predictions.
  • stream_by_bytes — Stream raw input data.
  • stream_by_filepath — Stream inputs from local file paths.

from clarifai.client.model import Model

model_url = "https://clarifai.com/meta/Llama-3/models/llama-3_2-3b-instruct"

# URL of the prompt text
text_url = "https://samples.clarifai.com/featured-models/falcon-instruction-guidance.txt"

# Initialize the model
model = Model(
    url=model_url,
    pat="YOUR_PAT_HERE"
)

# Perform stream-stream prediction with the specified deployment
stream_response = model.stream_by_url(
    iter([text_url]),
    input_type="text",
    deployment_id="test-deployment"
)

# Handle the stream of responses
list_stream_response = [response for response in stream_response]

Delete Resources

Delete Deployments

To delete deployments, pass a list of deployment IDs to the delete_deployments method of the Nodepool class.

from clarifai.client.nodepool import Nodepool
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the Nodepool instance
nodepool = Nodepool(
    user_id="YOUR_USER_ID_HERE",
    nodepool_id="test-nodepool",
    base_url="https://api.clarifai.com"
)

# Get all the deployments in the nodepool
all_deployments = list(nodepool.list_deployments())

# Extract deployment IDs for deletion
deployment_ids = [deployment.id for deployment in all_deployments]

# Or delete a specific deployment by providing its deployment ID
# deployment_ids = ["test-deployment"]

# Delete the deployments
nodepool.delete_deployments(deployment_ids=deployment_ids)

Delete Nodepools

To delete nodepools, provide a list of nodepool IDs to the delete_nodepools method of the ComputeCluster class.

from clarifai.client.compute_cluster import ComputeCluster
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the ComputeCluster instance
compute_cluster = ComputeCluster(
    user_id="YOUR_USER_ID_HERE",
    compute_cluster_id="test-compute-cluster",
    base_url="https://api.clarifai.com"
)

# Get all nodepools within the compute cluster
all_nodepools = list(compute_cluster.list_nodepools())

# Extract nodepool IDs for deletion
nodepool_ids = [nodepool.id for nodepool in all_nodepools]

# Or delete a specific nodepool by providing its ID
# nodepool_ids = ["test-nodepool"]

# Delete the nodepools
compute_cluster.delete_nodepools(nodepool_ids=nodepool_ids)

Delete Compute Clusters

To delete compute clusters, provide a list of compute cluster IDs to the delete_compute_clusters method of the User class.

from clarifai.client.user import User
import os

# Set the PAT key
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE"

# Initialize the User client
client = User(
    user_id="YOUR_USER_ID_HERE",
    base_url="https://api.clarifai.com"
)

# Get all compute clusters associated with the user
all_compute_clusters = list(client.list_compute_clusters())

# Extract compute cluster IDs for deletion
compute_cluster_ids = [compute_cluster.id for compute_cluster in all_compute_clusters]

# Or delete a specific compute cluster by providing its ID
# compute_cluster_ids = ["test-compute-cluster"]

# Delete the compute clusters
client.delete_compute_clusters(compute_cluster_ids=compute_cluster_ids)