Compute Orchestration

Train and deploy any model on any compute infrastructure, at any scale


note

Compute Orchestration is currently in Public Preview. To request access, please contact us here.

Clarifai’s Compute Orchestration offers a streamlined solution for managing the infrastructure required for training, deploying, and scaling machine learning models and workflows.

These flexible capabilities work with any compute instance, across a range of hardware providers and deployment methods, and automatically scale resources to match workload demands.

Click here to learn more about our Compute Orchestration capabilities.

Tips
  • Run the following command to clone the repository containing various Compute Orchestration examples: git clone https://github.com/Clarifai/examples.git. After cloning, navigate to the ComputeOrchestration folder to follow along with this tutorial.

  • For a step-by-step tutorial, see the CRUD operations notebook.

  • Clarifai provides a user-friendly command line interface (CLI) that simplifies Compute Orchestration tasks. You can follow the step-by-step CLI tutorial here.

Prerequisites

Installation

To begin, install the latest version of the clarifai Python package.

pip install --upgrade clarifai

Get a PAT

You need a Personal Access Token (PAT) to authenticate your connection to the Clarifai platform. You can generate one on your Personal Settings page by navigating to the Security section.

Then, set it as an environment variable in your script.

import os
os.environ["CLARIFAI_PAT"] = "YOUR_PAT_HERE" # replace with your own PAT key

Set up Project Directory

  • Create a directory to store your project files.
  • Inside this directory, create a Python file for your Compute Orchestration code.
  • Create a configs folder to store your YAML configuration files for clusters, nodepools, and deployments.
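For reference, the resulting layout might look like the following (the project and script names here are illustrative; only the configs folder and the file names listed below are used later in this tutorial):

compute-orchestration-project/
├── compute_orchestration.py
└── configs/
    ├── compute_cluster_config.yaml
    ├── nodepool_config.yaml
    └── deployment_config.yaml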

Then, create the following files in the configs folder:

1. compute_cluster_config.yaml:

compute_cluster:
  id: "test-compute-cluster"
  description: "My AWS compute cluster"
  cloud_provider:
    id: "aws"
  region: "us-east-1"
  managed_by: "clarifai"
  cluster_type: "dedicated"
  visibility:
    gettable: 10

2. nodepool_config.yaml:

nodepool:
  id: "test-nodepool"
  compute_cluster:
    id: "test-compute-cluster"
  description: "First nodepool in AWS in a proper compute cluster"
  instance_types:
    - id: "g5.xlarge"
      compute_info:
        cpu_limit: "8"
        cpu_memory: "16Gi"
        accelerator_type:
          - "a10"
        num_accelerators: 1
        accelerator_memory: "40Gi"
  node_capacity_type:
    capacity_types:
      - 1
      - 2
  max_instances: 1

3. deployment_config.yaml:

deployment:
  id: "test-deployment"
  description: "some random deployment"
  autoscale_config:
    min_replicas: 0
    max_replicas: 1
    traffic_history_seconds: 100
    scale_down_delay_seconds: 30
    scale_up_delay_seconds: 30
    disable_packing: false
  worker:
    model:
      id: "apparel-clusterering"
      model_version:
        id: "cc911f6b0ed748efb89e3d1359c146c4"
      user_id: "clarifai"
      app_id: "main"
  scheduling_choice: 4
  nodepools:
    - id: "test-nodepool"
      compute_cluster:
        id: "test-compute-cluster"

Optionally, if you want to use the Clarifai CLI, create a login configuration file for storing your account credentials:

user_id: "YOUR_USER_ID_HERE"
pat: "YOUR_PAT_HERE"

Then, authenticate your CLI session with Clarifai using the stored credentials in the configuration file:

$ clarifai login --config <config-filepath>