Compute Orchestration
Train and deploy any model on any compute infrastructure, at any scale
Compute Orchestration is currently in Public Preview. To request access, please contact us.
Clarifai’s Compute Orchestration provides an efficient system for you to deploy any model on any compute infrastructure, at any scale. This new platform capability brings the convenience of serverless autoscaling to any environment, regardless of deployment location or hardware, and dynamically scales resources to meet workload demands.
Clarifai handles the containerization, model packing, time slicing, and other performance optimizations on your behalf.
Previously, our platform supported the following deployment options:
- Shared SaaS (Serverless) — This is our default offering, which abstracts away infrastructure management and allows users to easily deploy models without worrying about the underlying compute resources. In this option, Clarifai maintains multi-tenant GPU pools that users can access on demand.
- Full Platform Deployment — This option is designed for organizations with high-security requirements. It deploys both the Clarifai control and compute planes into the user’s preferred cloud, on-premises, or air-gapped infrastructure, ensuring full isolation.
With Compute Orchestration, we now give users the ability to manage compute planes anywhere and to access dedicated compute options. This system enables our enterprise customers to deploy production models with enhanced control, performance, and scalability, while addressing specific problems around compute costs, latency, and control over hosted models.
Compute Orchestration allows us to provide the following additional deployment options — all of which can be customized with your preferred settings for autoscaling, cold start, and more, ensuring maximum cost efficiency and performance:
- Dedicated SaaS — Provides exclusive access to Clarifai-managed nodes with customizable configurations. This is currently available in the AWS US-East region, with plans to expand to other cloud providers and hardware options.
- Self-Managed VPC (Virtual Private Cloud) — Users securely connect their own cloud provider VPC, enabling Clarifai to orchestrate deployments within the user’s cloud environment while leveraging existing cloud compute or spend commitments.
- Self-Managed On-Premises — Users securely connect their own on-premises or bare-metal infrastructure to leverage existing compute investments, which Clarifai then orchestrates for model deployment.
- Multi-Site Deployment — Supports deployments across multiple self-managed compute sources, with potential for future multi-cloud or multi-region Dedicated SaaS solutions.
If you’re not using Compute Orchestration, the Shared SaaS (Serverless) deployment remains the default option.
Compute Clusters and Nodepools
We use clusters and nodepools to organize and manage the compute resources required for the Compute Orchestration system.
A compute cluster in Clarifai acts as the overarching computational environment where models are executed, whether for training or inference. A nodepool refers to a set of dedicated nodes (virtual machine instances) within a cluster that share similar configurations and resources, such as CPU or GPU type, memory size, and other performance parameters.
Cluster configuration lets you specify where and how your models are run, ensuring better performance, lower latency, and adherence to regional regulations. You can specify a cloud provider, such as AWS, that will provide the underlying compute infrastructure for hosting and running your models. You can also specify the geographic location of the data center where the compute resources will be hosted.
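As a purely illustrative sketch (the class and field names below are invented for this example and are not the Clarifai API schema; see the Clusters and Nodepools guide for the actual setup), a cluster definition boils down to an ID plus the provider and region it is pinned to:

```python
from dataclasses import dataclass

@dataclass
class ClusterSpec:
    """Hypothetical structure -- not the actual Clarifai API schema."""
    cluster_id: str       # your name for the cluster
    cloud_provider: str   # e.g. "aws"
    region: str           # data-center location, e.g. "us-east-1"

# Pinning workloads to a provider and region controls latency and data residency.
prod_cluster = ClusterSpec(cluster_id="prod-cluster",
                           cloud_provider="aws",
                           region="us-east-1")
print(prod_cluster)
```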
Nodepools are an important part of how compute resources are operated within a cluster. They provide flexibility in choosing the type of instances used to run your machine learning models and workflows and help determine how resources are scaled to meet demand.
Nodepools specify the accelerator and instance that will run your models and other workloads. Accelerators are specialized hardware resources, such as GPUs or dedicated ML chips used for computation.
Each nodepool can run containers or workloads, and you can have multiple nodepools within a single cluster to support different types of workloads or performance requirements. These nodes execute tasks like model training, inference, and workflow orchestration within a compute cluster.
With Compute Orchestration, you can ensure these nodepools are automatically scaled up or down to match the size, complexity, and cost of your workloads.
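Continuing the same illustrative sketch (again, invented names rather than the real Clarifai schema), a nodepool adds the instance type, accelerator, and scaling bounds on top of a cluster:

```python
from dataclasses import dataclass

@dataclass
class NodepoolSpec:
    """Hypothetical structure -- not the actual Clarifai API schema."""
    nodepool_id: str
    instance_type: str   # e.g. "g5.2xlarge"
    accelerator: str     # e.g. a GPU type, or "cpu"
    min_nodes: int = 0   # 0 permits scale-to-zero
    max_nodes: int = 3

# A single cluster (such as prod_cluster from the previous sketch) can hold
# several nodepools, each sized for a different workload profile.
nodepools = [
    NodepoolSpec("llm-pool", instance_type="g5.2xlarge", accelerator="nvidia-a10g", max_nodes=4),
    NodepoolSpec("cpu-pool", instance_type="t3.xlarge", accelerator="cpu", max_nodes=2),
]
for pool in nodepools:
    print(pool)
```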
Benefits of Compute Orchestration
1. Performance and Deployment Flexibility
- It provides access to a wide range of accelerator options tailored to your use case. You can configure multiple compute clusters, each tailored to your AI development stage, performance requirements, and budget. You can run affordable proofs of concept or compute-heavy LLMs and LVMs in production, all from a single product.
- It offers the flexibility to deploy in any cloud service provider or on-premises environment, allowing users to leverage their hardware of choice without being locked into a single vendor. Alternatively, you can deploy on Clarifai’s compute and avoid managing infrastructure altogether.
- You can customize autoscaling settings to prevent cold-start issues, handle traffic swings, and scale down to zero for cost efficiency, giving you both flexibility and cost control (a simplified sketch of this kind of scaling policy follows this list).
- Just like with our previous offerings, we ensure efficient resource usage and cost savings through bin-packing (running multiple models per GPU), time slicing, and other optimizations.
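The sketch below gives a rough sense of what those autoscaling settings control. It is not Clarifai’s actual scaling algorithm; the function and numbers are made up for illustration. It simply shows how a desired node count can be clamped between a configured minimum and maximum, where a minimum of zero enables scale-to-zero:

```python
import math

def desired_nodes(in_flight_requests: int,
                  requests_per_node: int,
                  min_nodes: int = 0,
                  max_nodes: int = 4) -> int:
    """Toy scaling policy: size the nodepool to the current load,
    clamped to the configured bounds. Illustrative only."""
    if in_flight_requests == 0:
        needed = 0  # nothing queued: allow scale-to-zero
    else:
        needed = math.ceil(in_flight_requests / requests_per_node)
    return max(min_nodes, min(needed, max_nodes))

# Quiet period -> 0 nodes; traffic spike -> capped at max_nodes.
print(desired_nodes(0, requests_per_node=8))     # 0
print(desired_nodes(100, requests_per_node=8))   # 4 (capped)
```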
2. Enhanced Security
- Users can run compute planes within their own cloud service provider or on-premises environments and securely connect to Clarifai’s control plane, opening only outbound ports for traffic. This reduces networking complexity and security risk compared to opening inbound access or configuring cloud IAM access within your VPC.
- Nodepool-based compute allows users to keep their resources isolated and provides precise control over scaling models and nodes. This lets users specify where models are executed, addressing compliance and security needs for regulated industries.
- Clarifai offers fine-grained access control across apps, teams, users, and compute resources.
3. Cloud and Compute Cost Efficiency
- Organizations that have pre-committed cloud spend or compute contracts with major cloud service providers (such as AWS, Azure, or GCP), or existing GPU and hardware investments, can put that compute to efficient use through Clarifai Compute Orchestration.
4. New Inference Capabilities and Developer Experience Improvements
- Inference streaming improves time-to-first-token for LLM generations.
- Faster cold starts and optimized frameworks improve performance for critical workloads.
- Continuous batching reduces costs by processing multiple inference requests together (see the sketch after this list).
- Clarifai containerizes your models into Docker images, ensuring model package requirements are encapsulated in a portable environment and dependencies are handled automatically.
- Low-latency deployment minimizes gRPC hops, speeding up communication.
- New model types are easily supported with a unified protobuf format, and local inference runners allow users to test models before deploying to the cloud.
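To make the batching idea concrete, the sketch below shows a simplified, dynamic-batching style loop that groups pending requests ahead of a single model call. It is not Clarifai’s implementation of continuous batching (which schedules LLM requests at the token level); it only illustrates why serving requests together reduces per-request overhead:

```python
from queue import Queue, Empty
from typing import List

def take_batch(pending: "Queue[str]",
               max_batch_size: int = 8,
               max_wait_s: float = 0.05) -> List[str]:
    """Toy batching loop: pull up to max_batch_size requests, waiting at most
    max_wait_s for the first one, so a single model call serves many requests."""
    batch: List[str] = []
    try:
        batch.append(pending.get(timeout=max_wait_s))  # block briefly for the first item
        while len(batch) < max_batch_size:
            batch.append(pending.get_nowait())          # grab whatever else is ready
    except Empty:
        pass
    return batch

pending: Queue = Queue()
for i in range(10):
    pending.put(f"request-{i}")
print(take_batch(pending))  # one model call handles up to 8 queued requests
```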
📄️ Clusters and Nodepools
Set up your compute cluster and nodepool
📄️ How to Deploy a Model
Deploy any model anywhere, at any scale
📄️ Managing Your Compute
Edit and delete your clusters and nodepools