Compute Orchestration

Orchestrate your AI workloads better, avoid vendor lock-in, and use compute spend efficiently


note

Compute Orchestration is currently in Public Preview. To request access, please contact us here.

Clarifai’s Compute Orchestration lets you efficiently deploy any model on any compute infrastructure, at any scale. These new platform capabilities bring the convenience of serverless autoscaling to any environment, regardless of deployment location or hardware, and dynamically scale resources to meet workload demands.

Clarifai handles the containerization, model packing, time slicing, and other performance optimizations on your behalf.

Previously, our platform supported the following deployment options:

  • Shared SaaS (Serverless) — This is our default offering, which abstracts away infrastructure management and allows users to easily deploy models without worrying about the underlying compute resources. In this option, Clarifai maintains multi-tenant GPU pools that users can access on demand.

  • Full Platform Deployment — This option is designed for organizations with high-security requirements. It deploys both the Clarifai control and compute planes into the user’s preferred cloud, on-premises, or air-gapped infrastructure, ensuring full isolation.

With Compute Orchestration, we now give users the ability to manage compute planes anywhere and to access dedicated compute options. These capabilities enable our enterprise customers to deploy production models with enhanced control, performance, and scalability — while addressing specific problems around compute costs, latency, and control over hosted models.

Compute Orchestration allows us to provide the following additional deployment options — all of which can be customized with your preferred settings for autoscaling, cold start, and more, ensuring maximum cost efficiency and performance:

  • Dedicated SaaS — Provides exclusive access to Clarifai-managed nodes with customizable configurations. This is currently available in the AWS US East region, with plans to expand to other cloud providers and hardware options.

  • Self-Managed VPC (Virtual Private Cloud) — Users securely connect their own cloud provider VPC, enabling Clarifai to orchestrate deployments within the user’s cloud environment while leveraging existing cloud compute or spend commitments.

  • Self-Managed On-Premises — Users securely connect their own on-premises or bare-metal infrastructure to leverage existing compute investments, which Clarifai then orchestrates for model deployment.

  • Multi-Site Deployment — Supports deployments across multiple self-managed compute sources, with potential for future multi-cloud or multi-region dedicated SaaS solutions.

info

If you’re not using Compute Orchestration, the Shared SaaS (Serverless) deployment remains the default option.

Compute Clusters and Nodepools

We use clusters and nodepools to organize and manage the compute resources required for the Compute Orchestration capabilities.

Cluster

A compute cluster in Clarifai acts as the overarching computational environment where models are executed, whether for training or inference.

Nodepool

A nodepool refers to a set of dedicated nodes (virtual machine instances) within a cluster that share similar configurations and resources, such as CPU or GPU type, memory size, and other performance parameters.

Cluster configuration lets you specify where and how your models are run, ensuring better performance, lower latency, and adherence to regional regulations. You can specify a cloud provider, such as AWS, that will provide the underlying compute infrastructure for hosting and running your models. You can also specify the geographic location of the data center where the compute resources will be hosted.

Nodepools are an important part of how compute resources are operated within a cluster. They provide flexibility in choosing the type of instances used to run your machine learning models and workflows and help determine how resources are scaled to meet demand.

Nodepools specify the accelerator and instance that will run your models and other workloads. Accelerators are specialized hardware resources, such as GPUs or dedicated ML chips used for computation.

Each nodepool can run containers or workloads, and you can have multiple nodepools within a single cluster to support different types of workloads or performance requirements. These nodes execute tasks like model training, inference, and workflow orchestration within a compute cluster.

With Compute Orchestration, you can ensure these nodepools are properly scaled up or down depending on the workload's size, complexity, and cost.
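
To make this concrete, the following is a minimal sketch of creating a cluster and a nodepool with the Clarifai Python SDK. The IDs and config file paths are placeholders, and the class and method names reflect the SDK's compute orchestration client as of this preview; treat them as assumptions and check the API reference for the exact configuration schema.

# Minimal sketch: create a compute cluster, then a nodepool inside it.
# Placeholder IDs and config paths; the YAML configs (not shown here) define
# the cloud provider, region, instance type/accelerator, and node counts.
from clarifai.client.user import User
from clarifai.client.compute_cluster import ComputeCluster

client = User(user_id="YOUR_USER_ID", pat="YOUR_PAT")

# Create the cluster that will host your nodepools.
client.create_compute_cluster(
    compute_cluster_id="my-cluster",
    config_filepath="compute_cluster_config.yaml",
)

# Initialize a handle to the new cluster and add a nodepool to it.
cluster = ComputeCluster(
    user_id="YOUR_USER_ID",
    compute_cluster_id="my-cluster",
    pat="YOUR_PAT",
)
nodepool = cluster.create_nodepool(
    nodepool_id="my-nodepool",
    config_filepath="nodepool_config.yaml",
)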

Benefits of Compute Orchestration

1. Performance and Deployment Flexibility

  • It provides access to a wide range of accelerator options tailored to your use case. You can configure multiple compute clusters, each tailored to your AI development stage, performance requirements, and budget. You can also run affordable proofs of concept or compute-heavy LLMs and LVMs in production, all from a single product.

  • It offers the flexibility to deploy to any cloud service provider, on-premises, or air-gapped environment, allowing users to leverage their hardware of choice without being locked into a single vendor. Alternatively, you can deploy on Clarifai’s compute and avoid managing infrastructure altogether.

  • You can customize autoscaling settings to prevent cold-start issues, handle traffic swings, and scale down to zero for cost efficiency (see the deployment sketch after this list). The ability to scale from zero to infinity ensures both flexibility and cost management.

  • Just like with our previous offerings, we ensure efficient resource usage and cost savings through bin-packing (running multiple models per GPU), time slicing, and other optimizations.
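
The deployment sketch referenced above: once a nodepool exists, a deployment binds a model to it, and the deployment's configuration is where autoscaling limits such as a minimum of zero replicas are set. The class and method names below follow the Clarifai Python SDK's compute orchestration client, but the constructor arguments and config schema are assumptions to verify against the current API reference.

# Illustrative sketch: deploy a model into an existing nodepool.
# The deployment YAML (not shown) is assumed to name the model to deploy and
# the scaling limits, e.g. a minimum of 0 replicas (scale-to-zero) and a
# maximum replica cap for handling traffic spikes.
from clarifai.client.nodepool import Nodepool

nodepool = Nodepool(
    user_id="YOUR_USER_ID",
    nodepool_id="my-nodepool",
    pat="YOUR_PAT",
)

deployment = nodepool.create_deployment(
    deployment_id="my-deployment",
    config_filepath="deployment_config.yaml",
)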

2. Enhanced Security

  • Users can run compute planes within their own cloud service provider or on-premises environments and securely connect to Clarifai’s control plane, while only having to open outbound ports for traffic. This reduces networking complexity and security risks compared to opening inbound access or configuring cloud Identity and Access Management (IAM) access roles within your VPC.

  • Nodepool-based compute allows users to keep their resources isolated and provides precise control over scaling models and nodes. This allows users to specify where models are executed, addressing compliance and security needs for regulated industries.

  • Clarifai offers fine-grained access control across apps, teams, users, and compute resources.

  • Users can group CPU and GPU types into dedicated scaling nodepools, enabling them to handle diverse workloads or team-specific requirements while enhancing security and resource management.

3. Use Compute Cost-Efficiently and Abstract Away Complexity

  • An intuitive control plane enables users to efficiently govern access to AI resources, monitor performance, and manage costs. Clarifai’s expertly designed platform takes care of dependencies and optimizations, offering features like model packing, streamlined dependency management, and customizable autoscaling options — including scale-to-zero for both model replicas and compute nodes.

  • The advanced optimizations deliver exceptional efficiency, with model packing reducing compute usage by up to 3.7x and enabling support for over 1.6 million inputs per second with an impressive 99.9997% reliability. Depending on the chosen configuration, customers can achieve cost savings of at least 60%, and in some cases, up to 90%.

  • Organizations with pre-committed cloud spend or compute contracts with major cloud service providers, like AWS, Azure, or GCP, or existing GPU and hardware investments, can efficiently leverage their compute using Clarifai Compute Orchestration.

4. New Inference Capabilities and Developer Experience Improvements

  • Features such as inference streaming improve time to first token for LLM generation.

  • Faster cold starts and optimized frameworks improve performance for critical workloads.

  • Continuous batching is available to reduce costs by processing multiple inference requests in batches.

  • Clarifai containerizes your desired models into Docker images, ensuring model package requirements are encapsulated in a portable environment and dependencies are handled automatically.

  • Low-latency deployment minimizes gRPC hops, speeding up communication.

  • New model types are easily supported with a unified protobuf format, and local inference runners allow users to test models before deploying to the cloud.
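
Once a model is deployed, inference goes through the same SDK surface regardless of where the underlying compute runs. The following is a basic prediction sketch using the Clarifai Python SDK; the model URL is a placeholder, and the exact argument names can vary by SDK version.

# Basic inference sketch against a deployed model (placeholder model URL).
from clarifai.client.model import Model

model = Model(
    url="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    pat="YOUR_PAT",
)

# Run a prediction on a publicly hosted sample image and print the output.
prediction = model.predict_by_url(
    "https://samples.clarifai.com/metro-north.jpg",
    input_type="image",
)
print(prediction.outputs[0].data)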