Deployments
Note: Although the video above showcases an older version of our UI, it still accurately illustrates how to deploy models on our platform.
Deployments bring your models to life by running them on dedicated compute, ready to serve predictions at scale with configurable replicas, autoscaling, and cost controls.
A deployment defines how a model runs on a selected nodepool, acting as the bridge between your model and the underlying infrastructure.
For advanced use cases, a single model can be deployed across multiple nodepools to optimize for different workloads, availability, or performance requirements.
With deployments you can:
- Serve models with low-latency, production-grade inference
- Auto-scale replicas based on traffic demand
- Monitor health, usage, and cost in real time
- Run on dedicated infrastructure with full resource isolation
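The pieces above (a model, a nodepool, replicas, and an autoscaling policy) fit together in a predictable shape. As a minimal sketch only, here is how a deployment spec with a simple utilization-based scaling rule might be modeled; all names here are illustrative and are not the platform's actual API:

```python
from dataclasses import dataclass

# Hypothetical types for illustration; the platform's real API may differ.
@dataclass
class AutoscalingPolicy:
    min_replicas: int          # floor kept warm for low-latency serving
    max_replicas: int          # ceiling that bounds cost
    target_utilization: float  # scale up when utilization exceeds this

@dataclass
class DeploymentSpec:
    model: str        # model to serve
    nodepool: str     # nodepool supplying the dedicated compute
    autoscaling: AutoscalingPolicy

    def replicas_for(self, utilization: float, current: int) -> int:
        """Toy scaling rule: add a replica above target, drop one below half of it,
        always clamped to the configured min/max bounds."""
        if utilization > self.autoscaling.target_utilization:
            current += 1
        elif utilization < self.autoscaling.target_utilization / 2:
            current -= 1
        return max(self.autoscaling.min_replicas,
                   min(self.autoscaling.max_replicas, current))

spec = DeploymentSpec(
    model="my-model",
    nodepool="gpu-pool",
    autoscaling=AutoscalingPolicy(min_replicas=1, max_replicas=4,
                                  target_utilization=0.7),
)
print(spec.replicas_for(utilization=0.9, current=2))  # scales up, bounded by max_replicas
```

The clamp at the end is the cost-control half of the story: no matter how traffic spikes, the replica count never exceeds the ceiling you configured.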
📄️ Create Clusters and Nodepools
Set up clusters and nodepools that match your compute needs
📄️ Deploy a Model
Deploy a model into your created cluster and nodepool
📄️ Manage Your Compute
Edit and delete deployments, clusters, and nodepools