Deployments
Note: Although the video above showcases an older version of our UI, it still accurately illustrates how to deploy models on our platform.
Deployments bring your models to life by running them on dedicated compute, ready to serve predictions at scale with configurable replicas, autoscaling, and cost controls.
A deployment defines how a model runs on a selected nodepool, acting as the bridge between your model and the underlying infrastructure.
For advanced use cases, a single model can be deployed across multiple nodepools to optimize for different workloads, availability, or performance requirements.
With deployments you can:
- Serve models with low-latency, production-grade inference
- Auto-scale replicas based on traffic demand
- Monitor health, usage, and cost in real time
- Run on dedicated infrastructure with full resource isolation
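The pieces above (a model, a nodepool, replicas, and an autoscaling policy) fit together in a predictable shape. As a minimal sketch only, here is how a deployment spec with a simple utilization-based scaling rule might be modeled; all names here are illustrative and are not the platform's actual API:

```python
from dataclasses import dataclass

# Hypothetical types for illustration; the platform's real API may differ.
@dataclass
class AutoscalingPolicy:
    min_replicas: int          # floor kept warm for low-latency serving
    max_replicas: int          # ceiling that bounds cost
    target_utilization: float  # scale up when utilization exceeds this

@dataclass
class DeploymentSpec:
    model: str        # model to serve
    nodepool: str     # nodepool supplying the dedicated compute
    autoscaling: AutoscalingPolicy

    def replicas_for(self, utilization: float, current: int) -> int:
        """Toy scaling rule: add a replica above target, drop one below half of it,
        always clamped to the configured min/max bounds."""
        if utilization > self.autoscaling.target_utilization:
            current += 1
        elif utilization < self.autoscaling.target_utilization / 2:
            current -= 1
        return max(self.autoscaling.min_replicas,
                   min(self.autoscaling.max_replicas, current))

spec = DeploymentSpec(
    model="my-model",
    nodepool="gpu-pool",
    autoscaling=AutoscalingPolicy(min_replicas=1, max_replicas=4,
                                  target_utilization=0.7),
)
print(spec.replicas_for(utilization=0.9, current=2))  # scales up, bounded by max_replicas
```

The clamp at the end is the cost-control half of the story: no matter how traffic spikes, the replica count never exceeds the ceiling you configured.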
📄️ Create Clusters and Nodepools
Set up clusters and nodepools that match your compute needs
📄️ Deploy a Model
Deploy a model into your created cluster and nodepool
📄️ Manage Your Compute
Edit and delete deployments, clusters, and nodepools