Deploy Your First Model
Quickly set up infrastructure for inference
Clarifai provides an intuitive interface that makes it easy to provision compute infrastructure for running your models.
Deployment is the process of configuring and activating the infrastructure needed to serve predictions from your model. In just a few steps, you can deploy a trained model and start running inferences.
Cloud model deployment via Clarifai requires a paid plan. You can try local deployment with Local Runners for free.
Step 1: Sign Up or Log In
Start by logging in to your Clarifai account, or sign up if you don't already have one.
Step 2: Get a Model
Clarifai’s Community platform offers a wide selection of cutting-edge models ready for integration into your AI projects.
To find a model, visit the Community homepage and explore the Trending AI Models section, which highlights popular, ready-to-use models.
Once you've found a model, click the DEPLOY THE MODEL button in the bottom-right corner of its information card.
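If you'd rather reference a model from code, every Community model also has a stable URL of the form https://clarifai.com/<user-id>/<app-id>/models/<model-id>. Here is a minimal sketch using the Clarifai Python SDK; the URL and PAT below are placeholders, not real values:

```python
# Minimal sketch: referencing a Community model by URL with the Clarifai
# Python SDK (pip install clarifai). The URL and PAT are placeholders.
from clarifai.client.model import Model

model = Model(
    url="https://clarifai.com/<user-id>/<app-id>/models/<model-id>",
    pat="YOUR_PAT",  # Personal Access Token from your account settings
)
```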
Step 3: Review Your Compute Instances
After clicking the button, a pop-up window will appear showing the available pre-configured compute instances for deployment, along with a pre-filled Personal Access Token (PAT) for authentication.
Review the options and choose the one that best fits your needs.
- Basic Compute — Recommended for development and quick tests, offering reliable, low-cost performance.
- Advanced Compute — Ideal for large-scale production inference or training of complex models.
If you prefer more control and want to deploy the model to an existing cluster and nodepool, click the link provided in the pop-up window to customize the deployment to your needs.
Then, click the Deploy button.
A notification will appear at the top of the page confirming that a cluster and nodepool have been successfully created using the pre-configured settings and that the model has been deployed within this infrastructure.
You’ll then be automatically redirected to the newly created nodepool page, where you can view your compute settings and the deployed model.
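The same cluster → nodepool → deployment flow can also be scripted. The sketch below assumes the Clarifai Python SDK's compute orchestration interface; the method names, IDs, and YAML config paths are illustrative assumptions to verify against the current SDK documentation:

```python
# Illustrative sketch of the cluster -> nodepool -> deployment flow with the
# Clarifai Python SDK. IDs and config paths are placeholders, and the method
# names should be verified against the SDK version you have installed.
from clarifai.client.user import User

user = User(user_id="YOUR_USER_ID", pat="YOUR_PAT")

# Create a compute cluster from a YAML config (cloud provider, region, etc.).
compute_cluster = user.create_compute_cluster(
    compute_cluster_id="my-cluster",
    config_filepath="compute_cluster_config.yaml",
)

# Add a nodepool (instance type, autoscaling range) to the cluster.
nodepool = compute_cluster.create_nodepool(
    nodepool_id="my-nodepool",
    config_filepath="nodepool_config.yaml",
)

# Deploy the model you selected into the nodepool.
deployment = nodepool.create_deployment(
    deployment_id="my-deployment",
    config_filepath="deployment_config.yaml",
)
```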
Step 4: Run Inferences
To make a prediction with your deployed model, navigate to its individual page by clicking the model listed on the nodepool page.
Next, on the deployed model’s page, click the Open in Playground button in the upper-right corner.
You’ll be taken to the Playground interface, where you can enter your prompt in the message box to run inferences using your deployed model. You can also try one of the predefined prompt examples.
Click the arrow icon in the message box to submit your request.
The response will be streamed directly in the interface, allowing you to view the output in real time.
Note: If you submit a request and don't receive a response right away, the model may still be loading. Wait a few seconds, then try sending your request again.
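You can also run the same inference outside the Playground by calling Clarifai's HTTP API. Below is a minimal sketch using Python's requests library; the user, app, and model IDs are placeholders for your deployed model, and depending on your compute setup you may additionally need to target a specific deployment (see the compute orchestration docs):

```python
# Minimal sketch: text inference against a deployed model via Clarifai's REST
# API. The IDs and PAT are placeholders for your own values.
import requests

USER_ID = "<user-id>"
APP_ID = "<app-id>"
MODEL_ID = "<model-id>"
PAT = "YOUR_PAT"

url = f"https://api.clarifai.com/v2/users/{USER_ID}/apps/{APP_ID}/models/{MODEL_ID}/outputs"
payload = {"inputs": [{"data": {"text": {"raw": "Write a haiku about model deployment."}}}]}

response = requests.post(
    url,
    headers={"Authorization": f"Key {PAT}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

# For text-generation models the output is typically under data.text.raw.
print(response.json()["outputs"][0]["data"]["text"]["raw"])
```

Unlike the Playground, this call returns the full response at once rather than streaming it, and the first request after deployment may still hit the model-loading delay noted above.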