Evaluating Models

Learn about our model evaluation tools

After successfully training your model, it’s crucial to test its performance before deploying it in a production environment. Our model evaluation tool allows you to perform cross-validation on a specified model version, which provides insights into its performance metrics.

Once the evaluation is complete, you can view various metrics about the model’s behavior. This helps you to:

  • Refine the model further and enhance its performance.
  • Understand the model's strengths and weaknesses before deploying it in a real-world scenario.
  • Compare different versions to select the best-performing one.
Supported Model Types

We currently support evaluating the following model types: visual classifiers, visual detectors, text classifiers, transfer learning models, and fine-tuned LLMs.

How It Works

Classifier models are evaluated using K-split cross-validation on the provided test data.

This is how the cross-validation process works:

  1. Data splitting: Randomly set aside a 1/K subset of the evaluation data as a test set.
  2. Model training: Train a new model on the remaining data.
  3. Prediction: Use the newly trained model to make predictions on the test set.
  4. Comparison: Compare the predictions with the actual labels of the test set.
  5. Repetition: Repeat steps 1 to 4 across K-splits to average out the evaluation results.
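The five steps above can be sketched in plain Python (a minimal illustration of the general K-split procedure, not the platform's internal implementation; `train_fn`, `data`, and `labels` are placeholders for your own model and dataset):

```python
import random

def k_split_cross_validate(data, labels, train_fn, k=5, seed=0):
    """Estimate accuracy with K-split cross-validation.

    data/labels: parallel lists. train_fn(train_x, train_y) must return
    a predict(x) callable. Both are placeholders for your own model code.
    """
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)                # 1. random splitting
    folds = [indices[i::k] for i in range(k)]           # K disjoint 1/K subsets
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        model = train_fn([data[i] for i in train_idx],  # 2. train on remainder
                         [labels[i] for i in train_idx])
        preds = [model(data[i]) for i in test_idx]      # 3. predict on test set
        correct = sum(p == labels[i]                    # 4. compare with labels
                      for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)                    # 5. average over K splits
```

Each input is held out exactly once, so the averaged score reflects predictions on data the model did not see during that split's training.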


To successfully run the evaluation on a model, it must meet the following criteria:

  • It should be a custom-trained model with a version you've created.
  • It should have at least two concepts.
  • The evaluation dataset should have at least ten inputs per concept (at least 50 inputs per concept is recommended for more reliable results).

The evaluation may result in an error if the model version doesn’t satisfy the requirements above.
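These prerequisites can be pre-checked before submitting an evaluation. Below is a hypothetical helper (not part of the platform's API); `concept_counts` is assumed to map each concept name to its number of inputs:

```python
def check_eval_requirements(concept_counts, min_inputs=10, recommended=50):
    """Check evaluation prerequisites: >= 2 concepts, >= 10 inputs each.

    concept_counts: dict mapping concept name -> number of inputs.
    Returns (ok, errors, warnings): ok is False if a hard requirement
    fails; warnings lists concepts below the recommended threshold.
    """
    errors, warnings = [], []
    if len(concept_counts) < 2:
        errors.append("model must have at least two concepts")
    for concept, n in concept_counts.items():
        if n < min_inputs:
            errors.append(
                f"concept '{concept}' has {n} inputs; "
                f"at least {min_inputs} required")
        elif n < recommended:
            warnings.append(
                f"concept '{concept}' has {n} inputs; "
                f"{recommended}+ recommended for reliable results")
    return (not errors, errors, warnings)
```

For example, a model with concepts `{"cat": 60, "dog": 30}` passes the hard requirements but gets a warning for `dog`, while a single-concept model fails outright.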

Cross-App Evaluation

Cross-app evaluation lets you evaluate a model version using datasets from your current application or from any other application you own. This means you can assess how well your model performs across different contexts or use cases.

You can run the evaluation on a specific version of your custom model on the Community platform. To do so, navigate to the version table of your model and locate the version of the model you want to evaluate.

The Evaluation Dataset column in the table allows you to select a dataset for assessing the performance of your model version. If you click the field, a drop-down list appears, from which you can select a dataset version to use for the evaluation.

This selection can include datasets within your current application or those from another application under your ownership, which facilitates cross-app evaluation.

Alternatively, if you click the Evaluate with a new Dataset button, a small window pops up. There, you can choose a dataset not included in the drop-down list, along with its version, for conducting the evaluation. If you do not select a dataset version, the latest one is used automatically.

After selecting the dataset you want to use for the evaluation, click the Calculate button. This will start the evaluation process.

The evaluation may take some time. Once complete, the Calculate button will become a View Results button, which you can click to see the evaluation results.

Interpreting Evaluations

To learn how to interpret the evaluation results for your classification and detection models, see the next sections.