Evaluating Models

Learn about model evaluation tools

Now that you've successfully trained the model, you may want to test its performance before using it in a production environment. The model evaluation tool allows you to perform cross-validation on a specified model version.

Once the evaluation is complete, you can view various metrics that describe the model's performance.

How It Works

Model evaluation performs a K-split cross-validation on data you used to train your custom model.

During the cross-validation process, the tool will:

  1. Set aside a random 1/K subset of the training data and designate it as a test set;
  2. Train a new model with the remaining training data;
  3. Pass the test set data through this new model to make predictions;
  4. Compare the predictions against the test set’s actual labels;
  5. Repeat steps 1) through 4) across K splits to average out the evaluation results.
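The steps above can be sketched in plain Python. This is an illustrative sketch of generic K-split cross-validation, not the platform's actual implementation; `train_model` is a hypothetical callable standing in for the training step, and accuracy stands in for whatever metrics the tool reports.

```python
import random

def k_split_cross_validate(inputs, labels, k, train_model):
    """Illustrative K-split cross-validation over a labeled dataset.

    `train_model` is a hypothetical callable that takes training
    (inputs, labels) and returns a prediction function.
    """
    indices = list(range(len(inputs)))
    random.shuffle(indices)                    # randomize before splitting
    folds = [indices[i::k] for i in range(k)]  # K roughly equal subsets

    accuracies = []
    for test_fold in folds:
        # 1. Set aside a 1/K subset as the test set
        test_set = set(test_fold)
        train_idx = [i for i in indices if i not in test_set]
        # 2. Train a new model on the remaining data
        predict = train_model([inputs[i] for i in train_idx],
                              [labels[i] for i in train_idx])
        # 3-4. Predict on the test set and compare against actual labels
        correct = sum(predict(inputs[i]) == labels[i] for i in test_fold)
        accuracies.append(correct / len(test_fold))

    # 5. Average the per-split results
    return sum(accuracies) / k
```

Because each of the K splits trains a fresh model on the remaining data, every input serves as test data exactly once, which is what makes the averaged result a fair estimate of performance on unseen data.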


To be eligible for evaluation, your custom model should meet the following criteria:

  • It should be a custom-trained model version with the following:
    1. At least two concepts;
    2. At least ten training inputs per concept (at least 50 inputs per concept is recommended).

The evaluation may result in an error if the model version doesn’t satisfy the requirements above.
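Before starting a run, you can sanity-check your training data against these requirements. A minimal sketch, assuming you have your data locally as a mapping from concept name to its list of training inputs (the function name and data shape are illustrative, not part of the platform):

```python
def check_eval_requirements(inputs_per_concept, min_concepts=2,
                            min_inputs=10, recommended_inputs=50):
    """Check a {concept: [inputs]} mapping against the evaluation criteria.

    Returns a list of problem descriptions; an empty list means the
    data satisfies both the required and recommended thresholds.
    """
    problems = []
    # At least two concepts are required
    if len(inputs_per_concept) < min_concepts:
        problems.append(f"need at least {min_concepts} concepts, "
                        f"found {len(inputs_per_concept)}")
    for concept, items in inputs_per_concept.items():
        # At least ten inputs per concept are required; 50+ recommended
        if len(items) < min_inputs:
            problems.append(f"concept {concept!r} has {len(items)} inputs; "
                            f"at least {min_inputs} are required")
        elif len(items) < recommended_inputs:
            problems.append(f"concept {concept!r} has {len(items)} inputs; "
                            f"{recommended_inputs} or more are recommended")
    return problems
```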

Running Evaluation

You can run the evaluation on a specific version of your custom model on the Community platform.

To do so, on your application's page, select the App Models option on the collapsible left sidebar. On your Models listing page, select the model whose performance you'd like to evaluate.

You'll be redirected to the individual model's page. Under the Versions tab, go to the version of the model whose performance you want to evaluate. Then, under the ROC column, click the Calculate button to start the evaluation process.

The evaluation may take up to 30 minutes. Once it is complete, the Calculate button will become a View Results button.

Click the View Results button to see the evaluation results.

For more information on how to interpret the evaluation results, see the next section, Interpreting Evaluations.