Evaluating Models

Evaluate a model's performance


Now that you've successfully trained the model, you may want to test its performance before using it in a production environment. The Model Evaluation tool allows you to perform a cross validation on a specified model version. Once the evaluation is complete, you can view the various metrics that inform the model’s performance.

How It Works

Model Evaluation performs a K-split cross validation on data you used to train your custom model.

[Figure: cross validation]

In the cross validation process, the Model Evaluation tool will:

  1. Set aside a random 1/K subset of the training data and designate it as a test set;
  2. Train a new model with the remaining training data;
  3. Pass the test set data through this new model to make predictions;
  4. Compare the predictions against the test set’s actual labels; and,
  5. Repeat steps 1 through 4 across the K splits and average the evaluation results (see the sketch after this list).
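
To make the procedure concrete, here is a minimal, framework-agnostic sketch of K-split cross validation in Python. The `train_model` and `accuracy` callables are hypothetical placeholders rather than Clarifai API calls; the platform runs the equivalent procedure for you when you request an evaluation.

# A minimal sketch of the K-split procedure described above.
# train_model and accuracy are hypothetical placeholders, not Clarifai API calls.

import random

def cross_validate(inputs, labels, train_model, accuracy, k=5):
    """Average a metric over K random train/test splits."""
    indices = list(range(len(inputs)))
    random.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # K roughly equal subsets

    scores = []
    for fold in folds:
        held_out = set(fold)
        # 1. Set aside this fold as the test set
        test_x = [inputs[i] for i in fold]
        test_y = [labels[i] for i in fold]
        # 2. Train a new model on the remaining data
        train_x = [inputs[i] for i in indices if i not in held_out]
        train_y = [labels[i] for i in indices if i not in held_out]
        model = train_model(train_x, train_y)
        # 3. Pass the test set through the new model to make predictions
        predictions = [model(x) for x in test_x]
        # 4. Compare the predictions against the actual labels
        scores.append(accuracy(predictions, test_y))

    # 5. Average the per-split results
    return sum(scores) / len(scores)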

Requirements

To run the evaluation on your custom model, it should meet the following criteria:

  • It should be a custom trained model version with:
    1. At least 2 concepts.
    2. At least 10 training inputs per concept (at least 50 inputs per concept is recommended).
caution

The evaluation may result in an error if the model version doesn’t satisfy the requirements above.

Running Evaluation

Below is an example of how you would run an evaluation on a specific version of a custom model.

Note that the initialization code used here is outlined in detail on the client installation page.

####################################################################################
# In this section, we set the user authentication, app ID, and the model's
# details. Change these strings to run your own example.
####################################################################################

USER_ID = 'YOUR_USER_ID_HERE'
# Your PAT (Personal Access Token) can be found in the Portal under Authentication
PAT = 'YOUR_PAT_HERE'
APP_ID = 'YOUR_APP_ID_HERE'
# Change these to evaluate your own model
MODEL_ID = 'YOUR_MODEL_ID_HERE'
MODEL_VERSION_ID = 'YOUR_MODEL_VERSION_HERE'

##########################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
##########################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (('authorization', 'Key ' + PAT),)

userDataObject = resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID)

# Queue an evaluation (cross validation) of the specified model version
post_model_version_metrics = stub.PostModelVersionMetrics(
    service_pb2.PostModelVersionMetricsRequest(
        user_app_id=userDataObject,
        model_id=MODEL_ID,
        version_id=MODEL_VERSION_ID
    ),
    metadata=metadata
)

if post_model_version_metrics.status.code != status_code_pb2.SUCCESS:
    print(post_model_version_metrics.status)
    raise Exception("Evaluate model failed, status: " + post_model_version_metrics.status.description)

Once the evaluation is complete, you can retrieve the results and analyze the performance of your custom model.
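
For example, the following is a hedged sketch of fetching the computed metrics once the evaluation job has finished. It reuses the variables from the example above and assumes the GetModelVersionMetrics endpoint of the same gRPC stub; consult the API reference for the authoritative request and response fields.

# Assumption: once the evaluation queued by PostModelVersionMetrics has finished,
# its results can be read back with GetModelVersionMetrics. This reuses stub,
# metadata, userDataObject, MODEL_ID, and MODEL_VERSION_ID from the example above.

get_model_version_metrics = stub.GetModelVersionMetrics(
    service_pb2.GetModelVersionMetricsRequest(
        user_app_id=userDataObject,
        model_id=MODEL_ID,
        version_id=MODEL_VERSION_ID
    ),
    metadata=metadata
)

if get_model_version_metrics.status.code != status_code_pb2.SUCCESS:
    print(get_model_version_metrics.status)
    raise Exception("Get model metrics failed, status: " + get_model_version_metrics.status.description)

# Print the full response; the evaluation metrics are attached to the returned
# model version.
print(get_model_version_metrics)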

We'll talk about how to interpret a model's evaluation results in the next section.

tip

You can also learn how to perform evaluation on the Portal here.