
Evaluating Models

Evaluate a model's performance


Now that you've successfully trained the model, you may want to test its performance before using it in a production environment. The Model Evaluation tool lets you perform cross validation on a specified model version. Once the evaluation is complete, you can view the various metrics that describe the model’s performance.

How It Works

Model Evaluation performs a K-split cross validation on the data you used to train your custom model.


In the cross validation process, the tool will:

  1. Set aside a random 1/K subset of the training data and designate it as the test set;
  2. Train a new model with the remaining training data;
  3. Pass the test set data through this new model to make predictions;
  4. Compare the predictions against the test set’s actual labels; and,
  5. Repeat steps 1 through 4 across the K splits to average out the evaluation results (see the sketch after this list).
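
To make these steps concrete, here is a minimal, self-contained sketch of the K-split idea in plain Python. It only illustrates the logic described above and is not Clarifai's implementation; the K value, the toy data, and the train_model/evaluate helpers are hypothetical placeholders.

import random

K = 5  # number of splits (hypothetical value)
data = [(x, x > 50) for x in range(100)]  # stand-in for labeled training inputs
random.shuffle(data)

# Partition the shuffled data into K roughly equal folds.
folds = [data[i::K] for i in range(K)]

def train_model(train_set):
    # Hypothetical stand-in: "learn" the majority label of the training data.
    return sum(label for _, label in train_set) >= len(train_set) / 2

def evaluate(model, test_set):
    # Hypothetical stand-in: accuracy of always predicting the learned label.
    return sum(model == label for _, label in test_set) / len(test_set)

scores = []
for i in range(K):
    # Steps 1-2: hold out fold i as the test set, train on the remaining folds.
    test_set = folds[i]
    train_set = [example for j in range(K) if j != i for example in folds[j]]
    model = train_model(train_set)
    # Steps 3-4: make predictions on the held-out fold and score them.
    scores.append(evaluate(model, test_set))

# Step 5: average the per-split results.
print(sum(scores) / K)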

Requirements

To run the evaluation on your custom model, it should meet the following criteria:

  • It should be a custom trained model version with:
    1. At least 2 concepts.
    2. At least 10 training inputs per concept (at least 50 inputs per concept is recommended).
caution

The evaluation may result in an error if the model version doesn’t satisfy the requirements above.

info

The initialization code used in the following examples is outlined in detail on the client installation page.

Running Evaluation

tip

If evaluating an embedding-classifier model type, you need to set use_kfold to False in the eval_info.params of the evaluation request; for example: params.update({"dataset_id": DATASET_ID, "use_kfold": False}).
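
For illustration, such a params object could be built with the same google.protobuf Struct used in the examples below (DATASET_ID is a placeholder):

from google.protobuf.struct_pb2 import Struct

DATASET_ID = "YOUR_DATASET_ID_HERE"

# Build the eval_info.params payload; use_kfold must be False when
# evaluating an embedding-classifier model type.
params = Struct()
params.update({"dataset_id": DATASET_ID, "use_kfold": False})

# params is then passed via resources_pb2.EvalInfo(params=params) in the request.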

PostModelVersionEvaluations

Below is an example of how you would use the PostModelVersionEvaluations method to run an evaluation on a specific version of a custom model.

############################################################################################
# In this section, we set the user authentication, app ID, and model evaluation details.
# Change these strings to run your own example.
###########################################################################################

USER_ID = "YOUR_USER_ID_HERE"
# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = "YOUR_PAT_HERE"
APP_ID = "YOUR_APP_ID_HERE"
# Change these to make your own evaluations
MODEL_ID = "YOUR_MODEL_ID_HERE"
MODEL_VERSION_ID = "YOUR_MODEL_VERSION_HERE"
DATASET_ID = "YOUR_DATASET_ID_HERE"

##########################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
##########################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2
from google.protobuf.struct_pb2 import Struct

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

params = Struct()
params.update({"dataset_id": DATASET_ID})

metadata = (("authorization", "Key " + PAT),)

userDataObject = resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID)

post_model_evaluations = stub.PostModelVersionEvaluations(
    service_pb2.PostModelVersionEvaluationsRequest(
        user_app_id=userDataObject,
        model_id=MODEL_ID,
        model_version_id=MODEL_VERSION_ID,
        eval_metrics=[
            resources_pb2.EvalMetrics(
                eval_info=resources_pb2.EvalInfo(params=params),
            )
        ],
    ),
    metadata=metadata,
)

if post_model_evaluations.status.code != status_code_pb2.SUCCESS:
    print(post_model_evaluations.status)
    raise Exception("Failed response, status: " + post_model_evaluations.status.description)

print(post_model_evaluations)

PostEvaluations

Below is an example of how you would use the PostEvaluations method to run an evaluation on a specific version of a custom model. The method allows you to choose models and datasets from different apps that you have access to.

############################################################################################
# In this section, we set the user authentication, app ID, and model evaluation details.
# Change these strings to run your own example.
###########################################################################################

USER_ID = "YOUR_USER_ID_HERE"
# Your PAT (Personal Access Token) can be found in the Account's Security section
PAT = "YOUR_PAT_HERE"
APP_ID = "YOUR_APP_ID_HERE"
# Change these to make your own evaluations
MODEL_APP_ID = "YOUR_MODEL_APP_ID_HERE"
MODEL_USER_ID = "YOUR_MODEL_USER_ID_HERE"
MODEL_ID = "YOUR_MODEL_ID_HERE"
MODEL_VERSION_ID = "YOUR_MODEL_VERSION_HERE"
DATASET_USER_ID = "YOUR_DATASET_USER_ID_HERE"
DATASET_APP_ID = "YOUR_DATASET_APP_ID_HERE"
DATASET_ID = "YOUR_DATASET_ID_HERE"
DATASET_VERSION_ID = "YOUR_DATASET_VERSION_ID_HERE"

##########################################################################
# YOU DO NOT NEED TO CHANGE ANYTHING BELOW THIS LINE TO RUN THIS EXAMPLE
##########################################################################

from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

channel = ClarifaiChannel.get_grpc_channel()
stub = service_pb2_grpc.V2Stub(channel)

metadata = (("authorization", "Key " + PAT),)

userDataObject = resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID)

post_model_evaluations = stub.PostEvaluations(
    service_pb2.PostEvaluationsRequest(
        user_app_id=userDataObject,
        eval_metrics=[
            resources_pb2.EvalMetrics(
                model=resources_pb2.Model(
                    app_id=MODEL_APP_ID,
                    user_id=MODEL_USER_ID,
                    id=MODEL_ID,
                    model_version=resources_pb2.ModelVersion(id=MODEL_VERSION_ID),
                ),
                ground_truth_dataset=resources_pb2.Dataset(
                    user_id=DATASET_USER_ID,
                    app_id=DATASET_APP_ID,
                    id=DATASET_ID,
                    version=resources_pb2.DatasetVersion(id=DATASET_VERSION_ID),
                ),
            )
        ],
    ),
    metadata=metadata,
)

if post_model_evaluations.status.code != status_code_pb2.SUCCESS:
    print(post_model_evaluations.status)
    raise Exception(
        "Failed response, status: " + post_model_evaluations.status.description
    )

print(post_model_evaluations)

Once the evaluation is complete, you can retrieve the results and analyze the performance of your custom model.
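
As a rough sketch of how that retrieval might look, assuming a ListModelVersionEvaluations endpoint that mirrors the request pattern above:

# Assumed endpoint for listing evaluation results of a model version;
# reuses the stub, userDataObject, and metadata defined earlier.
list_evaluations = stub.ListModelVersionEvaluations(
    service_pb2.ListModelVersionEvaluationsRequest(
        user_app_id=userDataObject,
        model_id=MODEL_ID,
        model_version_id=MODEL_VERSION_ID,
    ),
    metadata=metadata,
)

if list_evaluations.status.code != status_code_pb2.SUCCESS:
    raise Exception("Failed response, status: " + list_evaluations.status.description)

# Each returned entry describes one evaluation run for the model version.
for eval_metrics in list_evaluations.eval_metrics:
    print(eval_metrics)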

We'll talk about how to interpret a model's evaluation results in the next section.

tip

You can also learn how to perform evaluation on the Portal here.