
Model Training and Evaluation Overview

Get a brief overview of model training and evaluation using the Clarifai Python SDK


Model Training

Model training is the process of feeding data to an algorithm and iteratively adjusting its internal parameters so that it can make accurate predictions on unseen data. After defining the model architecture, you can initiate training with the Clarifai Python SDK. During training, the SDK provides feedback on the model's progress, allowing you to monitor metrics such as accuracy and loss. The overall workflow is: app creation -> data upload -> model creation -> setting the training configuration -> model training, as sketched below.
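For orientation, here is a condensed sketch of those five steps. The IDs and file path are placeholders, and each step is covered in detail in the walkthrough that follows.

import os
from clarifai.client.user import User

os.environ['CLARIFAI_PAT'] = "YOUR_PAT"  # authenticate with your personal access token

# 1. App creation
app = User(user_id="user_id").create_app(app_id="demo_app", base_workflow="Universal")

# 2. Data upload
dataset = app.create_dataset(dataset_id="my_dataset")
dataset.upload_from_csv(csv_path="data.csv", input_type='text', csv_type='raw', labels=True)

# 3. Model creation
model = app.create_model(model_id="my_model", model_type_id="text-classifier")

# 4. Setting the training configuration
model.get_params(template='HuggingFace_AdvancedConfig')
model.update_params(dataset_id="my_dataset", concepts=["id-pos", "id-neg"])

# 5. Model training
model_version_id = model.train()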

Click here to learn more about model training.

tip

Clone this repository to get the dataset used for training.

import os
from clarifai.client.user import User

# Replace with your PAT
os.environ['CLARIFAI_PAT'] = "YOUR_PAT"

# Replace with your user_id
client = User(user_id="user_id")
app = client.create_app(app_id="demo_train", base_workflow="Universal")

# Construct the path to the dataset CSV file
CSV_PATH = os.path.join(os.getcwd().split('/models/model_train')[0], 'datasets/upload/data/imdb.csv')

# Create a Clarifai dataset with the specified dataset_id
dataset = app.create_dataset(dataset_id="text_dataset")
# Upload the CSV file as raw text inputs with labels
dataset.upload_from_csv(csv_path=CSV_PATH, input_type='text', csv_type='raw', labels=True)

MODEL_ID = "model_text_classifier"
MODEL_TYPE_ID = "text-classifier"

# Create a model by passing the model ID and model type ID as parameters
model = app.create_model(model_id=MODEL_ID, model_type_id=MODEL_TYPE_ID)

# Fetch the default training parameters for the chosen template
model_params = model.get_params(template='HuggingFace_AdvancedConfig')
# List the concept IDs available in the app
concepts = [concept.id for concept in app.list_concepts()]
# Update the dataset and concepts used for training
model.update_params(dataset_id='text_dataset', concepts=["id-pos", "id-neg"])

import time

# Starting the training
model_version_id = model.train()

# Checking the status of training
while True:
    status = model.training_status(version_id=model_version_id, training_logs=False)
    if status.code == 21106:  # MODEL_TRAINING_FAILED
        print(status)
        break
    elif status.code == 21100:  # MODEL_TRAINED
        print(status)
        break
    else:
        print("Current Status:", status)
        print("Waiting---")
        time.sleep(120)
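Two optional checks can help while configuring and monitoring training: you can list the training templates supported by the model type before choosing one for get_params(), and you can request the training logs along with the status while polling. The snippet below is a minimal sketch; it assumes your installed SDK version exposes list_training_templates() and the training_logs flag used above.

# List the training templates available for this model type
print(model.list_training_templates())

# Fetch the current status together with the training logs for the running job
status = model.training_status(version_id=model_version_id, training_logs=True)
print(status)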

Model Evaluation

Model evaluation is the process of measuring a model's performance on a dataset. The Clarifai Python SDK supports this in two ways. First, you can obtain evaluation metrics for each dataset split separately: the Model.evaluate() method runs an evaluation on the dataset passed as a parameter, and each run is identified by an eval_id, so you can perform multiple evaluations against different datasets. Second, the EvalResultCompare utility, covered at the end of this section, lets you compare the results of those evaluations.

info

Evaluation is currently supported for the following model types: Embedding Classifier, Text Classifier, Visual Classifier, and Visual Detector.

# Evaluate the model using the specified dataset ID 'text_dataset' and evaluation ID 'one'.
model.evaluate(dataset_id='text_dataset', eval_id='one')

# Retrieve the evaluation result for the evaluation ID 'one'.
train_result = model.get_eval_by_id(eval_id="one")

# Construct the path to the test dataset CSV file
CSV_PATH = os.path.join(os.getcwd().split('/models/model_train')[0], 'datasets/upload/data/test_imdb.csv')

# Create a Clarifai dataset with the specified dataset_id
test_dataset = app.create_dataset(dataset_id="test_text_dataset")
# Upload the test CSV file as raw text inputs with labels
test_dataset.upload_from_csv(csv_path=CSV_PATH, input_type='text', csv_type='raw', labels=True)

# Evaluate the model using the specified test text dataset identified as 'test_text_dataset'
# and the evaluation identifier 'two'.
model.evaluate(dataset_id='test_text_dataset', eval_id='two')

# Retrieve the evaluation result with the identifier 'two'.
test_result = model.get_eval_by_id("two")

# Print the summary of the evaluation result.
print("train result:",train_result.summary)
print("test result:",test_result.summary)

Output
train result:
macro_avg_roc_auc: 0.6499999761581421
macro_std_roc_auc: 0.07468751072883606
macro_avg_f1_score: 0.75
macro_avg_precision: 0.6000000238418579
macro_avg_recall: 0.5
test result:
macro_avg_roc_auc: 0.6161290407180786
macro_std_roc_auc: 0.1225806474685669
macro_avg_f1_score: 0.7207207679748535
macro_avg_precision: 0.5633803009986877
macro_avg_recall: 0.5

The SDK also provides the EvalResultCompare utility class, which lets you compare the results of different evaluations across models and datasets.

from clarifai.utils.evaluation import EvalResultCompare

# Creating an instance of EvalResultCompare class with specified models and datasets
eval_result = EvalResultCompare(models=[model], datasets=[dataset, test_dataset])

# Printing a detailed summary of the evaluation result
print(eval_result.detailed_summary())
Output
Per-concept results:

Concept  Accuracy (ROC AUC)  Total Labeled  Total Predicted  True Positives  False Negatives  False Positives  Recall  Precision  F1        Dataset
id-pos   0.725               80             0                0               80               0                0.0     1.0000     0.000000  text_dataset2
id-neg   0.575               120            200              120             0                80               1.0     0.6000     0.750000  text_dataset2
id-pos   0.739               31             0                0               31               0                0.0     1.0000     0.000000  test_text_dataset3
id-neg   0.494               40             71               40              0                31               1.0     0.5634     0.720737  test_text_dataset3

Per-dataset totals:

Total Concept               Accuracy (ROC AUC)  Total Labeled  Total Predicted  True Positives  False Negatives  False Positives  Recall   Precision  F1
Dataset:text_dataset2       0.650000            200            200              120             80               80               0.60000  0.760000   0.670588
Dataset:test_text_dataset3  0.616129            71             71               40              31               31               0.56338  0.754028   0.644909
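
Because EvalResultCompare accepts lists of models as well as datasets, the same call can be used to compare two models (or model versions) on the same splits. The sketch below assumes a hypothetical second classifier, model_text_classifier_v2, that has already been trained and evaluated in the same app.

from clarifai.utils.evaluation import EvalResultCompare

# Hypothetical second classifier; fetch it however your app exposes it
model_b = app.model(model_id="model_text_classifier_v2")

# Compare both models across both dataset splits in a single summary
comparison = EvalResultCompare(models=[model, model_b], datasets=[dataset, test_dataset])
print(comparison.detailed_summary())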