Text Classifier

Learn about our text classifier model type

Input: Text

Output: Concepts

A text classifier is a deep fine-tuned model type that automatically assigns text data to predefined categories, or concepts. Text classification is a common task in natural language processing (NLP) and has a wide range of applications, including sentiment analysis, spam detection, and topic categorization.

info

The text classifier model type also comes with several templates that let you choose the specific architecture used by your neural network and set the hyperparameters that control how your model learns.

You may choose a text classifier model type in cases where:

You need an automated way to process and categorize large amounts of textual data, enabling applications that require efficient and accurate text categorization.
You need a text classification model to learn new features not recognized by the existing Clarifai models. In that case, you may need to "deep fine-tune" your custom model and integrate it directly within your workflows.
You have a custom-tailored dataset, accurate labels, and the expertise and time to fine-tune models.

Example Use Case

A company wants to monitor customer sentiment towards its products by analyzing online reviews. They receive a large number of product reviews on their website and social media platforms. To efficiently understand customer opinions, they can employ a text classifier model to automatically classify these reviews as positive, negative, or neutral.

tip

You can explore the step-by-step tutorial on fine-tuning the GPT-Neo LoRA template for text classification tasks here.

Create and Train Text Classifier

Let's demonstrate how to create and train a text classifier model using our API.

info

Before using the Python SDK, Node.js SDK, or any of our gRPC clients, ensure they are properly installed on your machine. Refer to their respective installation guides for instructions on how to install and initialize them.

Step 1: App Creation

Let's start by creating an app.

Python SDK

from clarifai.client.user import User
# Replace "user_id" with your own user ID
client = User(user_id="user_id")
app = client.create_app(app_id="demo_train", base_workflow="Universal")
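The client needs credentials before it can create an app. The sketch below is a small pre-flight check, assuming the personal access token is supplied through the CLARIFAI_PAT environment variable; the helper name `require_pat` is hypothetical and is not part of the SDK.

```python
import os


def require_pat(env=os.environ):
    """Return the personal access token, or raise a readable error.

    `require_pat` is a hypothetical helper, not an SDK function. It only
    checks that the CLARIFAI_PAT variable is present and non-empty.
    """
    pat = env.get("CLARIFAI_PAT", "").strip()
    if not pat:
        raise RuntimeError(
            "Set the CLARIFAI_PAT environment variable before creating the client."
        )
    return pat
```

Calling `require_pat()` once at the top of a script makes a missing token fail early, before any request is sent.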

Step 2: Dataset Upload

Next, let’s upload the dataset that will be used to train the model to the app.

You can find the dataset we used here.

Python SDK

import os

# Build the path to the training CSV file
CSV_PATH = os.path.join(os.getcwd().split('/models/model_train')[0], 'datasets/upload/data/imdb.csv')

# Create a Clarifai dataset with the specified dataset_id
dataset = app.create_dataset(dataset_id="text_dataset")
# Upload the labeled text inputs from the CSV file
dataset.upload_from_csv(csv_path=CSV_PATH, input_type='text', csv_type='raw', labels=True)
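Before uploading, it can help to check how the labels are distributed, since a heavily skewed dataset tends to produce a skewed model. The sketch below counts labels in CSV text using only the standard library. The column name "concepts" and the sample rows are assumptions for illustration; check the header of your own file.

```python
import csv
import io
from collections import Counter


def label_counts(csv_text, label_column="concepts"):
    """Count how many rows carry each label.

    The default column name "concepts" is an assumption; pass the name
    that your CSV header actually uses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if label_column not in (reader.fieldnames or []):
        raise KeyError(f"column {label_column!r} not found in {reader.fieldnames}")
    return Counter((row[label_column] or "").strip() for row in reader)


# Made-up sample rows, not taken from the dataset used in this guide
sample = "input,concepts\nGreat film,id-pos\nDull plot,id-neg\nLoved it,id-pos\n"
print(label_counts(sample))
```

For a file on disk, read it first, for example with `open(CSV_PATH, newline="").read()`, and pass the text to `label_counts`.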

Step 3: Model Creation

Let's list all the available trainable model types in the Clarifai platform.

Python SDK

print(app.list_trainable_model_types())

Output

['visual-classifier',
 'visual-detector',
 'visual-segmenter',
 'visual-embedder',
 'clusterer',
 'text-classifier',
 'embedding-classifier',
 'text-to-text']

Next, let's select the text-classifier model type and use it to create a model.

Python SDK

MODEL_ID = "model_text_classifier"
MODEL_TYPE_ID = "text-classifier"

# Create a model by passing the model ID and the model type ID
model = app.create_model(model_id=MODEL_ID, model_type_id=MODEL_TYPE_ID)

Step 4: Template Selection

Let's list all the available training templates in the Clarifai platform.

Python SDK

print(model.list_training_templates())

Output

['HF_GPTNeo_125m_lora',
 'HF_GPTNeo_2p7b_lora',
 'HF_Llama_2_13b_chat_GPTQ_lora',
 'HF_Llama_2_7b_chat_GPTQ_lora',
 'HF_Mistral_7b_instruct_GPTQ_lora',
 'HuggingFace_AdvancedConfig']

Next, let's choose the 'HuggingFace_AdvancedConfig' template to use for training our model.

Python SDK

# get the model parameters
model_params = model.get_params(template='HuggingFace_AdvancedConfig')

Step 5: Set Up Model Parameters

You can customize the model parameters as needed before starting the training process.

Python SDK

# Get the model parameters for the chosen template
model_params = model.get_params(template='HuggingFace_AdvancedConfig')
# List the concept IDs available in the app
concepts = [concept.id for concept in app.list_concepts()]
# Set the dataset and the concepts to train on
model.update_params(dataset_id='text_dataset', concepts=["id-pos", "id-neg"])

Output

{'dataset_id': 'text_dataset',
 'dataset_version_id': '',
 'concepts': ['id-pos', 'id-neg'],
 'train_params': {'invalid_data_tolerance_percent': 5.0,
  'template': 'HuggingFace_AdvancedConfig',
  'model_config': {'problem_type': 'multi_label_classification',
   'pretrained_model_name_or_path': 'bert-base-cased',
   'torch_dtype': 'torch.float32'},
  'tokenizer_config': {},
  'trainer_config': {'num_train_epochs': 1.0,
   'auto_find_batch_size': True,
   'output_dir': 'checkpoint'}},
 'inference_params': {'select_concepts': []}}
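The parameters are a nested dictionary, so changing one hyperparameter means reaching into the right sub-dictionary without disturbing its siblings. The sketch below shows one way to do that on a plain Python dict shaped like the output above. The helper `with_overrides` is hypothetical, and this guide does not confirm how the SDK itself handles nested overrides.

```python
import copy


def with_overrides(params, overrides):
    """Return a copy of `params` with nested `overrides` applied.

    Hypothetical helper that works on plain dicts only; it does not
    call the Clarifai SDK.
    """
    merged = copy.deepcopy(params)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in the sub-dictionary survive
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A trimmed copy of the parameter structure shown above
params = {
    "dataset_id": "text_dataset",
    "train_params": {
        "model_config": {"problem_type": "multi_label_classification"},
        "trainer_config": {"num_train_epochs": 1.0, "output_dir": "checkpoint"},
    },
}

tuned = with_overrides(
    params, {"train_params": {"trainer_config": {"num_train_epochs": 3.0}}}
)
print(tuned["train_params"]["trainer_config"])
```

The original `params` dict is left unchanged, which makes it easy to compare the default and tuned configurations side by side.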

Step 6: Initiate Model Training

To initiate the model training process, call the model.train() method. The Clarifai API also provides features for monitoring training status and saving training logs to a local file.

note

If the training status returns MODEL_TRAINED (code 21100), the model has successfully completed training and is ready for use.

Python SDK

import time

# Start the training
model_version_id = model.train()

# Check the training status every two minutes until it succeeds or fails
while True:
    status = model.training_status(version_id=model_version_id, training_logs=False)
    if status.code == 21106:  # MODEL_TRAINING_FAILED
        print(status)
        break
    elif status.code == 21100:  # MODEL_TRAINED
        print(status)
        break
    else:
        print("Current Status:", status)
        print("Waiting---")
        time.sleep(120)
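The loop above runs until training either succeeds or fails, so a job that stalls would keep it waiting indefinitely. The sketch below wraps the same polling logic in a function with a timeout. The name `wait_for_training` and its parameters are hypothetical; only the two status codes come from the loop above.

```python
import time

MODEL_TRAINED = 21100          # status code used in the loop above
MODEL_TRAINING_FAILED = 21106  # status code used in the loop above


def wait_for_training(get_status_code, timeout_s=3600.0, interval_s=120.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until training succeeds (True) or fails (False).

    `get_status_code` is any zero-argument callable returning the current
    status code. Raises TimeoutError if neither terminal code is seen
    before `timeout_s` seconds have passed.
    """
    deadline = clock() + timeout_s
    while True:
        code = get_status_code()
        if code == MODEL_TRAINED:
            return True
        if code == MODEL_TRAINING_FAILED:
            return False
        if clock() >= deadline:
            raise TimeoutError(
                f"training still in progress after {timeout_s} s (last code {code})"
            )
        sleep(interval_s)
```

With the objects from this step, the callable could be `lambda: model.training_status(version_id=model_version_id, training_logs=False).code`. Passing `sleep` and `clock` as arguments keeps the function easy to test without real waiting.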

Step 7: Model Prediction

After the model is trained and ready to use, you can run some predictions with it.

Python SDK

# Getting the predictions
TEXT = b"This is a great place to work"
model_prediction = model.predict_by_bytes(TEXT, input_type="text")

# Get the output
print('Input: ',TEXT)
for concept in model_prediction.outputs[0].data.concepts:
    print(concept.id,':',round(concept.value,2))

Output

Input:  b'This is a great place to work'

id-neg : 0.56

id-pos : 0.39
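Note that the two scores above add up to 0.95 rather than 1. With the multi_label_classification problem type shown in Step 5, each concept is probably scored independently, so turning the scores into a single label requires an explicit rule. The sketch below picks the highest-scoring concept and rejects it if it falls below a threshold. The helper `top_concept` and the 0.5 threshold are assumptions for illustration.

```python
from types import SimpleNamespace


def top_concept(concepts, threshold=0.5):
    """Return (id, value) of the best-scoring concept, or None.

    `concepts` is any iterable of objects with `id` and `value`
    attributes. The 0.5 default threshold is an illustrative choice.
    """
    best = max(concepts, key=lambda c: c.value, default=None)
    if best is None or best.value < threshold:
        return None
    return best.id, best.value


# Stand-ins that mirror the prediction output above
concepts = [
    SimpleNamespace(id="id-neg", value=0.56),
    SimpleNamespace(id="id-pos", value=0.39),
]
print(top_concept(concepts))
```

With a real prediction, pass `model_prediction.outputs[0].data.concepts` instead of the stand-in list.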

Step 8: Model Evaluation

Let’s evaluate the model using both the training and test datasets. We’ll start by reviewing the evaluation metrics for the training dataset.

Python SDK

# Evaluate the model using the specified dataset ID 'text_dataset' and evaluation ID 'one'.
model.evaluate(dataset_id='text_dataset', eval_id='one')

# Retrieve the evaluation result for the evaluation ID 'one'.
result = model.get_eval_by_id(eval_id="one")

# Print the summary of the evaluation result.
print(result.summary)

Output

macro_avg_roc_auc: 0.6499999761581421
macro_std_roc_auc: 0.07468751072883606
macro_avg_f1_score: 0.75
macro_avg_precision: 0.6000000238418579
macro_avg_recall: 0.5

Before evaluating the model on the test dataset, upload the test data from its CSV file to a separate dataset in the app. Once it is uploaded, proceed with the evaluation.

Python SDK

import os

# Build the path to the test CSV file
CSV_PATH = os.path.join(os.getcwd().split('/models/model_train')[0], 'datasets/upload/data/test_imdb.csv')

# Create a Clarifai dataset with the specified dataset_id
test_dataset = app.create_dataset(dataset_id="test_text_dataset")
# Upload the labeled text inputs from the CSV file
test_dataset.upload_from_csv(csv_path=CSV_PATH, input_type='text', csv_type='raw', labels=True)

# Evaluate the model using the specified test text dataset identified as 'test_text_dataset'
# and the evaluation identifier 'two'.
model.evaluate(dataset_id='test_text_dataset', eval_id='two')

# Retrieve the evaluation result with the identifier 'two'.
result = model.get_eval_by_id("two")

# Print the summary of the evaluation result.
print(result.summary)

Output

macro_avg_roc_auc: 0.6161290407180786
macro_std_roc_auc: 0.1225806474685669
macro_avg_f1_score: 0.7207207679748535
macro_avg_precision: 0.5633803009986877
macro_avg_recall: 0.5

Finally, to gain deeper insights into the model’s performance, use the EvalResultCompare class to compare results across multiple datasets.

Python SDK

from clarifai.utils.evaluation import EvalResultCompare

# Creating an instance of EvalResultCompare class with specified models and datasets
eval_result = EvalResultCompare(models=[model], datasets=[dataset, test_dataset])

# Printing a detailed summary of the evaluation result
print(eval_result.detailed_summary())

Output

(  Concept  Accuracy (ROC AUC)  Total Labeled  Total Predicted  True Positives  \
id-pos               0.725             80                0               0   
id-neg               0.575            120              200             120   
id-pos               0.739             31                0               0   
id-neg               0.494             40               71              40   
 
    False Negatives  False Positives  Recall  Precision        F1  \
             80                0     0.0     1.0000  0.000000   
              0               80     1.0     0.6000  0.750000   
             31                0     0.0     1.0000  0.000000   
              0               31     1.0     0.5634  0.720737   
 
               Dataset  
     text_dataset2  
     text_dataset2  
test_text_dataset3  
test_text_dataset3  ,
                 Total Concept  Accuracy (ROC AUC)  Total Labeled  \
     Dataset:text_dataset2            0.650000            200   
Dataset:test_text_dataset3            0.616129             71   
 
    Total Predicted  True Positives  False Negatives  False Positives   Recall  \
            200             120               80               80  0.60000   
             71              40               31               31  0.56338   
 
    Precision        F1  
 0.760000  0.670588  
 0.754028  0.644909  )
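The per-concept rows above can be checked by hand from the raw counts. In both datasets, Total Predicted is 0 for id-pos, meaning the model labeled every input as id-neg; that is why id-neg has a recall of 1.0 and id-pos a recall of 0.0. The sketch below recomputes precision, recall, and F1 from the true positive, false positive, and false negative counts. Reporting precision as 1.0 when nothing was predicted is an assumption chosen to match the id-pos rows.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts.

    Precision is reported as 1.0 when nothing was predicted
    (tp + fp == 0); that convention is an assumption made to match
    the id-pos rows in the table above.
    """
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# (true positives, false positives, false negatives) copied from the table
rows = {
    ("train", "id-pos"): (0, 0, 80),
    ("train", "id-neg"): (120, 80, 0),
    ("test", "id-pos"): (0, 0, 31),
    ("test", "id-neg"): (40, 31, 0),
}

for (split, concept), (tp, fp, fn) in rows.items():
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"{split:5s} {concept}: precision={p:.4f} recall={r:.1f} f1={f1:.4f}")
```

Small differences in the last digits of F1 on the test set come from rounding in the printed table.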
