Text Classifier
Learn about our text classifier model type
Input: Text
Output: Concepts
A text classifier is a deep fine-tuned model type that automatically assigns text data to predefined categories, or concepts. Text classification is a common natural language processing (NLP) task with a wide range of applications, including sentiment analysis, spam detection, and topic categorization.
The text classifier model type also comes with several templates. A template lets you choose the specific architecture your neural network uses and exposes a set of hyperparameters you can adjust to fine-tune how your model learns.
You may choose a text classifier model type in cases where:
- You need an automated way to categorize large amounts of textual data efficiently and accurately.
- You need a text classification model that learns new features not recognized by the existing Clarifai models. In that case, you may need to "deep fine-tune" your custom model and integrate it directly into your workflows.
- You have a custom-tailored dataset, accurate labels, and the expertise and time to fine-tune models.
Example Use Case
A company wants to monitor customer sentiment towards its products by analyzing online reviews. They receive a large number of product reviews on their website and social media platforms. To efficiently understand customer opinions, they can employ a text classifier model to automatically classify these reviews as positive, negative, or neutral.
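To act on the classifier's output, the company needs one label per review. The following is a minimal sketch, assuming each prediction has already been converted into a plain dictionary that maps concept names to confidence scores. The `pick_label` helper, the 0.5 threshold, and the scores below are hypothetical, made up for illustration, and are not part of the Clarifai SDK.

```python
from typing import Dict, Optional


def pick_label(scores: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring concept, or None if no score reaches the threshold.

    `scores` is a hypothetical mapping of concept name -> confidence in [0, 1].
    """
    if not scores:
        return None
    # Pick the concept with the largest confidence score
    best_concept = max(scores, key=scores.get)
    return best_concept if scores[best_concept] >= threshold else None


# Made-up scores for three reviews, for illustration only
reviews = {
    "Battery lasts for days, love it": {"positive": 0.92, "negative": 0.03, "neutral": 0.05},
    "Arrived on time.": {"positive": 0.30, "negative": 0.10, "neutral": 0.60},
    "Not sure yet": {"positive": 0.35, "negative": 0.33, "neutral": 0.32},
}
for text, scores in reviews.items():
    print(f"{pick_label(scores)!s:>8}  {text}")
```

Returning None for low-confidence reviews lets them be routed to manual review instead of being forced into a category. For a multi-label setup, you would keep every concept above the threshold rather than only the top one.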
You can explore the step-by-step tutorial on fine-tuning the GPT-Neo LoRA template for text classification tasks here.
Create and Train Text Classifier
Let's demonstrate how to create and train a text classifier model using our API.
Before using the Python SDK, Node.js SDK, or any of our gRPC clients, ensure they are properly installed on your machine. Refer to their respective installation guides for instructions on how to install and initialize them.
Step 1: App Creation
Let's start by creating an app.
- Python SDK
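from clarifai.client.user import User
# Replace "user_id" with your Clarifai user ID; the SDK reads your personal access token from the CLARIFAI_PAT environment variable
client = User(user_id="user_id")
# Create an app named "demo_train" that uses the Universal base workflow
app = client.create_app(app_id="demo_train", base_workflow="Universal")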
Step 2: Dataset Upload
Next, let's upload the dataset that will be used to train the model.
You can find the dataset we used here.
- Python SDK
import os
# Path to the IMDB reviews CSV; this assumes the directory layout of the examples repository this tutorial was run from, so adjust it to wherever you saved the file
CSV_PATH = os.path.join(os.getcwd().split('/models/model_train')[0], 'datasets/upload/data/imdb.csv')
# Create a Clarifai dataset with the specified dataset_id
dataset = app.create_dataset(dataset_id="text_dataset")
# Upload the raw-text CSV; labels=True tells the SDK the CSV includes labels for each input
dataset.upload_from_csv(csv_path=CSV_PATH, input_type='text', csv_type='raw', labels=True)
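If you want to train on your own reviews instead of the IMDB file, you first need a CSV in a layout the uploader accepts. This is a minimal sketch that assumes the "raw" text layout uses an `input` column for the text and a `concepts` column for the label, with every field double-quoted; check these column names against the `upload_from_csv` docstring of your installed SDK version. The `to_clarifai_text_csv` helper and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, Tuple


def to_clarifai_text_csv(rows: Iterable[Tuple[str, str]]) -> str:
    """Serialize (text, label) pairs into a CSV string.

    Assumption: the raw text layout expects an `input` column (the text)
    and a `concepts` column (the label), with all fields double-quoted.
    """
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes and escapes embedded quotes
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["input", "concepts"])
    for text, label in rows:
        writer.writerow([text, label])
    return buffer.getvalue()


# Hypothetical sample rows; write the result to a .csv file and pass its path as csv_path
sample = [
    ("A moving, beautifully shot film.", "pos"),
    ("Two hours I will never get back.", "neg"),
]
print(to_clarifai_text_csv(sample), end="")
```

Using the `csv` module rather than string concatenation keeps commas and quotes inside review text from breaking the column layout.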
Step 3: Model Creation
Let's list all the trainable model types available on the Clarifai platform.
- Python SDK
print(app.list_trainable_model_types())
Output
['visual-classifier',
'visual-detector',
'visual-segmenter',
'visual-embedder',
'clusterer',
'text-classifier',
'embedding-classifier',
'text-to-text']
Next, let's select the text-classifier model type and use it to create a model.
- Python SDK
MODEL_ID = "model_text_classifier"
MODEL_TYPE_ID = "text-classifier"
# Create the model by passing the model ID and model type ID as parameters
model = app.create_model(model_id=MODEL_ID, model_type_id=MODEL_TYPE_ID)
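A defensive check can catch a mistyped model type ID before any request is sent. The sketch below is a hypothetical helper, not an SDK function; in the tutorial, `available` would be the list returned by `app.list_trainable_model_types()`, but a literal subset of that list is used here so the example runs on its own.

```python
from typing import Iterable


def require_trainable_type(model_type_id: str, available: Iterable[str]) -> str:
    """Return model_type_id unchanged, or raise ValueError if it is not in `available`."""
    available = list(available)
    if model_type_id not in available:
        raise ValueError(
            f"{model_type_id!r} is not a trainable model type; choose one of: {', '.join(available)}"
        )
    return model_type_id


# Stand-in for app.list_trainable_model_types() so the sketch is self-contained
available_types = ["visual-classifier", "text-classifier", "text-to-text"]
print(require_trainable_type("text-classifier", available_types))
```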
Step 4: Template Selection
Let's list all the training templates available on the Clarifai platform.
- Python SDK
print(model.list_training_templates())
Output
['HF_GPTNeo_125m_lora',
'HF_GPTNeo_2p7b_lora',
'HF_Llama_2_13b_chat_GPTQ_lora',
'HF_Llama_2_7b_chat_GPTQ_lora',
'HF_Mistral_7b_instruct_GPTQ_lora',
'HuggingFace_AdvancedConfig']
Next, let's choose the 'HuggingFace_AdvancedConfig' template to use for training our model.
- Python SDK
# Load the default training parameters for the chosen template
model_params = model.get_params(template='HuggingFace_AdvancedConfig')
Step 5: Set Up Model Parameters
You can customize the model parameters as needed before starting the training process.
- Python SDK
# Load the default training parameters for the chosen template
model_params = model.get_params(template='HuggingFace_AdvancedConfig')
# Collect the concept IDs available in the app (useful for checking which IDs to pass below)
concepts = [concept.id for concept in app.list_concepts()]
# Set the dataset and the concepts the model should learn
model.update_params(dataset_id='text_dataset', concepts=["id-pos", "id-neg"])
Output
{'dataset_id': 'text_dataset',
'dataset_version_id': '',
'concepts': ['id-pos', 'id-neg'],
'train_params': {'invalid_data_tolerance_percent': 5.0,
'template': 'HuggingFace_AdvancedConfig',
'model_config': {'problem_type': 'multi_label_classification',
'pretrained_model_name_or_path': 'bert-base-cased',
'torch_dtype': 'torch.float32'},
'tokenizer_config': {},
'trainer_config': {'num_train_epochs': 1.0,
'auto_find_batch_size': True,
'output_dir': 'checkpoint'}},
'inference_params': {'select_concepts': []}}
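The `train_params` in the output above are nested, so changing a single hyperparameter such as `num_train_epochs` is easiest with a recursive merge that leaves sibling keys untouched. This is a minimal sketch of such a merge; `merged_params` is a hypothetical helper, not an SDK function, and how the merged dictionary is passed back to the SDK (for example, through `model.update_params`) is an assumption to verify against the SDK reference.

```python
import copy
from typing import Any, Dict


def merged_params(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` merged in recursively.

    Nested dictionaries are merged key by key, so overriding one
    hyperparameter keeps the other keys at the same level.
    """
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merged_params(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Trimmed copy of the train_params shown in the output above
train_params = {
    "template": "HuggingFace_AdvancedConfig",
    "trainer_config": {"num_train_epochs": 1.0, "auto_find_batch_size": True, "output_dir": "checkpoint"},
}
updated = merged_params(train_params, {"trainer_config": {"num_train_epochs": 3.0}})
print(updated["trainer_config"])
# {'num_train_epochs': 3.0, 'auto_find_batch_size': True, 'output_dir': 'checkpoint'}
```

Deep-copying the input means the defaults returned by `get_params` stay intact, so you can build several variants from the same starting point.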
Step 6: Initiate Model Training
To start training, call the model.train() method, which returns the ID of the new model version. The Clarifai API also lets you monitor the training status and save the training logs to a local file.
When the training status code is MODEL_TRAINED (21100), the model has finished training and is ready to use; a code of MODEL_TRAINING_FAILED (21106) means training failed.
- Python SDK
import time
# Start training; train() returns the ID of the new model version
model_version_id = model.train()
# Poll the training status every two minutes until training succeeds or fails
while True:
    status = model.training_status(version_id=model_version_id, training_logs=False)
    if status.code == 21106:  # MODEL_TRAINING_FAILED
        print(status)
        break
    elif status.code == 21100:  # MODEL_TRAINED
        print(status)
        break
    else:
        print("Current Status:", status)
        print("Waiting---")
        time.sleep(120)
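The loop above only stops on codes 21100 and 21106, so any other terminal status would keep it polling forever. A minimal sketch of the same polling idea with an overall timeout follows. `wait_for_training` is a hypothetical helper, the four-hour default timeout is an arbitrary assumption, and any code other than the two terminal ones is treated as still in progress. With the SDK, you would pass something like `lambda: model.training_status(version_id=model_version_id, training_logs=False).code` as `get_status_code`.

```python
import time
from typing import Callable

MODEL_TRAINED = 21100          # status code checked in the loop above
MODEL_TRAINING_FAILED = 21106  # status code checked in the loop above


def wait_for_training(get_status_code: Callable[[], int],
                      poll_seconds: float = 120.0,
                      timeout_seconds: float = 4 * 3600.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> int:
    """Poll until a terminal status code appears, then return it.

    Raises TimeoutError if neither terminal code is seen before the deadline.
    """
    deadline = clock() + timeout_seconds
    while True:
        code = get_status_code()
        if code in (MODEL_TRAINED, MODEL_TRAINING_FAILED):
            return code
        if clock() >= deadline:
            raise TimeoutError(f"training not finished before timeout; last status code {code}")
        sleep(poll_seconds)


# Fake status source for illustration; 1 is a placeholder for any in-progress code
codes = iter([1, 1, MODEL_TRAINED])
print(wait_for_training(lambda: next(codes), poll_seconds=0))  # prints 21100
```

Injecting `sleep` and `clock` keeps the default behavior identical to the loop above while letting you test the timeout path without actually waiting.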