Clusterer
Learn how to train a clustering model using Clarifai SDKs
Clusterer models are algorithms used in machine learning and data analysis to group similar data points into clusters. These models aim to find patterns and structures within datasets by organizing the data into groups based on similarities in their features. You can learn more about Clusterer here.
App Creation
The first step in model training is creating an app under which the training takes place. Here we create an app with the app ID “demo_train” and the base workflow set to “Universal”. You can change the base workflow to Empty, Universal, Language Understanding, or General according to your use case.
- Python
from clarifai.client.user import User
# Replace "user_id" with your own user ID
client = User(user_id="user_id")
app = client.create_app(app_id="demo_train", base_workflow="Universal")
Dataset Upload
The next step involves dataset upload. You can upload the dataset to your app so that the model accepts the data directly from the platform. The data used for training in this tutorial is available in the examples repository you have cloned.
- Python
import os
# Import load_module_dataloader, which loads the dataloader object from dataset.py in a local data folder
from clarifai.datasets.upload.utils import load_module_dataloader
# Construct the path to the CSV file in the cloned examples repository
CSV_PATH = os.path.join(os.getcwd().split('/models/model_train')[0],'datasets/upload/data/imdb.csv')
# Create a Clarifai dataset with the specified dataset_id
dataset = app.create_dataset(dataset_id="text_dataset")
# Upload the CSV file as labeled raw-text inputs
dataset.upload_from_csv(csv_path=CSV_PATH,input_type='text',csv_type='raw', labels=True)
If you have followed the steps correctly, you should receive an output that looks like this:
Output
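The CSV_PATH expression in the upload snippet assumes the script runs from somewhere under models/model_train in the cloned examples repository. The following minimal sketch reproduces the same string logic on a hypothetical working directory, so you can see which path it resolves to. The helper name dataset_csv_path and the example directory are assumptions for illustration only, and posixpath is used so the result is the same on every operating system.

```python
import posixpath


def dataset_csv_path(cwd: str) -> str:
    """Mirror the CSV_PATH expression for a given working directory."""
    # Everything before '/models/model_train' is treated as the repository root.
    repo_root = cwd.split('/models/model_train')[0]
    return posixpath.join(repo_root, 'datasets/upload/data/imdb.csv')


# Hypothetical checkout location, used only for illustration.
example_cwd = '/home/user/examples/models/model_train'
print(dataset_csv_path(example_cwd))
# /home/user/examples/datasets/upload/data/imdb.csv
```

If the working directory does not contain models/model_train, the split leaves it unchanged and the dataset path is appended to the current directory, so the upload will only find the file when the repository layout matches.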
Choose The Model Type
First, let's list all the trainable model types available on the platform:
- Python
print(app.list_trainable_model_types())
Output
['visual-classifier',
'visual-detector',
'visual-segmenter',
'visual-embedder',
'clusterer',
'text-classifier',
'embedding-classifier',
'text-to-text']
Click here to know more about Clarifai Model Types.
Model Creation
From the above list of model types, we choose clusterer because it matches our use case. Now let's create a model with that model type.
- Python
MODEL_ID = "model_clusterer"
MODEL_TYPE_ID = "clusterer"
# Create a model by passing the model ID and model type as parameters
model = app.create_model(model_id=MODEL_ID, model_type_id=MODEL_TYPE_ID)
Output
Patch Model
After creating a model, you can perform patch operations on it by merging, removing, or overwriting data. By default, all actions support overwriting, but they handle lists of objects in specific ways.
- The merge action updates a key:value pair with key:new_value or appends to an existing list. For dictionaries, it merges entries that share the same id field.
- The remove action is only used to delete the model's cover image on the platform UI.
- The overwrite action completely replaces an existing object with a new one.
Below is an example of performing patch operations on a model, such as updating its description and notes.
- Python
from clarifai.client.app import App
app = App(app_id="YOUR_APP_ID_HERE", user_id="YOUR_USER_ID_HERE", pat="YOUR_PAT_HERE")
# Update the details of the model
app.patch_model(model_id="model_clusterer", action="merge", description="description", notes="notes", toolkits=["OpenAI"], use_cases=["llm"], languages=["en"], image_url="https://samples.clarifai.com/metro-north.jpg")
# Remove the model's cover image by specifying the 'remove' action
app.patch_model(model_id='model_clusterer', action='remove', image_url='https://samples.clarifai.com/metro-north.jpg')
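To make the difference between merge and overwrite concrete, here is a small sketch that applies both rules to plain Python dictionaries. It is only an illustration of the behavior described above, not the platform's implementation; the helper names merge_patch, merge_lists, and overwrite_patch are hypothetical, and the handling of entries with an id field is one reading of the rule.

```python
def merge_lists(old: list, new: list) -> list:
    """Append new items; dict items sharing an 'id' are merged in place."""
    merged = [dict(item) if isinstance(item, dict) else item for item in old]
    for item in new:
        if isinstance(item, dict) and "id" in item:
            for existing in merged:
                if isinstance(existing, dict) and existing.get("id") == item["id"]:
                    existing.update(item)  # same id: merge the fields
                    break
            else:
                merged.append(dict(item))  # unseen id: append as a new entry
        else:
            merged.append(item)  # plain values are appended
    return merged


def merge_patch(current: dict, patch: dict) -> dict:
    """Illustrative 'merge': replace scalars, extend lists."""
    result = dict(current)
    for key, new_value in patch.items():
        old_value = result.get(key)
        if isinstance(old_value, list) and isinstance(new_value, list):
            result[key] = merge_lists(old_value, new_value)
        else:
            result[key] = new_value
    return result


def overwrite_patch(current: dict, patch: dict) -> dict:
    """Illustrative 'overwrite': every patched field replaces the old one."""
    result = dict(current)
    result.update(patch)
    return result


model_info = {"description": "old", "languages": ["en"]}
patch = {"description": "new", "languages": ["fr"]}
print(merge_patch(model_info, patch))      # {'description': 'new', 'languages': ['en', 'fr']}
print(overwrite_patch(model_info, patch))  # {'description': 'new', 'languages': ['fr']}
```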
Template Selection
The Clarifai platform provides a template feature. Templates let you choose the specific architecture used by your neural network and define a set of hyperparameters that fine-tune the way your model learns. For clustering, however, only one default template is available.
Setup Model Parameters
You can adjust the model parameters to suit your needs before initiating training.
- Python
# Get the params for the selected template
model_params = model.get_params()
print(model_params)
Output
{'train_params': {'base_embed_model': None,
'coarse_clusters': 32.0,
'eval_holdout_fraction': 0.2,
'query_holdout_fraction': 0.1,
'to_be_indexed_queries_fraction': 0.25,
'max_num_query_embeddings': 100.0,
'num_results_per_query': [1.0, 5.0, 10.0, 20.0],
'max_visited': 32.0,
'quota': 1000.0,
'beta': 1.0}}
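Before changing any values, it can help to sanity-check the dictionary. The sketch below copies the train_params shown in the output above and flags any entry whose name ends in _fraction and whose value falls outside the range 0 to 1. The range check and the helper name invalid_fractions are assumptions made for illustration; this is not validation performed by the SDK.

```python
# Copy of the train_params printed above.
train_params = {
    'base_embed_model': None,
    'coarse_clusters': 32.0,
    'eval_holdout_fraction': 0.2,
    'query_holdout_fraction': 0.1,
    'to_be_indexed_queries_fraction': 0.25,
    'max_num_query_embeddings': 100.0,
    'num_results_per_query': [1.0, 5.0, 10.0, 20.0],
    'max_visited': 32.0,
    'quota': 1000.0,
    'beta': 1.0,
}


def invalid_fractions(params: dict) -> list:
    """Return the names of '*_fraction' entries that fall outside [0, 1]."""
    return [
        name
        for name, value in params.items()
        if name.endswith('_fraction') and not 0.0 <= value <= 1.0
    ]


print(invalid_fractions(train_params))  # []
```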
Initiate Model Training
We can initiate the model training by calling the model.train() method. The Clarifai SDKs also offer features such as showing the training status and saving training logs to a local file.
If the status code is MODEL_TRAINED (21100), the model is trained and ready to use.
- Python
import time
# Start the training
model_version_id = model.train()
# Check the training status until it succeeds or fails
while True:
    status = model.training_status(version_id=model_version_id, training_logs=False)
    if status.code == 21106:  # MODEL_TRAINING_FAILED
        print(status)
        break
    elif status.code == 21100:  # MODEL_TRAINED
        print(status)
        break
    else:
        print("Current Status:", status)
        print("Waiting---")
        time.sleep(120)
Output
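The polling loop above runs until training either succeeds or fails. If you would rather give up after a fixed amount of time, one option is to wrap the same check in a helper with a timeout. The following is a minimal sketch under stated assumptions: wait_for_training is a hypothetical helper, not part of the SDK, and the fake status function stands in for model.training_status so the example runs without a Clarifai account. Only the two status codes used in the loop above are taken from this tutorial.

```python
import time
from types import SimpleNamespace

MODEL_TRAINED = 21100          # same code as in the loop above
MODEL_TRAINING_FAILED = 21106  # same code as in the loop above
IN_PROGRESS = 0                # hypothetical placeholder for any other code


def wait_for_training(get_status, poll_seconds=120, timeout_seconds=3600):
    """Poll get_status() until training ends or the timeout elapses.

    Returns the final status object, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status.code in (MODEL_TRAINED, MODEL_TRAINING_FAILED):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("training did not finish before the timeout")
        time.sleep(poll_seconds)


# Stand-in for model.training_status(...): in progress twice, then trained.
_fake_codes = iter([IN_PROGRESS, IN_PROGRESS, MODEL_TRAINED])


def fake_status():
    return SimpleNamespace(code=next(_fake_codes))


final = wait_for_training(fake_status, poll_seconds=0)
print(final.code)  # 21100
```

With a real model, you could pass a small function that calls model.training_status(version_id=model_version_id, training_logs=False) in place of fake_status.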
Model Prediction
Since the model is trained and ready, let’s run a prediction to see what the model returns:
- Python
TEXT = b"This is a great place to work"
# get the predictions
model_prediction = model.predict_by_bytes(TEXT, input_type="text")
print(model_prediction.outputs[0].data.clusters)
Output
[id: "22_5"
projection: 0.010116016492247581
projection: -0.035988882184028625
]