Clusterer

Learn about our clusterer model type

Input: Images and videos

Output: Clusters

Clusterer is a type of deep fine-tuned model designed to identify and group similar images or video frames within a dataset. The primary goal of clustering is to discover patterns or relationships among data points based on their inherent characteristics or features, without requiring explicit labels or predefined categories.

Cluster models are often used in conjunction with embedding models to perform visual searches. This is done by first using an embedding model to represent each image as a vector in a lower-dimensional space. The cluster model then uses the mathematical structure of this space to determine which images are "clustered together."
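To make the idea of images being "clustered together" concrete, here is a minimal sketch of k-means clustering over a few made-up two-dimensional embeddings. It is an illustration only and is not how the Clarifai clusterer is implemented; the `kmeans` helper, the embedding values, and the choice of k-means itself are all assumptions introduced for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(vector, centroids[i]))


def kmeans(vectors, k, iterations=20):
    """Tiny k-means: returns (assignments, centroids)."""
    # Deterministic start: pick k evenly spaced vectors as initial centroids.
    step = len(vectors) // k
    centroids = [list(vectors[i * step]) for i in range(k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [nearest_centroid(v, centroids) for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Made-up embeddings: the first three point one way, the last three another.
embeddings = [
    [0.9, 0.1], [1.0, 0.2], [0.8, 0.0],
    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8],
]
assignments, centroids = kmeans(embeddings, k=2)
print(assignments)
```

Vectors that receive the same assignment are the ones a visual search would treat as belonging together.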

The cluster model type can be used in a wide range of applications, including:

Customer segmentation in marketing: Cluster models can be used to group customers with similar purchasing behaviors, demographics, or preferences.
Anomaly detection in network security: Cluster models can identify unusual patterns in network traffic data, helping detect potential security threats or cyberattacks. Unusual clusters can indicate unauthorized access or malicious activity.
Document clustering in natural language processing: In textual data analysis, cluster models can group similar documents based on their content. This aids in tasks like topic modeling, content summarization, and document organization.

You may choose a clusterer model type in cases where:

You want to perform visual searches accurately, quickly, and easily. Cluster models and embedding models do not require any labels or custom concepts to be trained. This makes them much more scalable and flexible than traditional methods for visual search, which often require a large amount of labeled data to train.
You need a cluster model to learn new features not recognized by the existing Clarifai models. In that case, you may need to "deep fine-tune" your custom model and integrate it directly within your workflows.
You have a custom-tailored dataset, accurate labels, and the expertise and time to fine-tune models.

Example Use Case

If you want to find all images of cats in your dataset, you can simply use the cluster model to find all images that are clustered together with the embedding of a cat image.
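The lookup described above can be sketched in plain Python: find the cluster whose centroid is closest to the query embedding, then return that cluster's members. The file names, embedding values, and the `images_clustered_with` helper are hypothetical and exist only for this illustration; a real workflow would use embeddings and cluster assignments produced by the platform.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


# Hypothetical clustered dataset: cluster id -> {image name: embedding}.
clusters = {
    "cluster_a": {"cat_1.jpg": [0.9, 0.1], "cat_2.jpg": [0.8, 0.2]},
    "cluster_b": {"car_1.jpg": [0.1, 0.9], "car_2.jpg": [0.2, 0.8]},
}


def images_clustered_with(query_embedding, clusters):
    """Return the closest cluster id and the sorted names of its images."""
    centroids = {cid: centroid(list(members.values()))
                 for cid, members in clusters.items()}
    best = min(centroids,
               key=lambda cid: squared_distance(query_embedding, centroids[cid]))
    return best, sorted(clusters[best])


# Made-up embedding of a query cat image.
cluster_id, matches = images_clustered_with([0.85, 0.15], clusters)
print(cluster_id, matches)
```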

Create and Train a Clusterer

Let's demonstrate how to create and train a clustering model using our API.

info

Before using the Python SDK, Node.js SDK, or any of our gRPC clients, ensure they are properly installed on your machine. Refer to their respective installation guides for instructions on how to install and initialize them.

Step 1: App Creation

Let's start by creating an app.

Python SDK

from clarifai.client.user import User
# Replace "user_id" with your own user ID
client = User(user_id="user_id")
app = client.create_app(app_id="demo_train", base_workflow="Universal")

Step 2: Dataset Upload

Next, let’s upload the dataset that will be used to train the model to the app.

You can find the dataset we used here.

Python SDK

# os is needed to build the path to the CSV file below
import os


# Construct the path to the dataset folder
CSV_PATH = os.path.join(os.getcwd().split('/models/model_train')[0],'datasets/upload/data/imdb.csv')


# Create a Clarifai dataset with the specified dataset_id 
dataset = app.create_dataset(dataset_id="text_dataset")
# Upload the dataset from the CSV file
dataset.upload_from_csv(csv_path=CSV_PATH, input_type='text', csv_type='raw', labels=True)

Step 3: Model Creation

Let's list all the available trainable model types in the Clarifai platform.

Python SDK

print(app.list_trainable_model_types())

Output

['visual-classifier',
 'visual-detector',
 'visual-segmenter',
 'visual-embedder',
 'clusterer',
 'text-classifier',
 'embedding-classifier',
 'text-to-text']

Next, let's select the clusterer model type and use it to create a model.

Python SDK

MODEL_ID = "model_clusterer"
MODEL_TYPE_ID = "clusterer"

# Create a model by passing the model ID and the model type ID as parameters
model = app.create_model(model_id=MODEL_ID, model_type_id=MODEL_TYPE_ID)

Step 4: Patch Model (optional)

After creating a model, you can perform patch operations on it by merging, removing, or overwriting data. By default, all actions support overwriting, but they handle lists of objects in specific ways.

The merge action updates a key:value pair with key:new_value or appends to an existing list. For dictionaries, it merges entries that share the same id field.
The remove action is only used to delete the model's cover image on the platform UI.
The overwrite action completely replaces an existing object with a new one.

Below is an example of performing patch operations on a model, such as updating its description and notes.

Python SDK

from clarifai.client.app import App

app = App(app_id="YOUR_APP_ID_HERE", user_id="YOUR_USER_ID_HERE", pat="YOUR_PAT_HERE")

# Update the details of the model
app.patch_model(model_id="model_clusterer", action="merge", description="description", notes="notes", toolkits=["OpenAI"], use_cases=["llm"], languages=["en"], image_url="https://samples.clarifai.com/metro-north.jpg")

# Remove the model's cover image by specifying the 'remove' action
app.patch_model(model_id='model_clusterer', action='remove', image_url='https://samples.clarifai.com/metro-north.jpg')
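
The difference between the merge and overwrite actions described in this step can be pictured with plain Python dictionaries. The sketch below is not Clarifai's implementation: `apply_patch` is a hypothetical helper written for this illustration, and the id-based merging of dictionary entries mentioned above is not modeled.

```python
import copy


def apply_patch(current, changes, action):
    """Return a patched copy of `current` (illustrative semantics only)."""
    if action == "overwrite":
        # Overwrite: the existing object is replaced by the new one.
        return copy.deepcopy(changes)
    if action != "merge":
        raise ValueError(f"unsupported action: {action}")
    result = copy.deepcopy(current)
    for key, new_value in changes.items():
        old_value = result.get(key)
        if isinstance(old_value, list) and isinstance(new_value, list):
            # Merge: list values are appended to the existing list.
            result[key] = old_value + copy.deepcopy(new_value)
        else:
            # Merge: scalar values become key:new_value.
            result[key] = copy.deepcopy(new_value)
    return result


model_info = {"description": "old", "languages": ["en"]}

merged = apply_patch(model_info, {"description": "new", "languages": ["fr"]}, "merge")
overwritten = apply_patch(model_info, {"description": "new"}, "overwrite")

print(merged)
print(overwritten)
```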

Step 5: Set Up Model Parameters

You can customize the model parameters as needed before starting the training process.

Python SDK

# Get the params for the selected template
model_params = model.get_params()
print(model_params)

Output

{'train_params': {'base_embed_model': None,
  'coarse_clusters': 32.0,
  'eval_holdout_fraction': 0.2,
  'query_holdout_fraction': 0.1,
  'to_be_indexed_queries_fraction': 0.25,
  'max_num_query_embeddings': 100.0,
  'num_results_per_query': [1.0, 5.0, 10.0, 20.0],
  'max_visited': 32.0,
  'quota': 1000.0,
  'beta': 1.0}}

Step 6: Initiate Model Training

To initiate the model training process, call the model.train() method. The Clarifai API also provides features for monitoring training status and saving training logs to a local file.

note

If the training status code is MODEL_TRAINED (21100), the model has successfully completed training and is ready for use. A status code of MODEL_TRAINING_FAILED (21106) means the training did not succeed.

Python SDK

import time

# Start the training
model_version_id = model.train()

# Check the training status every two minutes until it finishes or fails
while True:
    status = model.training_status(version_id=model_version_id, training_logs=False)
    if status.code == 21106: #MODEL_TRAINING_FAILED
        print(status)
        break
    elif status.code == 21100: #MODEL_TRAINED
        print(status)
        break
    else:
        print("Current Status:",status)
        print("Waiting---")
        time.sleep(120)
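
The loop above runs until training either succeeds or fails. If you would rather give up after a fixed amount of time, one option is to wrap the polling in a helper with a timeout. The following is a minimal sketch under stated assumptions: `wait_for_training`, its parameters, and the placeholder `IN_PROGRESS` code are hypothetical names introduced here, and only the two status codes are taken from the loop above.

```python
import time

MODEL_TRAINED = 21100          # success code checked in the loop above
MODEL_TRAINING_FAILED = 21106  # failure code checked in the loop above


def wait_for_training(get_status_code, poll_seconds=120,
                      timeout_seconds=3600, sleep=time.sleep):
    """Poll until training succeeds (True) or fails (False).

    Raises TimeoutError if neither happens within `timeout_seconds`.
    """
    waited = 0
    while True:
        code = get_status_code()
        if code == MODEL_TRAINED:
            return True
        if code == MODEL_TRAINING_FAILED:
            return False
        if waited >= timeout_seconds:
            raise TimeoutError(f"training not finished after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in status source so the sketch runs without a trained model.
# With the SDK, the callable could instead be:
#   lambda: model.training_status(version_id=model_version_id,
#                                 training_logs=False).code
IN_PROGRESS = 0  # placeholder for any code other than the two above
fake_codes = iter([IN_PROGRESS, IN_PROGRESS, MODEL_TRAINED])
result = wait_for_training(lambda: next(fake_codes),
                           poll_seconds=0, sleep=lambda seconds: None)
print(result)
```

Passing the status lookup in as a callable keeps the helper independent of the SDK and easy to exercise with fake data.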

Step 7: Model Prediction

After the model is trained and ready to use, you can run some predictions with it.

Python SDK

TEXT = b"This is a great place to work"

# get the predictions
model_prediction = model.predict_by_bytes(TEXT, input_type="text")

print(model_prediction.outputs[0].data.clusters)

Output

[id: "22_5"
projection: 0.010116016492247581
projection: -0.035988882184028625
]
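
Each returned cluster carries an `id` and a list of `projection` values, as the output above shows. One way to use the `id` is to group several predicted inputs by the cluster they fall into. The sketch below uses a small stand-in `Cluster` dataclass rather than the SDK's response objects; the second and third texts, their cluster IDs, and their projection values are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Stand-in for one entry of outputs[0].data.clusters."""
    id: str
    projection: list = field(default_factory=list)


def group_by_cluster(predictions):
    """Map each cluster id to the list of inputs assigned to it."""
    groups = defaultdict(list)
    for text, cluster in predictions:
        groups[cluster.id].append(text)
    return dict(groups)


# (input text, predicted cluster) pairs; only the first id comes from the
# output above, the remaining values are invented for this example.
predictions = [
    ("This is a great place to work", Cluster("22_5", [0.0101, -0.0360])),
    ("I love working here", Cluster("22_5", [0.0120, -0.0310])),
    ("The package arrived late", Cluster("7_1", [-0.2000, 0.1500])),
]

groups = group_by_cluster(predictions)
print(groups)
```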
