Tracker

Learn about our tracker operators


Tracker operators are a type of agent system operator designed for object tracking in computer vision. Object tracking involves following the movement of objects across a sequence of images or the frames of a video. Tracker operators use detection-based tracking algorithms that require no training, allowing them to identify and follow objects over time.

The goal of object tracking is to maintain the identity of the object(s) over time, despite changes in position, scale, orientation, and lighting conditions.

tip

Tracker operators can be "chained" together with models to automate tasks in a workflow. You can learn how to create workflows here.

BYTE Tracker

Input: frames[…].data.regions[…].data.concepts, frames[…].data.regions[…].region_info.bounding_box

Output: frames[…].data.regions[…].track_id

BYTE Tracker is a multi-object tracking-by-detection model built upon the principles of Simple Online and Realtime Tracking (SORT). Multi-object tracking aims to predict the bounding boxes and identities of objects within video sequences. BYTE Tracker can also be seen as an enhanced version of the Kalman Filter Hungarian Tracker.

Most tracking techniques obtain identities by associating detection boxes whose scores are higher than a threshold. Unlike simpler trackers that discard detections with low confidence scores, BYTE Tracker considers them too, making it better at handling situations such as temporary occlusions or lighting changes.

Typically, it works in two stages:

  1. High Confidence Matches: First, BYTE Tracker focuses on high-scoring detections (bounding boxes around objects). It uses a combination of motion similarity (how much the object moved between frames) and appearance similarity (features extracted from the object) to match these detections with existing tracks (tracklets). A motion prediction technique is then used to predict the position of these tracks in the next frame.

  2. Low Confidence Recovery: Here's where BYTE Tracker differs. It revisits the low confidence detections (discarded by simpler trackers) and unmatched tracklets from the previous stage. Using the same motion similarity metric, BYTE Tracker tries to re-associate these with each other, potentially recovering tracks that were lost due to occlusions or low initial confidence.
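
For intuition, here is a minimal, self-contained Python sketch of this two-stage association. It illustrates the idea rather than the operator's actual implementation: the helper names are made up, and only the parameter names (confidence_thresh, high_confidence_match_thresh, low_confidence_match_thresh) mirror the operator parameters listed at the end of this page.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_cost(track_boxes, det_boxes):
    """Cost matrix of 1 - IoU between [x1, y1, x2, y2] track and detection boxes."""
    cost = np.ones((len(track_boxes), len(det_boxes)))
    for i, t in enumerate(track_boxes):
        for j, d in enumerate(det_boxes):
            ix1, iy1 = max(t[0], d[0]), max(t[1], d[1])
            ix2, iy2 = min(t[2], d[2]), min(t[3], d[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = ((t[2] - t[0]) * (t[3] - t[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            cost[i, j] = 1.0 - inter / union if union > 0 else 1.0
    return cost

def associate(track_boxes, det_boxes, match_thresh):
    """Hungarian matching; returns matched (track, detection) pairs and unmatched track indices."""
    if len(track_boxes) == 0 or len(det_boxes) == 0:
        return [], list(range(len(track_boxes)))
    cost = iou_cost(track_boxes, det_boxes)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= match_thresh]
    matched_tracks = {r for r, _ in matches}
    return matches, [i for i in range(len(track_boxes)) if i not in matched_tracks]

def byte_associate(track_boxes, detections, confidence_thresh=0.5,
                   high_confidence_match_thresh=0.8, low_confidence_match_thresh=0.5):
    """One frame of BYTE-style association. `track_boxes` are the predicted boxes of
    existing tracklets; `detections` are dicts with a "box" and a "score"."""
    high = [d["box"] for d in detections if d["score"] >= confidence_thresh]
    low = [d["box"] for d in detections if d["score"] < confidence_thresh]

    # Stage 1: match high-scoring detections to all existing tracklets.
    first_matches, unmatched = associate(track_boxes, high, high_confidence_match_thresh)

    # Stage 2: try to recover the still-unmatched tracklets with low-scoring detections,
    # which simpler trackers would have discarded outright.
    leftover_boxes = [track_boxes[i] for i in unmatched]
    second_matches, _ = associate(leftover_boxes, low, low_confidence_match_thresh)
    return first_matches, second_matches
```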

With this powerful operator, you can seamlessly integrate object tracking into your detect-track workflows and unlock advanced capabilities. Let's demonstrate how you can use the BYTE Tracker, alongside a detection model, to efficiently track objects in videos.

1. Go to the workflow builder page. Search for the visual-detector option in the left-hand sidebar and drag it onto the empty workspace. Then, use the pop-up that appears on the right-hand sidebar to search for a detection model, such as general-image-detection, and select its version. You can also set the other configuration options — including selecting the concepts you want to filter.

2. Search for the byte-tracker option in the left-hand sidebar and drag it onto the workspace. You can set up its output configuration parameters, which are outlined below.

3. Connect the visual-detector model with the byte-tracker operator and save your workflow.

To observe it in action, navigate to the workflow's individual page and click the + button to input your video. For this example, let's provide this video.

The workflow will analyze the video and identify objects consistently throughout its duration.
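
If you prefer to call the workflow from code instead of the UI, the sketch below shows one way to do it with the clarifai-grpc Python client. The personal access token, user ID, app ID, workflow ID, and video URL are placeholders you would replace with your own values; the response fields follow the tracker output signature shown above (frames[…].data.regions[…].track_id).

```python
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

# Placeholder credentials and IDs -- replace with your own values.
PAT = "YOUR_PAT"
USER_ID = "YOUR_USER_ID"
APP_ID = "YOUR_APP_ID"
WORKFLOW_ID = "detect-track-workflow"  # hypothetical ID of your detect-track workflow
VIDEO_URL = "https://example.com/my-video.mp4"  # any publicly reachable video URL

stub = service_pb2_grpc.V2Stub(ClarifaiChannel.get_grpc_channel())
metadata = (("authorization", f"Key {PAT}"),)

response = stub.PostWorkflowResults(
    service_pb2.PostWorkflowResultsRequest(
        user_app_id=resources_pb2.UserAppIDSet(user_id=USER_ID, app_id=APP_ID),
        workflow_id=WORKFLOW_ID,
        inputs=[
            resources_pb2.Input(
                data=resources_pb2.Data(video=resources_pb2.Video(url=VIDEO_URL))
            )
        ],
    ),
    metadata=metadata,
)
if response.status.code != status_code_pb2.SUCCESS:
    raise RuntimeError(response.status.description)

# The last output in the chain is the tracker; each region carries a track_id.
tracker_output = response.results[0].outputs[-1]
for frame in tracker_output.data.frames:
    for region in frame.data.regions:
        concept = region.data.concepts[0].name if region.data.concepts else "?"
        print(frame.frame_info.index, region.track_id, concept)
```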

You can try this workflow here.

note

Before trying to access the workflow, please make sure that you have a Clarifai account and are logged in to the Clarifai platform. If you do not have a Clarifai account, you can sign up here.

Centroid Tracker

Input: frames[…].data.regions[…].data.concepts, frames[…].data.regions[…].region_info.bounding_box

Output: frames[…].data.regions[…].track_id

Centroid trackers rely on the Euclidean distance between centroids of regions in different video frames to assign the same track ID to detections of the same object.

Here's a breakdown of how they operate:

  1. Object Detection: In the first step, an object detector or a segmentation model (not part of the centroid tracker itself) identifies objects in each frame of a video. The detector outputs bounding boxes around the identified objects.

  2. Centroid Calculation: For each bounding box, the centroid tracker calculates its centroid. The centroid is simply the center point of the box, typically represented by its X and Y coordinates.

  3. Distance Comparison: The tracker then compares the centroids of objects detected in the current frame with the centroids of objects from the previous frame. It calculates the Euclidean distance, which is a straight-line distance between two points in space.

  4. Track Assignment: Based on a predefined threshold value, the tracker assigns track IDs. Objects in the current frame whose centroids are within a certain distance of a centroid in the previous frame are considered to be the same object and are assigned the same track ID. Objects with centroids exceeding the threshold distance are assumed to be new objects and assigned new track IDs.
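
The steps above map to a short Python sketch. The class below is purely illustrative rather than Clarifai's implementation; only the max_distance and max_disappeared names mirror the operator parameters listed at the end of this page.

```python
import numpy as np

class CentroidTracker:
    """Minimal centroid tracker: nearest-centroid matching with a distance threshold."""

    def __init__(self, max_distance=50.0, max_disappeared=30):
        self.max_distance = max_distance        # max centroid distance to keep the same ID
        self.max_disappeared = max_disappeared  # frames a track may go unseen before removal
        self.next_id = 0
        self.tracks = {}       # track_id -> last known centroid (x, y)
        self.disappeared = {}  # track_id -> consecutive unseen frames

    def update(self, boxes):
        """boxes: list of [x1, y1, x2, y2] detections; returns {track_id: centroid}."""
        centroids = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
        unmatched = set(range(len(centroids)))

        # Match each existing track to its nearest new centroid within max_distance.
        for tid, prev in list(self.tracks.items()):
            best, best_dist = None, self.max_distance
            for j in unmatched:
                dist = np.hypot(prev[0] - centroids[j][0], prev[1] - centroids[j][1])
                if dist < best_dist:
                    best, best_dist = j, dist
            if best is not None:
                self.tracks[tid] = centroids[best]
                self.disappeared[tid] = 0
                unmatched.discard(best)
            else:
                self.disappeared[tid] += 1
                if self.disappeared[tid] > self.max_disappeared:
                    del self.tracks[tid], self.disappeared[tid]

        # Any centroid left unmatched starts a new track with a new ID.
        for j in unmatched:
            self.tracks[self.next_id] = centroids[j]
            self.disappeared[self.next_id] = 0
            self.next_id += 1
        return dict(self.tracks)
```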

Let's demonstrate how you can use the centroid tracker, alongside a detection model, to efficiently track objects in videos.

1. Go to the workflow builder page. Search for the visual-detector option in the left-hand sidebar and drag it onto the empty workspace. Then, use the pop-up that appears on the right-hand sidebar to search for a detection model, such as general-image-detection, and select its version. You can also set the other configuration options — including selecting the concepts you want to filter.

2. Search for the centroid-tracker option in the left-hand sidebar and drag it onto the workspace. You can set up its output configuration parameters, which are outlined below.

3. Connect the visual-detector model with the centroid-tracker operator and save your workflow.

To observe it in action, navigate to the workflow's individual page and click the + button to input your video. For this example, let's provide this video.

The workflow will analyze the video and identify objects consistently throughout its duration.

You can try this workflow here.

note

Before trying to access the workflow, please make sure that you have a Clarifai account and are logged in to the Clarifai platform. If you do not have a Clarifai account, you can sign up here.

Kalman Filter Hungarian Tracker

Input: frames[…].data.regions[…].data.concepts, frames[…].data.regions[…].region_info.bounding_box

Output: frames[…].data.regions[…].track_id

The Kalman Filter Hungarian Tracker relies on the Kalman filter algorithm to estimate an object's next position based on its position and velocity in previous frames. Detections are then matched to predictions using the Hungarian algorithm. This model excels in environments where objects move predictably, such as controlled traffic scenes or automated industrial processes.

Key Features:

  • Kalman Filter: Utilizes a state prediction model that accounts for the linear dynamics of moving objects, calculating their future states based on velocities and directional movements.
  • Hungarian Algorithm: Assigns new detections to predicted states by solving an optimization problem that minimizes the total cost of assigning predictions to observations, thus ensuring the best possible tracking accuracy.

Operational Details:

  • State Prediction: The Kalman filter predicts the future location of each tracked object based on its current state and motion prediction. This is crucial for maintaining track continuity in environments with predictable movement patterns.

  • Detection Matching: After prediction, detections from the current frame are matched to these predictions using the Hungarian algorithm. This method effectively handles assignments even under conditions where objects may occlude each other or momentarily disappear from view.
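
A minimal sketch of this predict-then-match loop is shown below, using a constant-velocity Kalman filter and SciPy's Hungarian solver. The state layout, noise values, and distance threshold are illustrative assumptions rather than the operator's exact internals.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Constant-velocity model over the state [x, y, vx, vy], with dt = 1 frame.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # we only observe the centroid (x, y)
Q = np.eye(4) * 0.01   # process noise (uncertainty on the state, cf. covariance_error)
R = np.eye(2) * 1.0    # observation noise (uncertainty on detections, cf. observation_error)

def predict(x, P):
    """Kalman prediction step: advance the state and covariance by one frame."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Kalman update step: correct the prediction with an observed centroid z = (x, y)."""
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(4) - K @ H) @ P

def match(predicted_xy, detected_xy, max_distance=50.0):
    """Hungarian assignment between predicted and detected centroids."""
    cost = np.linalg.norm(predicted_xy[:, None, :] - detected_xy[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_distance]
```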

Let's demonstrate how you can use the Kalman filter tracker, alongside a detection model, to efficiently track objects in videos.

  1. Go to the workflow builder page. Search for the visual-detector option in the left-hand sidebar and drag it onto the empty workspace. Then, use the pop-up that appears on the right-hand sidebar to search for a detection model, such as general-image-detection, and select its version. You can also set the other configuration options — including selecting the concepts you want to filter.

  2. Search for the Kalman filter tracker option in the left-hand sidebar and drag it onto the workspace. You can set up its output configuration parameters, which are outlined below.

  3. Connect the visual-detector model with the Kalman filter tracker operator and save your workflow.

To observe it in action, navigate to the workflow's individual page and click the + button to input your video. For this example, let's provide this video.

The workflow will analyze the video and identify objects consistently throughout its duration.

Kalman Reid Tracker

Input: frames[…].data.regions[…].data.concepts

Output: frames[…].data.regions[…].track_id

The Kalman Reid Tracker is an advanced version of the Kalman filter tracking system, enhanced with Re-Identification (ReID) capabilities using appearance embeddings. This tracker is particularly effective in environments where objects frequently occlude each other or experience significant appearance changes, such as in crowded urban areas or complex indoor environments.

Key Features:

  • Embedding Utilization: Leverages appearance embeddings to provide a secondary layer of identity verification, which helps in accurately re-identifying objects even after they have been occluded or altered in appearance.
  • Adaptive Tracking: Integrates dynamic adjustments to tracking strategies based on the embedding distances, allowing for more flexible and robust tracking capabilities.

Operational Details

  • Kalman Prediction: Continuously predicts the next positions of objects based on their measured positions and velocities, using a state estimation technique that considers the physics of motion.

  • ReID Application: When there are discrepancies between predicted and observed positions, the system uses appearance embeddings to reassess and realign track IDs. This is particularly useful for maintaining tracking accuracy in scenarios where objects undergo significant appearance changes.
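
The re-identification step can be illustrated with the short sketch below, which blends a motion distance with a cosine distance between appearance embeddings. Only the max_emb_distance name mirrors the operator parameter; the weighting scheme, helper names, and example values are assumptions made for illustration.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two appearance embeddings (0 = identical direction)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def reid_cost(track, detection, max_emb_distance=0.3,
              max_distance=50.0, motion_weight=0.5):
    """Blend motion and appearance distances; return None when the embeddings disagree
    too strongly for this detection to re-identify the track."""
    emb_dist = cosine_distance(track["embedding"], detection["embedding"])
    if emb_dist > max_emb_distance:
        return None  # appearance rules out a re-identification
    motion_dist = np.linalg.norm(np.asarray(track["predicted_xy"]) -
                                 np.asarray(detection["xy"]))
    return motion_weight * (motion_dist / max_distance) + (1.0 - motion_weight) * emb_dist

# Example: a track occluded for a few frames can reclaim its ID if a new detection
# is close in both predicted position and appearance.
track = {"predicted_xy": (120, 85), "embedding": np.array([0.10, 0.90, 0.20])}
det = {"xy": (125, 88), "embedding": np.array([0.12, 0.88, 0.21])}
print(reid_cost(track, det))  # small cost -> re-identified as the same track
```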

Let's demonstrate how you can use the Kalman Reid tracker, alongside a detection model, to efficiently track objects in videos.

  1. Go to the workflow builder page. Search for the visual-detector option in the left-hand sidebar and drag it onto the empty workspace. Then, use the pop-up that appears on the right-hand sidebar to search for a detection model, such as general-image-detection, and select its version. You can also set the other configuration options — including selecting the concepts you want to filter.

  2. Search for the Kalman Reid tracker option in the left-hand sidebar and drag it onto the workspace. Then, you can set its output configuration parameters, which are outlined below.

  3. Connect the visual-detector model with the Kalman Reid tracker operator and save your workflow.

To observe it in action, navigate to the workflow's individual page and click the + button to input your video. For this example, let's provide this video.

The workflow will analyze the video and identify objects consistently throughout its duration.

Tracker Operators Parameters

Here is an overview of the output configuration parameters you can set for the tracker operators. Each operator supports only a subset of these parameters, so check the operator's configuration panel in the workflow builder to see which ones apply.

  • min_confidence: The minimum confidence score for detections to be considered for tracking
  • min_visible_frames: Only return tracks that have been visible for more than min_visible_frames frames
  • track_id_prefix: Prefix added to track IDs to avoid conflicts
  • max_disappeared: The maximum number of consecutive frames a given object is allowed to be marked as "disappeared" before it is deregistered from tracking
  • new_track_confidence_thresh: Initialize a new track if the confidence score of the new detection is greater than this value
  • confidence_thresh: Detections with scores above this value are used in the first (high-confidence) association; the rest are used in the second association
  • high_confidence_match_thresh: The distance threshold for high-score detections
  • low_confidence_match_thresh: The distance threshold for low-score detections
  • unconfirmed_match_thresh: The distance threshold for unconfirmed tracks, usually tracks with only one beginning frame. {"min": 0, "max": 1}
  • max_distance: Associate tracks with detections only when their distance is below max_distance
  • filtered_probability: If false, return the original detection probability; if true, return the processed probability from the tracker
  • max_detection: Maximum number of detections per frame
  • has_probability
  • has_embedding
  • association_confidence: The list of association confidences, one for each matching round
  • covariance_error: Magnitude of the uncertainty on the initial state
  • observation_error: Magnitude of the uncertainty on detection coordinates
  • distance_metric: Distance metric for Hungarian matching
  • initialization_confidence: Confidence for starting a new track. Must be greater than min_confidence to have an effect
  • project_track: Total number of frames to project the box forward when no detection is recorded for a track
  • use_detect_box: Number of frames to project the last detection box; should be less than project_track_frames (1 is the current frame)
  • project_without_detect: Whether to keep projecting the box forward if no detection is matched
  • project_fix_box_size: Whether to fix the box size while the track is in a projected state
  • detect_box_fall_back: Rely on the detection box if the association error is above this value
  • keep_track_in_image: If set to 1, the tracker prediction is constrained to stay inside the image boundaries
  • match_limit_ratio: Multiplier used to constrain association based on other associations (values < 1 are ignored)
  • match_limit_min_matches: Minimum number of matched tracks needed to invoke the match limit
  • optimal_assignment: If true, rule out pairs with distance > max_distance before assignment
  • max_emb_distance: Maximum embedding distance for a match to be considered a re-identification
  • max_dead: Maximum number of frames a track can be dead before its ID is re-assigned
  • var_tracker: String that determines how embeddings from multiple timestamps are aggregated; defaults to "na" (the most recent embedding overwrites past embeddings)
  • reid_model_path: The path to the linker
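
If you build your workflow through the API rather than the UI, these parameters are typically supplied as a protobuf Struct on the tracker node's output configuration. Below is a minimal sketch with illustrative values for a byte-tracker node; the specific values are examples only, and you should confirm which parameters your chosen operator supports.

```python
from google.protobuf import struct_pb2

# Illustrative output-config values for a byte-tracker node (example values only).
params = struct_pb2.Struct()
params.update({
    "min_confidence": 0.3,
    "confidence_thresh": 0.5,
    "high_confidence_match_thresh": 0.8,
    "low_confidence_match_thresh": 0.5,
    "min_visible_frames": 1,
    "track_id_prefix": "cam-1",
})
print(params)  # this Struct is what gets attached to the operator's configuration
```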