Tracker
Learn about our tracker operators
Tracker operators are a type of agent system operator designed for object tracking in computer vision. Object tracking involves following the movement of objects across a sequence of images or the frames of a video. Tracker models are trained using machine learning techniques to learn patterns and features that help them identify and track objects over time.
The goal of object tracking is to maintain the identity of the object(s) over time, despite changes in position, scale, orientation, and lighting conditions.
Tracker operators can be "chained" together with models to automate tasks in a workflow. You can learn how to create workflows here.
BYTE Tracker
Input: frames[…].data.regions[…].data.concepts, frames[…].data.regions[…].region_info.bounding_box
Output: frames[…].data.regions[…].track_id
BYTE Tracker is a multi-object, tracking-by-detection model built upon the Simple Online and Real-time Tracking (SORT) principles. Multi-object tracking aims to predict the bounding boxes and identities of objects within video sequences.
Most tracking techniques retrieve identities by associating only the detection boxes whose scores are higher than a threshold. Unlike these simpler trackers, which discard detections with low confidence scores, BYTE Tracker considers them too, making it better at handling situations like temporary occlusions or lighting changes.
Typically, it works in two stages:
- High Confidence Matches: First, BYTE Tracker focuses on high-scoring detections (bounding boxes around objects). It uses a combination of motion similarity (how much the object moved between frames) and appearance similarity (features extracted from the object) to match these detections with existing tracks (tracklets). A motion prediction technique is then used to predict the position of these tracks in the next frame.
- Low Confidence Recovery: Here's where BYTE Tracker differs. It revisits the low confidence detections (discarded by simpler trackers) and unmatched tracklets from the previous stage. Using the same motion similarity metric, BYTE Tracker tries to re-associate these with each other, potentially recovering tracks that were lost due to occlusions or low initial confidence.
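The two stages can be sketched in a few lines of Python. This is a minimal illustration, not the operator's implementation: it assumes the track boxes have already been moved to their predicted positions, uses IoU alone as the similarity measure, and stands in a simple greedy matcher for the assignment step. The function names and the default values for `confidence_thresh` and `min_iou` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[Tuple[int, Box]],
                 min_iou: float) -> Dict[int, int]:
    """Pair track IDs with detection indices, best IoU first (assumed matcher)."""
    pairs = sorted(
        ((iou(tb, db), tid, di) for tid, tb in tracks.items() for di, db in dets),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, di in pairs:
        if score < min_iou:
            break  # every remaining pair overlaps even less
        if tid in matches or di in used:
            continue
        matches[tid] = di
        used.add(di)
    return matches


def two_stage_associate(tracks: Dict[int, Box], boxes: List[Box], scores: List[float],
                        confidence_thresh: float = 0.5,
                        min_iou: float = 0.3) -> Dict[int, int]:
    """Return {track_id: detection_index} after high- then low-confidence matching."""
    indexed = list(enumerate(zip(boxes, scores)))
    high = [(i, b) for i, (b, s) in indexed if s > confidence_thresh]
    low = [(i, b) for i, (b, s) in indexed if s <= confidence_thresh]
    # Stage 1: match tracks against high-scoring detections.
    matches = greedy_match(tracks, high, min_iou)
    # Stage 2: tracks still unmatched get a second chance with low-scoring detections.
    remaining = {tid: b for tid, b in tracks.items() if tid not in matches}
    matches.update(greedy_match(remaining, low, min_iou))
    return matches


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
boxes = [(1, 1, 11, 11), (21, 21, 31, 31)]
scores = [0.9, 0.2]  # the second object is partly occluded, so its score is low
print(two_stage_associate(tracks, boxes, scores))
```

In this example, a tracker that dropped every detection scoring below 0.5 would lose track 2 for the frame; the second stage keeps it attached to the low-scoring box.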
With this powerful operator, you can seamlessly integrate object tracking into your detect-track workflows and unlock advanced capabilities. Let's demonstrate how you can use the BYTE Tracker, alongside a detection model, to efficiently track objects in videos.
1. Go to the workflow builder page. Search for the visual-detector option in the left-hand sidebar and drag it onto the empty workspace. Then, use the pop-up that appears on the right-hand sidebar to search for a detection model, such as general-image-detection, and select its version. You can also set the other configuration options — including selecting the concepts you want to filter.
2. Search for the byte-tracker option in the left-hand sidebar and drag it onto the workspace. You can set up its output configuration parameters, which are outlined below.
3. Connect the visual-detector model with the byte-tracker operator and save your workflow.
To observe it in action, navigate to the workflow's individual page and click the + button to input your video. For this example, let's provide this video.
The workflow will analyze the video and identify objects consistently throughout its duration.
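Once the workflow has run, each region in the output carries a `track_id`. The sketch below assumes the response has already been parsed into nested Python dictionaries and lists that mirror the field paths listed above (`frames[…].data.regions[…]`). The function name, the sample data, and the keys inside `bounding_box` are illustrative assumptions.

```python
from collections import defaultdict


def boxes_by_track(frames):
    """Map each track_id to a list of (frame_index, bounding_box) pairs."""
    tracks = defaultdict(list)
    for frame_index, frame in enumerate(frames):
        for region in frame["data"]["regions"]:
            tracks[region["track_id"]].append(
                (frame_index, region["region_info"]["bounding_box"])
            )
    return dict(tracks)


# Made-up sample shaped like the documented output paths.
sample_frames = [
    {"data": {"regions": [
        {"track_id": "a", "region_info": {"bounding_box": {"top_row": 0.1, "left_col": 0.1, "bottom_row": 0.5, "right_col": 0.4}}},
    ]}},
    {"data": {"regions": [
        {"track_id": "a", "region_info": {"bounding_box": {"top_row": 0.1, "left_col": 0.2, "bottom_row": 0.5, "right_col": 0.5}}},
        {"track_id": "b", "region_info": {"bounding_box": {"top_row": 0.6, "left_col": 0.6, "bottom_row": 0.9, "right_col": 0.9}}},
    ]}},
]

for track_id, sightings in boxes_by_track(sample_frames).items():
    print(track_id, [frame_index for frame_index, _ in sightings])
```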
Centroid Tracker
Input: frames[…].data.regions[…].data.concepts, frames[…].data.regions[…].region_info.bounding_box
Output: frames[…].data.regions[…].track_id
Centroid trackers rely on the Euclidean distance between centroids of regions in different video frames to assign the same track ID to detections of the same object.
Here's a breakdown of how they operate:
- Object Detection: In the first step, an object detector or a segmentation model (not part of the centroid tracker itself) identifies objects in each frame of a video. The detector outputs bounding boxes around the identified objects.
- Centroid Calculation: For each bounding box, the centroid tracker calculates its centroid. The centroid is simply the center point of the box, typically represented by its X and Y coordinates.
- Distance Comparison: The tracker then compares the centroids of objects detected in the current frame with the centroids of objects from the previous frame. It calculates the Euclidean distance, which is the straight-line distance between two points in space.
- Track Assignment: Based on a predefined threshold value, the tracker assigns track IDs. Objects in the current frame whose centroids are within the threshold distance of a centroid in the previous frame are considered to be the same object and are assigned the same track ID. Objects whose centroids exceed the threshold distance are assumed to be new objects and are assigned new track IDs.
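The steps above can be sketched as a small Python class. This is a simplified illustration under stated assumptions, not the operator's code: boxes are `(x1, y1, x2, y2)` tuples, matching is greedy (closest pair first), and only the previous frame is remembered, so there is no equivalent of `max_disappeared`. The class name and the default threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def centroid(box: Box) -> Tuple[float, float]:
    """Center point of a bounding box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


class CentroidTracker:
    def __init__(self, max_distance: float = 50.0):
        self.max_distance = max_distance
        self.next_id = 0
        self.centroids: Dict[int, Tuple[float, float]] = {}  # previous frame only

    def update(self, boxes: List[Box]) -> List[int]:
        """Return one track ID per box, in input order."""
        points = [centroid(b) for b in boxes]
        # Every (distance, track_id, detection_index) pair under the threshold.
        candidates = sorted(
            (math.dist(c, p), tid, i)
            for tid, c in self.centroids.items()
            for i, p in enumerate(points)
            if math.dist(c, p) < self.max_distance
        )
        ids: List[Optional[int]] = [None] * len(points)
        used = set()
        for _, tid, i in candidates:  # closest pairs are claimed first
            if tid in used or ids[i] is not None:
                continue
            ids[i] = tid
            used.add(tid)
        for i in range(len(points)):  # anything left over starts a new track
            if ids[i] is None:
                ids[i] = self.next_id
                self.next_id += 1
        self.centroids = dict(zip(ids, points))
        return ids


tracker = CentroidTracker(max_distance=20.0)
print(tracker.update([(0, 0, 10, 10), (100, 100, 110, 110)]))
print(tracker.update([(102, 101, 112, 111), (3, 2, 13, 12)]))  # same objects, reordered
print(tracker.update([(300, 300, 310, 310)]))  # far from everything: new track
```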
Let's demonstrate how you can use the centroid tracker, alongside a detection model, to efficiently track objects in videos.
1. Go to the workflow builder page. Search for the visual-detector option in the left-hand sidebar and drag it onto the empty workspace. Then, use the pop-up that appears on the right-hand sidebar to search for a detection model, such as general-image-detection, and select its version. You can also set the other configuration options — including selecting the concepts you want to filter.
2. Search for the centroid-tracker option in the left-hand sidebar and drag it onto the workspace. You can set up its output configuration parameters, which are outlined below.
3. Connect the visual-detector model with the centroid-tracker operator and save your workflow.
To observe it in action, navigate to the workflow's individual page and click the + button to input your video. For this example, let's provide this video.
The workflow will analyze the video and identify objects consistently throughout its duration.
Neural Tracker
Output: Regions
The neural tracker uses neural probabilistic models to perform filtering and association.
Kalman Filter Hungarian Tracker
Output: Regions
Kalman filter trackers rely on the Kalman filter algorithm to estimate the next position of an object based on its position and velocity in previous frames. Detections are then matched to these predictions using the Hungarian algorithm.
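As a rough sketch of the predict-then-match idea, the code below runs a constant-velocity Kalman filter per coordinate and then pairs predictions with detections. Several things here are assumptions made for illustration: the class and function names, the default noise values, the `process_noise` term, and the use of a brute-force search over permutations as a stand-in for the Hungarian algorithm (it finds the same minimum-cost pairing but only scales to a handful of objects). The `covariance_error`, `observation_error`, and `max_distance` names are borrowed from the parameter table; how the operator actually applies them may differ.

```python
import itertools
import math


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity)."""

    def __init__(self, position, covariance_error=10.0, observation_error=1.0,
                 process_noise=0.01):
        self.p, self.v = position, 0.0
        # Symmetric 2x2 covariance [[a, b], [b, d]] stored as three numbers.
        self.a, self.b, self.d = covariance_error, 0.0, covariance_error
        self.r, self.q = observation_error, process_noise

    def predict(self):
        """Advance one frame: position moves by velocity and uncertainty grows."""
        self.p += self.v
        self.a, self.b, self.d = (self.a + 2 * self.b + self.d + self.q,
                                  self.b + self.d,
                                  self.d + self.q)
        return self.p

    def update(self, measured):
        """Blend the prediction with a measured position."""
        s = self.a + self.r                      # innovation variance
        k_p, k_v = self.a / s, self.b / s        # Kalman gains
        residual = measured - self.p
        self.p += k_p * residual
        self.v += k_v * residual
        self.a, self.b, self.d = ((1 - k_p) * self.a,
                                  (1 - k_p) * self.b,
                                  self.d - k_v * self.b)


def assign(predicted, detections, max_distance):
    """Minimum-total-distance pairing as (track_index, detection_index) tuples."""
    swap = len(predicted) > len(detections)
    small, large = (detections, predicted) if swap else (predicted, detections)
    best_cost, best_perm = math.inf, ()
    for perm in itertools.permutations(range(len(large)), len(small)):
        cost = sum(math.dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    pairs = [(j, i) if swap else (i, j) for i, j in enumerate(best_perm)]
    return [(t, d) for t, d in pairs
            if math.dist(predicted[t], detections[d]) < max_distance]


# Two objects move right by one unit per frame, on rows y=0 and y=5.
tracks = [(AxisKalman(0.0), AxisKalman(0.0)), (AxisKalman(0.0), AxisKalman(5.0))]
for x in (1.0, 2.0, 3.0):
    for (kx, ky), y in zip(tracks, (0.0, 5.0)):
        kx.predict()
        ky.predict()
        kx.update(x)
        ky.update(y)

predicted = [(kx.predict(), ky.predict()) for kx, ky in tracks]
detections = [(4.0, 5.0), (4.0, 0.0)]  # listed in the opposite order to the tracks
print(assign(predicted, detections, max_distance=2.0))
```

For realistic object counts, an actual Hungarian solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation search.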
Kalman Reid Tracker
Output: Regions
The Kalman reid tracker is a Kalman filter tracker that expects the embedding proto field to be populated for detections, and reassigns track IDs based on the embedding distance.
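The re-identification step can be pictured as a nearest-neighbor lookup over the embeddings of tracks that have recently been lost. The sketch below is an assumption-laden illustration: it uses cosine distance (the operator's actual metric is not stated here), the function names are hypothetical, and the default `max_emb_distance` value is made up, although the parameter name itself comes from the table of parameters.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the embeddings point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def reidentify(new_embedding, dead_tracks, max_emb_distance=0.3):
    """Return the ID of the closest dead track within max_emb_distance, else None."""
    best_id, best_dist = None, max_emb_distance
    for track_id, embedding in dead_tracks.items():
        dist = cosine_distance(new_embedding, embedding)
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    return best_id


dead_tracks = {7: [1.0, 0.0, 0.0], 9: [0.0, 1.0, 0.0]}
print(reidentify([0.9, 0.1, 0.0], dead_tracks))  # close to track 7's embedding
print(reidentify([0.0, 0.0, 1.0], dead_tracks))  # unlike either stored embedding
```

A detection that reuses an old ID this way keeps its identity across a gap; one that returns `None` would start a new track.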
Neural Lite Tracker
Output: Regions
The neural lite tracker uses lightweight trainable graphical models to infer the states of tracks and perform associations using a hybrid similarity of IoU and centroid distance.
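One way to picture a hybrid of IoU and centroid distance is a weighted blend controlled by `iou_dist_ratio`, where 1.0 means pure IoU and 0.0 means pure centroid similarity. The linear blend, the exponential mapping from distance to similarity, the `scale` parameter, and the function names below are all assumptions for illustration; the operator's actual formula is not documented here.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centroid_similarity(a, b, scale=50.0):
    """1.0 when the box centers coincide, decaying toward 0 as they move apart."""
    ca = ((a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0)
    cb = ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)
    return math.exp(-math.dist(ca, cb) / scale)


def hybrid_similarity(a, b, iou_dist_ratio=0.5, scale=50.0):
    """Blend IoU and centroid similarity (assumed linear weighting)."""
    return (iou_dist_ratio * iou(a, b)
            + (1.0 - iou_dist_ratio) * centroid_similarity(a, b, scale))


near = ((0, 0, 10, 10), (12, 0, 22, 10))  # no overlap, but centers are close
print(hybrid_similarity(*near, iou_dist_ratio=1.0))  # IoU only
print(hybrid_similarity(*near, iou_dist_ratio=0.0))  # centroid distance only
```

The example shows why a blend can help: two boxes that just miss overlapping score zero on IoU alone but still look related once centroid distance is taken into account.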
Tracker Operators Parameters
The table below outlines the output configuration parameters you can set for each operator. A ✓ indicates that the operator supports the parameter.
Parameter | Description | BYTE Tracker | Centroid Tracker | Neural Tracker | Kalman Filter Hungarian Tracker | Kalman Reid Tracker | Neural Lite Tracker |
---|---|---|---|---|---|---|---|
min_confidence | This is the minimum confidence score for detections to be considered for tracking | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
min_visible_frames | Only return tracks with minimum visible frames > min_visible_frames | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
track_id_prefix | Prefix to add to track IDs to eliminate conflicts | ✓ | ✓ | ✓ | ✓ | ✓ | |
max_disappeared | The maximum number of consecutive frames an object may be marked as “disappeared” before it is deregistered from tracking | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
new_track_confidence_thresh | Initialize a new track if the confidence score of the new detection is greater than the setting | ✓ | |||||
confidence_thresh | This is used to categorize high score detections for the first association if their scores are greater, and the second association if not | ✓ | |||||
high_confidence_match_thresh | The distance threshold for high-score detection | ✓ | |||||
low_confidence_match_thresh | The distance threshold for low-score detection | ✓ | |||||
unconfirmed_match_thresh | The distance threshold for unconfirmed tracks, usually tracks with only one beginning frame. {“min”: 0, “max”: 1} | ✓ | |||||
max_distance | Associate tracks with detections only when their distance is below max_distance | ✓ | ✓ | ✓ | ✓ | ✓ | |
filtered_probability | If false, return original detection probability; if true, return processed probability from the tracker | ✓ | |||||
max_detection | Maximum detection per frame | ✓ | |||||
has_probability | | ✓ | |||||
has_embedding | | ✓ | |||||
association_confidence | The list of association confidences to perform for each round | ✓ | ✓ | ||||
covariance_error | Magnitude of the uncertainty on the initial state | ✓ | ✓ | ||||
observation_error | Magnitude of the uncertainty on detection coordinates | ✓ | ✓ | ||||
distance_metric | Distance metric for Hungarian matching | ✓ | ✓ | ||||
initialization_confidence | Confidence for starting a new track. Must be > min_confidence to have an effect | ✓ | ✓ | ||||
project_track | How many frames in total to project the box when no detection is recorded for the track | ✓ | ✓ | ||||
use_detect_box | How many frames to project the last detection box, should be less than project_track_frames (1 is the current frame) | ✓ | ✓ | ||||
project_without_detect | Whether to keep projecting the box forward if no detection is matched | ✓ | ✓ | ||||
project_fix_box_size | Whether to fix the box size when the track is in a project state | ✓ | ✓ | ||||
detect_box_fall_back | Rely on the detect box if the association error is above this value | ✓ | ✓ | ||||
keep_track_in_image | If this is 1, push the tracker's prediction to stay inside the image boundaries | ✓ | ✓ | ||||
match_limit_ratio | Multiplier to constrain association (< 1 is ignored) based on other associations | ✓ | ✓ | ||||
match_limit_min_matches | Minimum number of matched tracks needed to invoke match limit | ✓ | ✓ | ||||
optimal_assignment | If True, rule out pairs with distance > max_distance before assignment | ✓ | ✓ | ||||
max_emb_distance | Maximum embedding distance to be considered a re-identification | ✓ | |||||
max_dead | Maximum number of frames for track to be dead before we re-assign the ID | ✓ | |||||
var_tracker | String that determines how embeddings from multiple timestamps are aggregated, defaults to “na” (most recent embedding overwrites past embeddings) | ✓ | |||||
reid_model_path | The path to the linker | ✓ | |||||
iou_dist_ratio | If 1.0 purely IoU similarity, if 0.0 purely centroid distance similarity | ✓ | |||||
mortal_th | Mortality threshold | ✓ | |||||
min_box_area | Minimum area of a valid box | ✓ | |||||
min_activity | Returns only tracks with activities above min_activity | ✓ | |||||
nms_iou_th | NMS IoU threshold | ✓ | |||||
shrink_factor | Change box size by shrink_factor | ✓ |
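
To make the two parameters shared by every operator more concrete, here is a hedged sketch of what `min_confidence` and `min_visible_frames` describe. It assumes detections are plain dictionaries with `frame`, `track_id`, and `confidence` keys, which is a made-up shape for illustration, and the function names are hypothetical. The strict `>` comparison for visible frames follows the wording in the table; the `>=` comparison for confidence is an assumption.

```python
from collections import defaultdict


def drop_low_confidence(detections, min_confidence):
    """Keep detections whose score reaches min_confidence (applied before tracking)."""
    return [d for d in detections if d["confidence"] >= min_confidence]


def drop_short_tracks(tracked, min_visible_frames):
    """Keep detections of tracks seen in more than min_visible_frames distinct frames."""
    frames_seen = defaultdict(set)
    for d in tracked:
        frames_seen[d["track_id"]].add(d["frame"])
    return [d for d in tracked if len(frames_seen[d["track_id"]]) > min_visible_frames]


tracked = [
    {"frame": 0, "track_id": 1, "confidence": 0.9},
    {"frame": 1, "track_id": 1, "confidence": 0.8},
    {"frame": 2, "track_id": 1, "confidence": 0.4},
    {"frame": 1, "track_id": 2, "confidence": 0.7},  # seen in a single frame only
]
kept = drop_short_tracks(tracked, min_visible_frames=2)
print(sorted({d["track_id"] for d in kept}))
print(len(drop_low_confidence(tracked, min_confidence=0.5)))
```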