A Complete Review of the OpenCV Object Tracking Algorithms

Introduction

Most modern solutions to the object tracking problem assume the presence of a pre-trained classifier that can accurately detect the object you want to track, whether it is a car, a person, an animal, etc. These classifiers are, as a rule, trained on tens to hundreds of thousands of images, which allows them to learn the patterns of the selected classes and subsequently detect the objects. But what if the user can’t find a suitable classifier or train their own? In this case, OpenCV object tracking provides solutions that use “online” or “on the fly” training.

What is object tracking?

The tracking process can be thought of as a combination of two models: the motion model and the appearance model. The motion model tracks the speed and direction of the object's movement, which allows it to predict the object's new position from the data received so far. The appearance model, in turn, is responsible for determining whether the object we have selected is inside the frame. When pre-trained classifiers are used, the coordinates of the bounding box containing the object are determined automatically, whereas with “online” training we specify the bounding box manually, and the classifier has no training data except what it can gather while tracking the object. It is worth noting that tracking algorithms can be divided into two groups, single-object and multi-object tracking algorithms; we will consider the former.

Figure 1. Object tracking example. Source: Object Tracking in Videos: Introduction and Common Techniques - AIDETIC BLOG
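To make the motion model concrete, here is a minimal sketch (plain NumPy, with hypothetical example coordinates) of the simplest possible version, a constant-velocity prediction that extrapolates the last observed displacement:

```python
import numpy as np

def predict_next_position(positions):
    """Predict the next (x, y) center under a constant-velocity
    motion model: extrapolate the last observed displacement."""
    p_prev, p_curr = np.asarray(positions[-2]), np.asarray(positions[-1])
    velocity = p_curr - p_prev          # displacement per frame
    nxt = p_curr + velocity             # one step forward
    return (int(nxt[0]), int(nxt[1]))

# The object moved 5 px right and 2 px down between frames,
# so we expect it to keep the same velocity for one more step.
print(predict_next_position([(100, 50), (105, 52)]))  # → (110, 54)
```

Real trackers use more elaborate motion models, but the idea is the same: the predicted position narrows down where the appearance model has to search.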

Still, what is the difference between detecting an object and tracking it using OpenCV object tracking methods? There are several key differences:

  1. Tracking is faster than detection. A pre-trained classifier needs to detect the object in every frame of the video (which can lead to high computational load), whereas an object tracker is given the bounding box of the object only once and then relies on its position, speed, and direction, which makes the process faster.
  2. Tracking is more stable. In cases where the tracked object is partially occluded by another object, a detection algorithm may “lose” it, while tracking algorithms are more robust to partial occlusion.
  3. Tracking provides more information. If we are not interested in which class an object belongs to, a tracking algorithm allows us to follow the movement path of a specific object, which a detection algorithm cannot do.

Let's get some practice!

The OpenCV library provides 8 different object tracking methods that use online learning classifiers. Let's go over them in more detail. But first, make sure that your environment is ready to work. As you have guessed, we need the OpenCV library installed. We suggest installing the opencv-contrib-python package instead of opencv-python to avoid issues during tracker initialization. Just type in your console:

pip install opencv-contrib-python

Now, let’s create a Jupyter notebook and declare our trackers.

import cv2

tracker_types = ['BOOSTING', 'MIL', 'KCF', 'TLD', 'MEDIANFLOW', 'GOTURN', 'MOSSE', 'CSRT']
tracker_type = tracker_types[5]

if tracker_type == 'BOOSTING':
    tracker = cv2.legacy.TrackerBoosting_create()
elif tracker_type == 'MIL':
    tracker = cv2.TrackerMIL_create()
elif tracker_type == 'KCF':
    tracker = cv2.TrackerKCF_create()
elif tracker_type == 'TLD':
    tracker = cv2.legacy.TrackerTLD_create()
elif tracker_type == 'MEDIANFLOW':
    tracker = cv2.legacy.TrackerMedianFlow_create()
elif tracker_type == 'GOTURN':
    tracker = cv2.TrackerGOTURN_create()
elif tracker_type == 'MOSSE':
    tracker = cv2.legacy.TrackerMOSSE_create()
elif tracker_type == 'CSRT':
    tracker = cv2.TrackerCSRT_create()

Note that several tracking algorithms have been removed from the main OpenCV namespace and moved to the “legacy” module. Also, the GOTURN tracker requires additional files, which you can download from this link.
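As a quick sanity check before creating the GOTURN tracker, you can verify that the model files are in place. By default, OpenCV looks for the file names below in the current working directory; adjust the paths if you store the files elsewhere:

```python
import os

# GOTURN needs its Caffe model definition and weights; by default
# OpenCV expects these file names in the current working directory.
required = ["goturn.prototxt", "goturn.caffemodel"]
missing = [f for f in required if not os.path.exists(f)]
if missing:
    print("Missing GOTURN model files:", ", ".join(missing))
```

If the files are missing, cv2.TrackerGOTURN_create() will raise an error, so it is cheaper to check up front.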

Next, let’s get our video and start tracking.

# Get the video file and read the first frame
video = cv2.VideoCapture("walking.mp4")
ret, frame = video.read()
if not ret:
    print('cannot read the video')

frame_height, frame_width = frame.shape[:2]
# Resize the frame for a more convenient view
frame = cv2.resize(frame, (frame_width//2, frame_height//2))
# Initialize a video writer to save the results
output = cv2.VideoWriter(f'{tracker_type}.avi',
                         cv2.VideoWriter_fourcc(*'XVID'), 60.0,
                         (frame_width//2, frame_height//2), True)
# Select the bounding box in the first frame
bbox = cv2.selectROI(frame, False)
tracker.init(frame, bbox)
# Start tracking
while True:
    ret, frame = video.read()
    if not ret:
        print('something went wrong')
        break
    frame = cv2.resize(frame, (frame_width//2, frame_height//2))
    timer = cv2.getTickCount()
    ret, bbox = tracker.update(frame)
    fps = cv2.getTickFrequency() / (cv2.getTickCount() - timer)
    if ret:
        # Draw the predicted bounding box
        p1 = (int(bbox[0]), int(bbox[1]))
        p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
        cv2.rectangle(frame, p1, p2, (255, 0, 0), 2, 1)
    else:
        cv2.putText(frame, "Tracking failure detected", (100, 80),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 2)
    cv2.putText(frame, tracker_type + " Tracker", (100, 20),
                cv2.FONT_HERSHEY_SIMPLEX, 0.75, (50, 170, 50), 2)
    cv2.putText(frame, "FPS : " + str(int(fps)), (100, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 0.75, (50, 170, 50), 2)
    cv2.imshow("Tracking", frame)
    output.write(frame)
    k = cv2.waitKey(1) & 0xff
    if k == 27:
        break

video.release()
output.release()
cv2.destroyAllWindows()

Let’s inspect our code and consider the key moments. First, we open the video file with cv2.VideoCapture and read its first frame. For faster processing, we halve the size of each frame with cv2.resize, and then create a cv2.VideoWriter to save our tracking video at the same reduced size.


To manually select the object for tracking we call the OpenCV function cv2.selectROI (a window for manual selection will appear) and initialize our tracker with the selected bounding box via tracker.init().

Figure 2. ROI selection.

To update the location of the object we call the tracker's .update() method on each new frame. In the loop, the tracker processes the video frame by frame and saves the results to the output file until the video ends or we stop the loop by pressing Esc.

Now, let’s briefly consider each OpenCV object tracker methodology and look at the results we get.

BOOSTING Tracker

This method is based on an online version of the AdaBoost algorithm: the algorithm increases the weights of incorrectly classified samples, which allows the weak classifiers to “focus” on their detection.

Since the classifier is trained “online”, the user specifies the frame region in which the tracked object is located. This region is initially treated as a positive detection, and the areas around it are treated as the background.

When a new frame arrives, the classifier scores the pixels surrounding the detection from the previous frame, and the new position of the object is the area where the score is highest.
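The scoring step can be sketched as follows. This is a toy illustration, not OpenCV's implementation: score_fn stands in for the boosted classifier, and the search grid parameters are made up for the example.

```python
import numpy as np

def best_candidate(score_fn, center, radius=8, step=4):
    """Evaluate candidate positions on a grid around `center` and keep
    the one with the highest classifier score -- the core move of
    boosting-style tracking."""
    best_pos, best_score = center, -np.inf
    cx, cy = center
    for dx in range(-radius, radius + 1, step):
        for dy in range(-radius, radius + 1, step):
            pos = (cx + dx, cy + dy)
            score = score_fn(pos)
            if score > best_score:
                best_pos, best_score = pos, score
    return best_pos

# Toy score: the (hypothetical) classifier peaks at (54, 33); the grid
# picks the candidate closest to that peak.
target = (54, 33)
score = lambda p: -((p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2)
print(best_candidate(score, center=(50, 30)))  # → (54, 34)
```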

Pros: an object is tracked quite accurately, even though the algorithm is already outdated.

Cons: relatively low speed, strong susceptibility to noise and obstacles, and the inability to stop tracking when the object is lost.

Figure 3. BOOSTING tracker results.

MIL (Multiple Instance Learning) Tracker

This algorithm takes the same approach as BOOSTING; however, instead of guessing where the tracked object is in the next frame, several potentially positive patches, called a “bag”, are sampled around the current object position. A positive “bag” contains at least one true positive.
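Sampling such a bag can be sketched like this (a simplified illustration with NumPy arrays standing in for frames; the shift offsets are arbitrary example values):

```python
import numpy as np

def positive_bag(frame, bbox, shifts=((0, 0), (2, 0), (-2, 0), (0, 2), (0, -2))):
    """Crop a 'bag' of patches around the current box; under the MIL
    assumption at least one of them tightly contains the object."""
    x, y, w, h = bbox
    patches = []
    for dx, dy in shifts:
        patch = frame[y + dy : y + dy + h, x + dx : x + dx + w]
        if patch.shape[:2] == (h, w):   # keep only full-size crops
            patches.append(patch)
    return patches

frame = np.zeros((100, 100), dtype=np.uint8)  # dummy grayscale frame
bag = positive_bag(frame, (40, 40, 20, 20))
print(len(bag))  # → 5
```

Training on bags rather than on a single patch is what makes MIL less sensitive to a slightly misplaced bounding box.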

Pros: more robust to noise, shows fairly good accuracy.

Cons: relatively low speed and the impossibility of stopping tracking when the object is lost.

Figure 4. MIL tracker results.

KCF (Kernelized Correlation Filters) Tracker

KCF is a combination of the two previous algorithms, BOOSTING and MIL. The idea of the method is that the set of image patches in a “bag” obtained by the MIL approach has many overlapping areas. Correlation filtering applied to these areas makes it possible to track the movement of the object with high accuracy and to predict its next position.

Pros: sufficiently high speed and accuracy, stops tracking when the tracked object is lost.

Cons: inability to continue tracking after the loss of the object.

Figure 5. KCF tracker results.

TLD (Tracking Learning Detection) Tracker

This method decomposes the task of tracking an object into three processes: tracking, learning, and detection. The tracker (based on the MedianFlow tracker) follows the object, while the detector localizes its appearance and corrects the tracker if necessary. The learning component estimates detection errors and prevents them in the future by recognizing missed or false detections.

Pros: relatively good resistance to object scaling and to occlusion by other objects.

Cons: rather unpredictable behavior: unstable detection and tracking, frequent loss of the object, and tracking of similar objects instead of the selected one.

Figure 6. TLD tracker results.

MedianFlow Tracker

This algorithm is based on the Lucas-Kanade method. It tracks the movement of the object forward and backward in time and estimates the error between these two trajectories, which allows the tracker to predict the object's next position in real time.
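The forward-backward error at the heart of MedianFlow can be sketched in a few lines. This is a toy illustration: track_fn stands in for an optical-flow point tracker (in practice something like cv2.calcOpticalFlowPyrLK), and the example motion is a made-up pure translation.

```python
import numpy as np

def forward_backward_error(track_fn, points):
    """Track points one frame forward, then back again; the distance
    between each start point and its round trip is the forward-backward
    error that MedianFlow uses to discard unreliable points."""
    forward = track_fn(points)                  # frame t -> t+1
    backward = track_fn(forward, reverse=True)  # frame t+1 -> t
    return np.linalg.norm(points - backward, axis=1)

# Toy tracker: a pure +3px x-shift is perfectly reversible, so the
# forward-backward error of every point is zero.
shift = np.array([3.0, 0.0])
track = lambda pts, reverse=False: pts + (-shift if reverse else shift)
pts = np.array([[10.0, 10.0], [20.0, 15.0]])
print(forward_backward_error(track, pts))  # → [0. 0.]
```

Points whose error exceeds the median are dropped, which is where the tracker's name comes from.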

Pros: sufficiently high speed and tracking accuracy, provided the object is not occluded by other objects and is not moving too fast. The algorithm determines the loss of the object quite accurately.

Cons: high probability of losing the object when it moves fast.

Figure 7. MedianFlow tracker results.

GOTURN (Generic Object Tracking Using Regression Network) Tracker

This algorithm is an “offline” tracker, since it is built around a deep convolutional neural network. Two image crops are fed into the network: the “previous” one, in which the position of the object is known, and the “current” one, in which the position must be predicted. Both crops are passed through the network, whose output is a set of 4 values representing the coordinates of the predicted bounding box containing the object. Since the algorithm relies on a neural network, the user needs to download and specify the model definition and weight files before tracking.

Pros: comparatively good resistance to noise and obstructions.

Cons: the tracking accuracy depends on the data the model was trained on, which means the algorithm may track some user-selected objects poorly. It also loses the object and shifts to another one if the first moves too fast.

Figure 8. GOTURN tracker results.

MOSSE (Minimum Output Sum of Squared Error) tracker

This algorithm is based on the calculation of adaptive correlations in Fourier space. The filter minimizes the sum of squared errors between the actual correlation output and the predicted correlation output. This tracker is robust to changes in lighting, scale, pose, and non-rigid deformations of the object.
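The closed-form filter behind MOSSE is simple enough to sketch directly. This is a bare-bones NumPy illustration of the idea, not OpenCV's implementation: the filter spectrum is H* = Σ G⊙F* / (Σ F⊙F* + ε), where F is a training patch spectrum and G the desired (ideally Gaussian-shaped) response.

```python
import numpy as np

def mosse_filter(patches, target_response, eps=1e-5):
    """Solve for the MOSSE correlation filter in the Fourier domain."""
    G = np.fft.fft2(target_response)
    num = np.zeros_like(G)
    den = np.zeros_like(G)
    for patch in patches:
        F = np.fft.fft2(patch)
        num += G * np.conj(F)           # numerator: G ⊙ F*
        den += F * np.conj(F)           # denominator: F ⊙ F*
    return num / (den + eps)            # the filter spectrum H*

def correlate(H_conj, patch):
    """Apply the filter: the peak of the response marks the object."""
    response = np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return (int(peak[0]), int(peak[1]))

rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))   # stand-in for an image patch
g = np.zeros((32, 32)); g[16, 16] = 1.0 # desired peak at the center
H = mosse_filter([patch], g)
print(correlate(H, patch))  # → (16, 16)
```

The real tracker additionally applies a cosine window, trains on many augmented patches, and updates H with a running average, but the per-frame work is just these FFTs, which is why MOSSE is so fast.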

Pros: very high tracking speed; more successful than the other trackers at resuming tracking when a lost object reappears.

Cons: high likelihood of the tracker continuing to report a track even when the object is lost and never reappears in the frame.

Figure 9. MOSSE tracker results.

CSRT (Discriminative Correlation Filter with Channel and Spatial Reliability) tracker

This algorithm uses spatial reliability maps to adjust the filter support to the relevant part of the selected region, which makes it possible to enlarge the search area and track non-rectangular objects. Channel reliability scores reflect the quality of the learned filter in each channel and are used as weights for localization. Using HOG and Colornames as feature sets, the algorithm performs relatively well.
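The channel-weighting idea can be illustrated with a toy sketch. This is a simplified stand-in, not CSRT's actual reliability estimate: here each channel's weight is just its peak response, whereas the real algorithm derives reliability from the filter learning and the response shape.

```python
import numpy as np

def weighted_response(channel_responses):
    """Combine per-channel correlation responses, weighting each channel
    by its peak value -- a crude proxy for channel reliability."""
    weights = np.array([r.max() for r in channel_responses])
    weights = weights / weights.sum()
    return sum(w * r for w, r in zip(weights, channel_responses))

# A sharp, confident channel should dominate a flat, uninformative one.
sharp = np.zeros((5, 5)); sharp[2, 2] = 1.0
flat = np.full((5, 5), 0.1)
combined = weighted_response([sharp, flat])
peak = np.unravel_index(np.argmax(combined), combined.shape)
print((int(peak[0]), int(peak[1])))  # → (2, 2)
```

Down-weighting unreliable channels is what lets CSRT stay accurate when some features (e.g. color) become uninformative.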

Pros: comparatively better accuracy than the previous algorithms and resistance to occlusion by other objects.

Cons: comparatively low speed and unstable behavior when the object is lost.

Figure 10. CSRT tracker results.

Time to sum up

In this article, we have discussed the general ideas behind the OpenCV object tracking algorithms and compared their performance on a sample video. If you are still unsure which algorithm to choose, we suggest starting with the KCF, MOSSE, and CSRT trackers. We hope this guide leads you to better and faster solutions in your future work on object tracking.
