International Journal of Mechanical Engineering and Technology (IJMET)
Volume 10, Issue 05, May 2019, pp. 465-475, Article ID: IJMET_10_05_047
Available online at http://www.iaeme.com/ijmet/issues.asp?JType=IJMET&VType=10&IType=5
ISSN Print: 0976-6340 and ISSN Online: 0976-6359
© IAEME Publication
MULTIPLE HUMAN TRACKING USING
RETINANET FEATURES, SIAMESE NEURAL
NETWORK, AND HUNGARIAN ALGORITHM
Dina Chahyati, Aniati Murni Arymurthy
Machine Learning and Computer Vision Laboratory
Faculty of Computer Science, Universitas Indonesia, Depok, Indonesia
ABSTRACT
Multiple human tracking based on object detection remains a challenge due to its
complexity: errors in object detection propagate into tracking errors. In this paper, we
propose a tracking method that minimizes the errors produced by the object detector.
We use RetinaNet as the object detector and the Hungarian algorithm for tracking.
The cost matrix for the Hungarian algorithm is calculated from the RetinaNet features,
bounding box center distances, and intersection over union of the bounding boxes. We
interpolate the missing detections in a final step. The proposed method yields a MOTA
of 43.2 on the MOT16 benchmark.
Key words: RetinaNet, tracking by detection, Hungarian algorithm, Siamese neural
network, interpolation
Cite this Article: Dina Chahyati, Aniati Murni Arymurthy, Multiple Human Tracking
Using Retinanet Features, Siamese Neural Network, and Hungarian Algorithm,
International Journal of Mechanical Engineering and Technology 10(5), 2019, pp.
465-475.
http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=10&IType=5
1. INTRODUCTION
Multiple object tracking has been a challenge for researchers for over a decade. Vehicle and
human tracking have dominated this field since they are very important for surveillance
systems. Human tracking poses its own challenges because of the wide variety of human
appearances and the severe occlusions present in most scenes.
In the beginning, researchers focused on tracking by trying to predict the path. Methods
such as the Kalman filter and optical flow [1][2][3] were commonly used in this approach.
Nowadays many researchers have shifted to tracking by detection [4]–[6]. Detectors based
on HOG or its variants, such as DPM, are quite popular and have been used in the multiple
object tracking (MOT) benchmark [7]. However, DPM has limitations in recognizing more
complex objects and their attributes, such as variations in clothes, bags, activities, gender, etc.
Our future research topic is tracking the movements of people of a specific gender; therefore,
we cannot rely on DPM for the detection step. A more reasonable option is a deep learning
approach for the detection step, because such models can be trained to detect more specific
categories through transfer learning.
There are many deep learning architectures for object detection, such as YOLO, SSD, and
RetinaNet [8]. In our research, we use RetinaNet. RetinaNet is a one-stage object detector
that, when it was released in late 2017, outperformed state-of-the-art two-stage methods as
well as one-stage detectors such as YOLOv2 and SSD513. RetinaNet may use VGG,
DenseNet, MobileNet, or ResNet as its backbone.
Unfortunately, being the state of the art does not mean being perfect. There are still
limitations in the object detection results that make them difficult to use for object tracking,
as errors would propagate. There are at least two kinds of error that an object detector
commonly suffers from: false positive (FP) and false negative (FN) detections. FP may
appear in three forms, as shown in Fig. 1: overdetection (two persons detected as three),
under-detection (two persons detected as one), and inconsistent size.
FN means that the detector cannot find a person in a location where there should be one.
False negatives fall into two categories: a person is detected in some frames but not in the
frames in between, or a person is totally undetected throughout the scene.
Figure 1 Problems in detections: (a) overdetection, (b) under-detection, (c) inconsistent size
We should be able to minimize these errors if we retrain or fine-tune the model. However,
deep learning models need high-performance computing resources, or at least a GPU, for the
training process. In this paper, we propose a way of handling the detection errors in the
tracking process in a limited computing environment where a GPU is not available.
Moreover, even after fine-tuning, any deep-learning-based detection method would still
produce detection errors. In this paper, we therefore look for a way of optimizing the
detection results for tracking. We use RetinaNet for object detection with ResNet50 [9] as
the backbone. We then use the combination of bounding box center distance, intersection
over union, and feature similarity from a Siamese network as input to the Hungarian
algorithm, which serves as the object association technique.
2. RELATED WORK
There are four major components used in our research: an object detector (RetinaNet), a
method to associate objects between frames (Hungarian algorithm), a method to evaluate the
similarity between two feature vectors (Siamese network), and an evaluation protocol for the
tracking result. We explain each of these concepts briefly.
2.1. RetinaNet
We use RetinaNet since it is one of the most cited object detectors available for public use.
RetinaNet is claimed to be the first one-stage object detector that matches the state-of-the-art
COCO AP of more complex two-stage detectors such as Feature Pyramid Network (FPN)
or variants of Faster R-CNN [10]. A two-stage detector first generates a sparse candidate set
of object locations or bounding boxes, and then classifies each candidate into one of several
classes. A one-stage detector performs classification over a dense set of candidate locations;
it is usually faster but less accurate than a two-stage detector.
RetinaNet is a single, unified network that consists of a backbone network (which computes
a convolutional feature map of the input image) and two task-specific subnetworks (object
classification and bounding box regression). The key innovation of RetinaNet is a new loss
function, the focal loss, which improves the accuracy of the detector while maintaining its
detection speed.
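For reference, the focal loss introduced in [8] down-weights the contribution of well-classified examples. With p_t denoting the model's estimated probability for the ground-truth class, it is defined as

FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t),

where [8] uses gamma = 2 and alpha_t = 0.25 by default.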
In our research, we use ResNet as the backbone, as suggested by the RetinaNet paper [8]; the
full network consists of 216 layers in total. The feature we use for each detected object in this
paper is a 256-dimensional vector extracted from the 2D convolutional layers P3, P4, P5, P6,
and P7. P3 to P5 are computed from the outputs of the corresponding ResNet residual stages
(C3 to C5).
As an illustration, suppose we have a typical 1920 x 1080 MOT image. The image is resized
such that its longer side is 1333 pixels, and the resized image is then divided into anchor
boxes of various sizes. In our case, the image is resized to 1333 x 750 pixels, and the
output feature maps are shown in Table 1.
Table 1 RetinaNet Feature Map Sizes

Layer   Size of Feature Map   Number of Feature Vectors   Anchor Size
P3      167 x 94 x 256        15698                       8 x 8
P4      84 x 47 x 256         3948                        16 x 16
P5      42 x 24 x 256         1008                        32 x 32
P6      21 x 12 x 256         252                         64 x 64
P7      11 x 6 x 256          66                          128 x 128
Total                         20972
Each of those 20972 feature vectors of size 256 is then associated with three scales (2^0,
2^(1/3), 2^(2/3)) and three aspect ratios (1:2, 1:1, 2:1) of the original anchor size. All of the
outputs are then concatenated as input for the classification and regression layers of
RetinaNet. The regression layer is responsible for predicting the appropriate bounding box
for each feature map location under the 9 corresponding scales and ratios. Thus, for our
image we have 20972 x 9 = 188748 candidate bounding boxes.
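As a quick sanity check, the counts in Table 1 can be reproduced with a few lines of Python, assuming the usual FPN convention that each level's spatial size is ceil(side / stride) for strides 8 through 128 and that each location carries 9 anchors (3 scales x 3 ratios):

import math

W, H = 1333, 750          # resized MOT16 frame
strides = {"P3": 8, "P4": 16, "P5": 32, "P6": 64, "P7": 128}

total_locations = 0
for level, s in strides.items():
    w, h = math.ceil(W / s), math.ceil(H / s)
    print(f"{level}: {w} x {h} x 256 -> {w * h} feature vectors")
    total_locations += w * h

print("total feature vectors:", total_locations)         # 20972
print("candidate boxes (x 9 anchors):", total_locations * 9)  # 188748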
2.2. Hungarian Algorithm
Suppose we feed two consecutive frames, t and t + 1, to RetinaNet and obtain the detected
persons in each frame. Since each frame contains many persons, we need to associate or
assign each person in frame t to a person in frame t + 1 in order to track them. One of the
best-known and simplest methods for this assignment problem is the Hungarian algorithm.
Given a cost matrix, the Hungarian algorithm matches rows to columns so as to minimize the
total cost, as illustrated in Fig. 2. Person A will be associated with person D, B with E, and C
with F, because this combination yields the minimum total cost.
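The assignment step itself is standard; a minimal illustration with SciPy's Hungarian solver, using made-up costs for the A/B/C versus D/E/F example above (not values from the paper), looks like this:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: persons A, B, C in frame t; columns: persons D, E, F in frame t + 1.
cost = np.array([[1.0, 4.0, 5.0],
                 [6.0, 2.0, 7.0],
                 [8.0, 9.0, 3.0]])
rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm, minimizes total cost
for r, c in zip(rows, cols):
    print("ABC"[r], "->", "DEF"[c], "cost", cost[r, c])
# A -> D, B -> E, C -> F, total cost 6.0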
If we want to use the Hungarian algorithm for tracking, we have to provide it with an
appropriate cost matrix between the persons in the two frames. Besides the positions of the
bounding boxes, we want to take into account the convolutional feature of each object,
because it should represent the visual appearance. Given a pair of feature vectors representing
two detected persons in different frames, we must therefore find a metric that evaluates how
similar those vectors are.
Figure 2 Example of the Hungarian algorithm applied for tracking between frames
The intuition is that similar objects should have reasonably similar feature vectors. This holds
for some of the objects but not for others. We tried using the simple Euclidean distance, but
the results show that some objects have very large feature distances even though their
appearances look similar, as shown in Fig. 3.
Figure 3 Euclidean distances of feature vectors are not always consistent with visual appearance
Table 2 Bounding box center distance vs. feature vector (Euclidean) distance for the matched detections in Fig. 3

Frame 1   Frame 2   Bounding box distance   Feature vector distance
7001      7001      1.00                    10.71
133151    133151    3.00                    20.30
147751    149263    1.00                    45.67
151549    151549    4.12                    17.96
151828    151828    3.16                    16.47
152520    152520    5.10                    7.37
160127    160127    1.00                    9.23
163763    163763    1.41
2.3. Siamese Neural Network
The unexpectedly large distances between similar-looking objects in consecutive frames may
be caused by noise in the images that is invisible to our eyes. Therefore, we need a more
robust way to define the similarity between two RetinaNet feature vectors. We use the
Siamese network introduced by [11]. In that paper, the authors note that a Siamese network is
well suited to classification problems with many classes but only a few instances per class.
A Siamese neural network (SNN), as the name suggests, consists of twin functions G_W
that share the same set of parameters W, and a cost module that computes the distance
E_W(x_1, x_2) = ||G_W(x_1) - G_W(x_2)||. The input of the SNN is a pair of images (x_1, x_2)
and a label Y, where Y = 0 means similar and Y = 1 means dissimilar. The output of the
Siamese network is a similarity score; e.g., if it is less than 0.5, the pair is considered similar.
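For completeness, the contrastive loss proposed in [11] for training such a network, with D_W = ||G_W(x_1) - G_W(x_2)|| and margin m, can be written as

L(W, Y, x_1, x_2) = (1 - Y) * (1/2) * D_W^2 + Y * (1/2) * (max(0, m - D_W))^2,

so similar pairs (Y = 0) are pulled together while dissimilar pairs (Y = 1) are pushed at least m apart.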
2.4. Evaluation Method
One of the most popular tracking evaluation protocols is the MOT evaluation metric set [7].
We use 6 of the MOT tracking metrics:
- Recall: percentage of correctly detected targets, compared to the ground truth
- MT: mostly tracked trajectories, i.e., more than 80% of the ground truth trajectory length is tracked
- FP: number of false positives
- FN: number of missed targets
- IDs: number of ID switches; an ID switch happens when a person is detected as a different person due to a missed association or because it was occluded by other objects
- MOTA: multi-object tracking accuracy in [0, 100], MOTA = 1 - error, where the error is defined as (FN + FP + IDs) divided by the number of ground truth objects
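Written out explicitly, with GT_t the number of ground-truth objects in frame t, this definition corresponds to

MOTA = 1 - ( sum_t (FN_t + FP_t + IDs_t) ) / ( sum_t GT_t ),

expressed here as a percentage.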
3. PROPOSED METHOD
In this section, we discuss the proposed method in more detail, together with the dataset that
we used. The flowchart of our proposed method for the first three frames is shown in Fig. 4;
the processing of Frame 4 and subsequent frames is the same as that of Frame 3. Each frame
is passed through RetinaNet to obtain the bounding boxes and the convolutional (CNN)
features.
Figure 4 Flowchart of our method
The original RetinaNet code only outputs the best 300 filtered bounding boxes, along with
the information about what object might be inside each box (person, car, etc.). We use the
resnet50_coco_best_v2.1.0 model of RetinaNet, which classifies objects into 80 classes, and
keep only the "person" class. In order to get the Conv2D features, we need to extract the
contents of the RetinaNet intermediate layers that store the complete set of 188748 candidate
bounding boxes (the clip_boxes layer) and find the corresponding Conv2D feature vector.
These feature vectors may come from layer P3, P4, P5, P6, or P7. In our figures, the number
next to each bounding box identifies which of the 188748 candidates it is. Suppose the box
number is 150019; then it comes from layer P4, because 150019/9 falls within the index
range of P4.
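A small sketch of this index-to-level mapping, assuming the candidates are concatenated in P3 to P7 order with 9 anchors per location (which is consistent with the arithmetic above), could look like this:

# Number of feature-map locations per level, from Table 1.
LEVEL_LOCATIONS = {"P3": 15698, "P4": 3948, "P5": 1008, "P6": 252, "P7": 66}

def level_of_box(box_index, anchors_per_location=9):
    # 9 candidate boxes (3 scales x 3 ratios) share each feature-map location.
    location_index = box_index // anchors_per_location
    start = 0
    for level, count in LEVEL_LOCATIONS.items():   # insertion order: P3..P7
        if location_index < start + count:
            return level
        start += count
    raise ValueError("box index out of range")

print(level_of_box(150019))   # -> "P4", as in the example above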
After collecting the RetinaNet features for every detected person in all frames, we need to
associate the same persons across consecutive frames using the Hungarian algorithm. Two
key components of the Hungarian algorithm are the definition of the sets of tasks (rows) and
assignees (columns) to be associated, and the cost function between each task-assignee pair.
The straightforward way is to consider the persons detected in frame t as tasks and the
persons detected in frame t + 1 as assignees, but since we have occlusion problems, we
cannot do this. Instead of pairing the persons in consecutive frames, we pair the persons in
frame t + 1 with the set of all persons found from frame 1 to frame t. This solves the problem
of a person being undetected in intermediate frames due to occlusion or missed detection. In
Fig. 4, this process is shown in the processing of Frame 3, where the input of the SNN in this
step comes from the RetinaNet output of Frame 3 as assignees and the list of all tracked
persons (not only the output of frame t - 1) as tasks.
Another important component of the Hungarian algorithm is how to define the cost between
any pair of detected persons in different frames. In our study, we define the cost as a
combination of three metrics: the distance between bounding box centers, the intersection
over union (IoU) of the bounding boxes, and the output of the Siamese network. The distance
between bounding box centers is used for its simplicity, because in most cases the distance
between the same person in consecutive frames is quite small compared to the distances to
other persons.
The intersection over union of a candidate pair is useful when the bounding box sizes of the
same person differ between consecutive frames, as shown in Fig. 1c, where the height of the
same person is detected quite differently. If we only used the distance between bounding box
centers, these would be considered different persons because the distance is too large. By
using the intersection over union of the two bounding boxes, we can still associate these
detections with the same person.
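A small helper for this IoU term, assuming boxes are given as (x1, y1, x2, y2) pixel coordinates (a standard formulation, not taken verbatim from the authors' code):

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) in pixel coordinates.
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    ix1, iy1 = max(xa1, xb1), max(ya1, yb1)       # intersection rectangle
    ix2, iy2 = min(xa2, xb2), min(ya2, yb2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    return inter / union if union > 0 else 0.0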
The SNN that we use to calculate feature similarity consists of three 128-node dense layers
with ReLU activation, alternated with two 10% dropout layers. We trained the SNN on the
first 300 frames of each training scene of MOT16, combining the data from all scenes to
build a more general SNN that can be used across all MOT16 test scenes.
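A minimal sketch of such a network in Keras/TensorFlow is shown below. The twin towers with shared weights, Euclidean distance output, and contrastive loss follow [11], and the layer arrangement follows the description above; the exact training objective and output scaling used by the authors are not stated in the paper, so this is an illustration rather than their implementation.

import tensorflow as tf
from tensorflow.keras import layers, Model, Input

FEATURE_DIM = 256  # length of the RetinaNet feature vector

def build_embedding():
    # Three 128-node dense layers with ReLU, alternated with two 10% dropouts.
    return tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_shape=(FEATURE_DIM,)),
        layers.Dropout(0.1),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(128, activation="relu"),
    ])

def contrastive_loss(y_true, d, margin=1.0):
    # y_true = 0 for similar pairs, 1 for dissimilar pairs (as in the paper).
    y_true = tf.cast(y_true, d.dtype)
    return tf.reduce_mean((1.0 - y_true) * 0.5 * tf.square(d)
                          + y_true * 0.5 * tf.square(tf.maximum(margin - d, 0.0)))

embed = build_embedding()          # the twin networks share these weights
xa = Input(shape=(FEATURE_DIM,))
xb = Input(shape=(FEATURE_DIM,))
distance = layers.Lambda(
    lambda t: tf.sqrt(tf.reduce_sum(tf.square(t[0] - t[1]), axis=-1, keepdims=True) + 1e-9)
)([embed(xa), embed(xb)])
snn = Model(inputs=[xa, xb], outputs=distance)
snn.compile(optimizer="adam", loss=contrastive_loss)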
IoU is defined as the ratio between the intersection and the union of a pair of bounding boxes,
so its scale is [0, 1]. The output of the SNN is a number such that the two input images are
considered similar if the output is less than 0.5. To match the scale of the previous two
metrics, the distance between bounding box centers needs to be normalized. We normalize it
by dividing by a number such that a distance is considered close if the normalized value is
less than 0.5. In our case, we use 60 as the normalizer, a value obtained from observations of
scene 04 of the MOT16 dataset.
The original Hungarian algorithm accepts a square cost matrix whose rows and columns
represent tasks and assignees. Later modifications allow a non-square matrix as input, in
which case the number of assignments follows the smaller of the row and column counts. In
our case, this becomes a problem when a new person appears in the current frame, since he
would be forced to match one of the existing persons found so far. That is why we apply a
threshold to the cost matrix: if a detected person in the current frame is far from every person
in our existing list, he is considered a new person and is omitted from the cost matrix given to
the Hungarian algorithm. In our experiments we use a threshold such that if the sum of all
three metrics is greater than 1.8, the detection is considered a new person.
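The sketch below shows one way the three terms could be combined into a cost matrix and gated with the 1.8 threshold before assignment. Representing the IoU term as (1 - IoU) so that all three terms are dissimilarities, as well as the track and detection field names, are assumptions on our part; the normalizer of 60 and the threshold of 1.8 come from the paper.

import numpy as np
from scipy.optimize import linear_sum_assignment

CENTER_NORM = 60.0          # pixels, from observations of MOT16 scene 04
NEW_PERSON_THRESHOLD = 1.8  # sum of the three cost terms

def center(box):
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def build_cost(tracks, detections, snn_distance, iou):
    # Rows: all persons tracked so far (frames 1..t); columns: detections in frame t + 1.
    cost = np.zeros((len(tracks), len(detections)))
    for i, trk in enumerate(tracks):
        for j, det in enumerate(detections):
            d_center = np.linalg.norm(center(trk["box"]) - center(det["box"])) / CENTER_NORM
            d_iou = 1.0 - iou(trk["box"], det["box"])          # assumed dissimilarity form
            d_feat = snn_distance(trk["feature"], det["feature"])
            cost[i, j] = d_center + d_iou + d_feat
    return cost

def associate(tracks, detections, cost):
    if len(tracks) == 0:
        return [], list(range(len(detections)))   # every detection starts a new track
    # Detections far from every existing track start new tracks and are left out of
    # the Hungarian input, so they are not forced onto old tracks.
    new_idx = [j for j in range(cost.shape[1]) if cost[:, j].min() > NEW_PERSON_THRESHOLD]
    keep = [j for j in range(cost.shape[1]) if j not in new_idx]
    matches = []
    if keep:
        rows, cols = linear_sum_assignment(cost[:, keep])
        matches = [(r, keep[c]) for r, c in zip(rows, cols)]
    return matches, new_idx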
After processing all the frames in a scene, we apply two further adjustments. The first is to
delete all tracks whose length is less than five frames. This step is important for removing
tracks caused by false positives (overdetection, as shown in Fig. 1). We chose five as the
threshold because we noticed that in the MOT16 training set, overdetection by RetinaNet
does not persist over long frame ranges.
The second adjustment is to interpolate the resulting tracks. Some objects may be missed in
certain frames but detected before and after; we add interpolated bounding boxes for those
missed objects to decrease the number of false negatives.
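A minimal sketch of these two adjustments, assuming each track is a dict mapping frame number to an (x1, y1, x2, y2) box and that gaps are filled by linear interpolation of the box coordinates (the paper does not state the exact interpolation scheme):

import numpy as np

def interpolate_track(track):
    # track: dict mapping frame number -> (x1, y1, x2, y2)
    frames = sorted(track)
    for f0, f1 in zip(frames[:-1], frames[1:]):
        gap = f1 - f0
        if gap <= 1:
            continue
        b0, b1 = np.asarray(track[f0], float), np.asarray(track[f1], float)
        for k in range(1, gap):
            alpha = k / gap
            track[f0 + k] = tuple((1 - alpha) * b0 + alpha * b1)  # linearly interpolated box
    return track

def prune_short_tracks(tracks, min_length=5):
    # Tracks shorter than five frames are treated as overdetection and removed.
    return [t for t in tracks if len(t) >= min_length]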
4. RESULT AND DISCUSSION
We used the MOT16 dataset as our benchmark and ran the experiments on a computer with
an Intel Core i7-7700 CPU @ 3.69 GHz and 8 GB RAM. Since we only use RetinaNet for
inference, not for training or fine-tuning, we did not use a GPU; SNN training was also done
in CPU mode. The results on the MOT16 test scenes are shown in Table 2. The overall
detection recall is 49.4, meaning that less than half of the persons are detected.
Table 2 Tracking Result of Proposed Method
Scene Recall MT FP FN IDs MOTA
01 44.0 26.1 % 140 3579 38 41.3
03 51.1 16.9 % 5271 51151 495 45.6
06 69.2 30.8 % 1644 3554 281 52.5
07 49.4 14.8 % 731 8267 138 44.0
08 39.6 20.6 % 937 10108 134 33.2
12 55.3 23.2 % 381 3707 47 50.2
14 36.1 6.1% 717 11805 359 30.3
Overall 49.4 19.8% 9821 92171 1492 43.2
Compared to other methods on the MOT16 benchmark, we have not beaten the state-of-the-art
tracker HCC [12], which reaches an overall MOTA of 49.3 in the public detector category.
This is because we did not do any training or fine-tuning of the detector, due to the limitations
of our computing environment: we use an existing publicly available detection model and
methods that do not require a high computational cost.
Table 3 Comparison with Other Methods
Method MT FP FN IDs MOTA
HISP_T [13] 7.8% 6412 107918 2594 35.9
AM_ADM [14] 7.1% 8503 99891 789 40.1
Ours 19.8% 9821 92171 1492 43.2
LMP [6] 18.2% 6654 86245 481 48.8
HCC [12] 17.8% 5333 86795 391 49.3
We observe that our relatively low MOTA is caused by the high numbers of FP, FN, and IDs.
Since scene 3 contributes the largest errors and its corresponding training scene is scene 4, we
show examples of FP and FN in scene 4 in Fig. 5. The blue boxes are the output of RetinaNet.
The red boxes surrounding blue ones are detections that are counted as FP. The green boxes
are the FNs, i.e., objects that should have been detected but were missed by RetinaNet. As we
can see, RetinaNet fails to detect persons when they are occluded. Persons in the bottom
frame, such as A and B, are counted as both FP and FN because the ground truth bounding
box covers the whole body, while the detection bounding box only covers what is visible in
the image, so the heights of the detected and ground truth boxes differ. This also causes the
FP for person C: RetinaNet only sees the upper body, while the ground truth covers the whole
body. The persons in D are also not detected because only a small part of their legs is visible.
In E, two persons are detected as one because of the severe occlusion.
The high number of ID switches also remains a problem. We explored the reasons and found
that one of them is the high number of FN and FP. As we can see in Fig. 6, one of the
detections in Frame 15 is counted as FP (red box) because it fails to satisfy the
IoU-with-ground-truth threshold. Detection (blue) 150019 is matched to the lower person in
Frame 15, and detection 151531 is counted as FP. In this case, the evaluation program
considers that there is only one detected person in Frame 15, associated with the lower
person. However, the two persons in Frame 16 are detected correctly, since neither is counted
as FN or FP; there, detection 150019 is matched to the upper person. This mistake propagates
into the tracking evaluation, as shown in Fig. 7. Our method actually tracks the upper (pale
yellow track) and lower (pale green track) persons correctly, but the evaluation program
counts an ID switch. As we can see, the track remains correct when extended to frame 28.
Therefore, even though the number of ID switches is high for our method, it does not always
represent a mistake.
Figure 5 Examples of FP and FN in scene 4 frame 50
Figure 6 Example of detection (blue), ground truth (yellow), FP (red), FN (green) bounding boxes
Figure 7 Example of a case that is counted as an ID switch (cyan box) while actually it is not
As mentioned in the introduction, there are five problems that an object detector commonly
suffers from. Whether our tracking method overcomes these problems is discussed below.
Overdetection
Overdetection is the situation where two persons are detected as three or more. Fortunately,
RetinaNet does not overdetect consistently across consecutive frames. For example, in scene
3, the overdetection of the two persons in Fig. 8 occurs only in frames 7, 8, 9, and 33. We
handle this problem with a threshold on track length: if a track is shorter than 5 frames, it is
deleted.
Under detection
Under detection is when two persons are detected as one. In our approach, the detection is
assigned to whichever of the two persons it is more similar to.
Inconsistent size
Since we use the IoU information, our method is still able to track correctly even when the
detected sizes are inconsistent.
Undetected in some frames
Since we keep the list of all persons detected from frame 1 to frame N and compare the
detections of frame N + 1 against that list, a person who is undetected in some frames can
still be recognized after skipping those frames. The interpolation in the last step also helps to
recover the missing detections. Comparing Table 2 (with interpolation) and Table 4 (without
interpolation), we see that interpolation increases the MOTA from 41.3 to 43.2. Although it
adds roughly 2700 unexpected FP, it also decreases the FN by around 6000, as expected.
Undetected in all frames
This problem cannot be solved unless we improve the accuracy of the detector, for example
by fine-tuning RetinaNet with new examples specific to the dataset. Fine-tuning the detector
(RetinaNet) is outside the scope of this paper.
Table 4 Tracking Result of Proposed Method Without Interpolation
Scene Recall MT FP FN IDs MOTA
01 40.9 17.4% 95 3779 43 38.7
03 46.9 10.8% 4193 55516 578 42.3
06 66.4 26.2% 992 3878 296 55.6
07 47.0 9.3% 462 8646 145 43.3
08 38.6 19.0% 725 10283 143 33.4
12 53.5 20.9% 259 3859 59 49.6
14 32.7 4.9% 357 12444 345 28.9
Overall 46.0 15.9% 7083 98405 1609 41.3
Examples of each problem are shown in Fig. 8. The white boxes represent the detections that
are tracked. The blue box shows a detection that is not included in the final track because it is
suspected to be an FP (track length less than 5). The orange boxes are interpolated boxes:
RetinaNet did not originally detect them, but they were obtained by interpolating bounding
boxes in the last step.
The main drawback of our method is its strong dependence on the chosen threshold, which is
1.8 in our experiments. If the sum of the bounding box center distance, the IoU term, and the
feature similarity is greater than 1.8, we consider the detection a new person and start a new
track. The moving speed of the person or the camera greatly affects the bounding box center
distance, so the threshold needs to be adjusted accordingly. This will be the focus of our next
research.
Figure 8 Example of tracking result
5. CONCLUSION
The publicly available RetinaNet model, without fine-tuning, is able to detect persons in the
MOT16 dataset with about 46% recall. This means that there are still problems in the
detection results. Our proposed method shows that, without the need for a high-performance
computing environment, we are able to solve some of the detection problems for tracking.
Even though the MOTA of our method does not beat the state of the art on the MOT16
benchmark, we hope that our result can serve as a baseline for other methods developed
under similar circumstances, i.e., making use of convolutional features from deep learning
object detectors while working in a non-GPU computing environment.
REFERENCES
[1] X. Li, K. Wang, W. Wang, and Y. Li, “A multiple object tracking method using Kalman
filter,” 2010 IEEE Int. Conf. Inf. Autom. ICIA 2010, vol. 1, no. 1, pp. 1862–1866, 2010.
[2] S. Shantaiya, K. Verma, and K. Mehta, “Multiple Object Tracking using Kalman Filter
and Optical Flow,” Eur. J. Adv. Eng. Technol., vol. 2, no. 2, pp. 34–39, 2015.
[3] A. Kulkarni and E. Rani, “KALMAN Filter Based Multiple Object Tracking System,” Int.
J. Electron. Commun. Instrum. Eng. Res. Dev., vol. 8, no. 2, pp. 1–6, 2018.
[4] L. Leal-Taixé, C. Canton-Ferrer, and K. Schindler, “Learning by tracking: Siamese CNN
for robust target association,” in CVPR, 2016, pp. 33–40.
[5] K. Zhang, Q. Liu, Y. Wu, and M. Yang, “Robust Visual Tracking via Convolutional
Networks Without Training,” IEEE Trans. IMAGE Process., vol. 25, no. 4, pp. 1779–
1792, 2016.
[6] S. Tang, M. Andriluka, B. Andres, and B. Schiele, “Multiple People Tracking by Lifted
Multicut and Person Re-identification,” in CVPR, 2017.
[7] A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler, “MOT16: A Benchmark for
Multi-Object Tracking,” pp. 1–12, 2016.
[8] T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal Loss for Dense Object
Detection,” Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2999–3007.
[9] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in
CVPR, 2016, pp. 770–778.
[10] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object
Detection with Region Proposal Networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol.
39, no. 6, pp. 1–14, 2017.
[11] R. Hadsell, S. Chopra, and Y. LeCun, “Dimensionality Reduction by Learning an
Invariant Mapping,” in CVPR, 2006.
[12] L. Ma and S. Tang, “Customized Multi-Person Tracker,” in ACCV, 2018, pp. 1–16.
[13] N. L. Baisa, “Online Multi-target Visual Tracking using a HISP Filter,” in International
Conference on Computer Vision Theory and Applications, 2018, no. March, pp. 429–438.
[14] S. H. Lee, M. Y. Kim, and S. H. Bae, “Learning discriminative appearance models for
online multi-object tracking with appearance discriminability measures,” IEEE Access,
vol. 6, pp. 67316–67328, 2018.
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 

Recently uploaded (20)

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 

MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND HUNGARIAN ALGORITHM

  • 1. http://www.iaeme.com/IJMET/index.asp 465 editor@iaeme.com International Journal of Mechanical Engineering and Technology (IJMET) Volume 10, Issue 05, May 2019, pp. 465-475, Article ID: IJMET_10_05_047 Available online at http://www.iaeme.com/ijmet/issues.asp?JType=IJMET&VType=10&IType=5 ISSN Print: 0976-6340 and ISSN Online: 0976-6359 © IAEME Publication MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND HUNGARIAN ALGORITHM Dina Chahyati, Aniati Murni Arymurthy Machine Learning and Computer Vision Laboratory Faculty of Computer Science, Universitas Indonesia, Depok, Indonesia ABSTRACT Multiple human tracking based on object detection has been a challenge due to its complexity. Errors in object detection would be propagated to tracking errors. In this paper, we propose a tracking method that minimizes the error produced by object detector. We use RetinaNet as object detector and Hungarian algorithm for tracking. The cost matrix for Hungarian algorithm is calculated using the RetinaNet features, bounding box center distances, and intersection of unions of bounding boxes. We interpolate the missing detections in the last step. The proposed method yield 43.2 MOTA for MOT16 benchmark. Key words: RetinaNet, tracking by detection, Hungarian algorithm, Siamese neural network, interpolation Cite this Article: Dina Chahyati, Aniati Murni Arymurthy, Multiple Human Tracking Using Retinanet Features, Siamese Neural Network, and Hungarian Algorithm, International Journal of Mechanical Engineering and Technology 10(5), 2019, pp. 465-475. http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=10&IType=5 1. INTRODUCTION Multiple object tracking has been a challenge for researchers over a decade. Vehicle and human tracking have been dominating this field since they are very important for surveillance system. Human tracking has its own challenges because of a wide variety of human appearance and severe occlusions in most of the scenes. In the beginning, researchers focused on tracking by trying to predict the path. Methods such as Kalman filter and optical flow [1][2][3] were commonly used in this approach. Nowadays many researchers change the approach to tracking by detection [4]–[6]. Detections by using HOG or its variation such as DPM are quite popular and has been used in multiple object tracking (MOT) benchmark [7]. However, DPM has limitation for recognizing more complex objects and its properties such as variation of clothes, bags, activities, gender, etc. Our future research topic is to track movements of people with a specific gender, therefore we cannot rely on DPM for the detection step. A more reasonable option would be to use deep learning approach for the detection step because they can be trained to detect more specific categories by using transfer learning.
propagate errors. There are at least two kinds of error that an object detector commonly suffers from: false positive (FP) and false negative (FN) detections. FP may appear in three forms, as shown in Fig. 1: overdetection (two persons detected as three), under detection (two persons detected as one), and inconsistent size. FN means that the detector cannot find a person at a location where there should be one. False negatives fall into two categories: a person is detected in some frames but missed in the frames in between, or a person is never detected anywhere in the scene.

Figure 1 Problems in detections: (a) overdetection, (b) under detection, (c) inconsistent size

We should be able to reduce these errors by retraining or fine-tuning the model. However, deep learning models need high-performance computing resources, or at least a GPU, for training. In this paper, we propose a way of handling the detection errors in the tracking process in a limited computing environment where a GPU is not available. Moreover, even after fine-tuning, every deep-learning-based detector would still produce some detection errors. In this paper, we therefore look for a way of making the best use of the detection results for tracking. We use RetinaNet for object detection with ResNet50 [9] as the backbone. We then use the combination of bounding box center distance, intersection over union, and feature similarity from a Siamese network as the input cost for the Hungarian algorithm, which serves as the object association technique.
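For readers who want to reproduce the detection step, the sketch below shows one way to obtain per-frame person detections with a publicly available RetinaNet model. It assumes the fizyr keras-retinanet package and the pretrained resnet50_coco_best_v2.1.0.h5 weights referred to in Section 3; the file paths, the helper name, and the 0.5 score threshold are illustrative placeholders, not values prescribed by this paper.

```python
# Minimal person-detection sketch (assumes the fizyr keras-retinanet package).
# Paths, thresholds, and the helper name are illustrative only.
import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

# Pretrained COCO inference model with a ResNet50 backbone (80 classes;
# class id 0 corresponds to "person" in this model's COCO label mapping).
model = models.load_model('resnet50_coco_best_v2.1.0.h5', backbone_name='resnet50')

def detect_persons(frame_path, score_threshold=0.5):
    """Return (N, 4) person boxes [x1, y1, x2, y2] for one frame."""
    image = preprocess_image(read_image_bgr(frame_path))
    image, scale = resize_image(image)   # a 1920x1080 frame becomes 1333x750, as in Table 1
    boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
    boxes /= scale                       # map boxes back to the original resolution
    keep = (labels[0] == 0) & (scores[0] >= score_threshold)
    return boxes[0][keep]

person_boxes = detect_persons('MOT16-04/img1/000001.jpg')
```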
2. RELATED WORK
There are four major components used in our research: the object detector (RetinaNet), the method to associate objects between frames (Hungarian algorithm), the method to evaluate the similarity between two feature vectors (Siamese network), and the evaluation of the tracking result. We explain each of these concepts briefly.

2.1. RetinaNet
We use RetinaNet since it is one of the most cited object detectors available for public use. RetinaNet is claimed to be the first one-stage object detector that matches the state-of-the-art COCO AP of more complex two-stage detectors such as Feature Pyramid Network (FPN) or variants of Faster R-CNN [10]. A two-stage detector first generates a sparse candidate set of object locations (bounding boxes) and then, in the second stage, classifies each candidate into one of several classes. A one-stage detector performs classification directly over a dense set of candidate locations, which is usually faster but less accurate than a two-stage detector.

RetinaNet is a single, unified network that consists of a backbone network (which computes a convolutional feature map of the input image) and two task-specific subnetworks (object classification and bounding box regression). The key innovation of RetinaNet is a new loss function, the focal loss, which improves the accuracy of the detector while maintaining its detection speed. In our research, we use ResNet as the backbone, as suggested by the RetinaNet paper [8]; the resulting network has 216 layers in total. The feature we use for each detected object is a 256-dimensional vector extracted from the 2D convolutional layers P3, P4, P5, P6, and P7. P3 to P5 are computed from the output of the corresponding ResNet residual stages (C3 to C5).

As an illustration, suppose we have a typical 1920 x 1080 MOT image. The image is resized such that its longer side is 1333 pixels and then divided into anchor boxes of various sizes. In our case, the image is resized to 1333 x 750 pixels and the output feature maps are as shown in Table 1.

Table 1 RetinaNet Feature Map Size
Layer | Size of Feature Map | Number of Feature Vectors | Anchor Size
P3    | 167 x 94 x 256      | 15698                     | 8 x 8
P4    | 84 x 47 x 256       | 3948                      | 16 x 16
P5    | 42 x 24 x 256       | 1008                      | 32 x 32
P6    | 21 x 12 x 256       | 252                       | 64 x 64
P7    | 11 x 6 x 256        | 66                        | 128 x 128
Total |                     | 20972                     |

Each of these 20972 feature vectors of size 256 is then associated with three scales (2^0, 2^(1/3), 2^(2/3)) and three aspect ratios (1:2, 1:1, 2:1) of the original anchor size. All of the outputs are concatenated as input for the classification and regression layers of RetinaNet. The regression layer is responsible for predicting the appropriate bounding box for each feature map location under the 9 corresponding scale-ratio combinations. Thus, for our image we have 20972 x 9 = 188748 candidate bounding boxes.
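The numbers in Table 1 follow directly from the 1333 x 750 input and the strides of the P3-P7 levels (8, 16, 32, 64, and 128 pixels). A short sketch that reproduces the feature-map sizes and the 188748 candidate-box count; the stride values are the standard RetinaNet settings and are stated here as an assumption, not taken from the authors' code:

```python
# Reproduce the feature-map sizes of Table 1 and the candidate-box count
# for a 1333 x 750 input, assuming the standard P3-P7 strides of RetinaNet.
import math

width, height = 1333, 750
strides = {'P3': 8, 'P4': 16, 'P5': 32, 'P6': 64, 'P7': 128}

total_locations = 0
for level, stride in strides.items():
    w = math.ceil(width / stride)      # spatial width of the feature map
    h = math.ceil(height / stride)     # spatial height of the feature map
    total_locations += w * h
    print(f"{level}: {w} x {h} x 256 -> {w * h} feature vectors")

# 3 scales x 3 aspect ratios = 9 anchors per feature-map location
print("candidate boxes:", total_locations * 9)   # 20972 * 9 = 188748
```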
2.2. Hungarian Algorithm
Suppose we input two consecutive frames, t and t + 1, to RetinaNet and detect the persons in each frame. Since each frame contains many persons, we need to associate (assign) each person in frame t to a person in frame t + 1 in order to track them. One of the best-known and simplest methods for the assignment problem is the Hungarian algorithm. Given a cost matrix, the Hungarian algorithm matches rows to columns so as to minimize the total cost, as illustrated in Fig. 2: person A is associated with person D, B with E, and C with F, because this combination yields the minimum total cost.

Figure 2 Example of the Hungarian algorithm applied for tracking between frames

If we want to use the Hungarian algorithm for tracking, we have to provide it with an appropriate cost matrix between the persons in the frames. Besides the position of the bounding box, we want to take into account the convolutional feature of each object, because it should represent the visual appearance. Given a pair of feature vectors representing two detected persons in different frames, we must therefore find a metric that evaluates how similar those vectors are.

The intuition is that similar objects should have considerably similar feature vectors. This holds for some objects but not for others. We tried using the simple Euclidean distance, but the results show that some objects have very different feature vectors even though their appearances look similar, as shown in Fig. 3 and Table 2.

Figure 3 Euclidean distances of feature vectors are not always consistent with visual appearance (detections in Frame 1 and Frame 2)

Table 2 Bounding box center distance and feature vector (Euclidean) distance for matched detections in two consecutive frames
Frame 1 | Frame 2 | Bounding box distance | Feature vector distance
7001    | 7001    | 1.00 | 10.71
133151  | 133151  | 3.00 | 20.30
147751  | 149263  | 1.00 | 45.67
151549  | 151549  | 4.12 | 17.96
151828  | 151828  | 3.16 | 16.47
152520  | 152520  | 5.10 | 7.37
160127  | 160127  | 1.00 | 9.23
163763  | 163763  | 1.41 |
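Whichever similarity metric ends up in the cost matrix, the assignment step itself is a standard linear assignment problem. The toy sketch below reproduces the Fig. 2 example with scipy's linear_sum_assignment, which solves the same minimum-cost matching that the Hungarian algorithm solves; the cost values are made up for illustration and are not taken from the paper.

```python
# Toy version of the Fig. 2 assignment: match persons A, B, C in frame t
# to persons D, E, F in frame t+1 by minimizing the total cost.
# The cost values below are illustrative only.
import numpy as np
from scipy.optimize import linear_sum_assignment

rows = ['A', 'B', 'C']          # persons in frame t
cols = ['D', 'E', 'F']          # persons in frame t+1
cost = np.array([
    [0.2, 1.5, 2.0],            # cost of matching A to D, E, F
    [1.8, 0.3, 1.2],            # cost of matching B to D, E, F
    [2.1, 1.4, 0.4],            # cost of matching C to D, E, F
])

row_ind, col_ind = linear_sum_assignment(cost)
for r, c in zip(row_ind, col_ind):
    print(f"{rows[r]} -> {cols[c]} (cost {cost[r, c]})")
# A -> D, B -> E, C -> F, with total cost 0.9
```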
2.3. Siamese Neural Network
The unexpectedly large distances between similar-looking objects in consecutive frames may be caused by noise in the images that is invisible to our eyes. We therefore need a more robust way to define the similarity between two RetinaNet feature vectors. We use the Siamese network introduced in [11]. In that paper, the authors note that a Siamese network is well suited to classification problems with only a few instances per class but a large number of classes.

A Siamese neural network (SNN), as the name suggests, consists of twin functions G_W that share the same set of parameters W, and a cost module that computes the distance ‖G_W(x1) − G_W(x2)‖. The input of the SNN is a pair of images (x1, x2) and a label Y, where Y = 0 means the pair is similar and Y = 1 means it is dissimilar. The output of the Siamese network is a similarity score; for example, if it is less than 0.5 the pair is considered similar.

2.4. Evaluation Method
One of the most popular tracking evaluations is the MOT evaluation metric [7]. We use 6 MOT tracking metrics:
- Recall: percentage of correctly detected targets, compared to the ground truth
- MT: mostly tracked trajectories, i.e., more than 80% of the ground truth trajectory length is tracked
- FP: number of false positives
- FN: number of missed targets
- IDs: number of ID switches; an ID switch happens when a person is detected as a different person due to a missed association or because it was occluded by other objects
- MOTA: multi-object tracking accuracy on a [0, 100] scale, MOTA = 1 − error, where the error is defined as (FN + FP + IDs) divided by the number of ground truth objects (a short sketch of this computation follows the list)
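As referenced in the last item above, MOTA reduces to a single ratio over the accumulated error counts. A minimal sketch of that computation, using toy numbers rather than values from the experiments in this paper:

```python
# MOTA = 1 - (FN + FP + IDs) / (number of ground-truth objects),
# reported here on the 0-100 scale used in the tables of this paper.
def mota(fn, fp, id_switches, num_gt):
    """Multi-object tracking accuracy as a percentage."""
    error = (fn + fp + id_switches) / float(num_gt)
    return 100.0 * (1.0 - error)

# Toy example: 10 misses, 5 false positives, 1 ID switch over 100 ground-truth boxes.
print(mota(fn=10, fp=5, id_switches=1, num_gt=100))   # -> 84.0
```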
3. PROPOSED METHOD
In this section, we discuss the proposed method and the dataset that we used in more detail. The flowchart of our proposed method for the first three frames is shown in Fig. 4; Frame 4 and subsequent frames are processed in the same way as Frame 3. Each frame goes through RetinaNet to obtain the bounding boxes and convolutional (CNN) features.

Figure 4 Flowchart of our method

The original RetinaNet code only outputs the best 300 filtered bounding boxes, along with the class of the object that might be at each bounding box (person, car, etc.). We use the resnet50_coco_best_v2.1.0 model, which classifies objects into 80 classes, and keep only the "person" class. In order to get the Conv2D feature, we extract the contents of the RetinaNet intermediate layer that stores the complete set of 188748 candidate bounding boxes (the clip_boxes layer) and look up the corresponding Conv2D feature vector. This feature vector may come from layer P3, P4, P5, P6, or P7. In our figures, the number next to each bounding box identifies which of the 188748 candidates it is. For example, if the box number is 150019, it came from layer P4, because 150019 / 9 falls within the index range of P4.

After collecting the RetinaNet features for every detected person in all frames, we need to associate the same persons in consecutive frames using the Hungarian algorithm. Two key components of the Hungarian algorithm are the definition of the sets of tasks (rows) and assignees (columns) to be associated, and the cost function between each task-assignee pair. The straightforward choice is to consider the persons detected in frame t as tasks and the persons detected in frame t + 1 as assignees, but because of occlusion we cannot do this. Instead of pairing persons in consecutive frames only, we pair the persons in frame t + 1 with the set of all persons found from frame 1 to frame t. This solves the problem of a person being undetected in intermediate frames due to occlusion or missed detection. In Fig. 4, this is shown in the processing of Frame 3, where the input of the SNN comes from the RetinaNet output of Frame 3 as assignees and the list of all tracked persons (not only the output of frame t − 1) as tasks.

The other important component of the Hungarian algorithm is how to define the cost between any pair of detected persons in different frames. In our study, we define the cost as a combination of three metrics: the distance between bounding box centers, the intersection over union (IoU) of the bounding boxes, and the output of the Siamese network. The distance between bounding box centers is used for simplicity, because in most cases the same person is much closer to its previous position than to other persons in consecutive frames. The intersection over union of a candidate pair is useful when the bounding box sizes of the same person differ between consecutive frames, as shown in Fig. 1c, where the height of the same person is detected quite differently. If we only used the distance between bounding box centers, such detections would be considered different persons because the distance is too large; by also using the intersection over union of the two bounding boxes, we can still associate them with the same person.

The SNN that we use to calculate feature similarity consists of three 128-node dense layers with ReLU activation, alternated with two 10% dropout layers. We trained the SNN on the first 300 frames of each training scene of MOT16, combining the data from all scenes to build a more general SNN that can be used throughout all test scenes of MOT16. IoU is defined as the ratio between the intersection and the union of a pair of bounding boxes, so its scale is [0, 1]. The output of the SNN is a number such that the two inputs are considered similar if the output is less than 0.5.
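To make the description above concrete, the sketch below shows one possible Keras realization of the described twin network: a shared branch of three 128-unit ReLU dense layers with two 10% dropout layers, applied to a pair of 256-dimensional RetinaNet feature vectors (Section 2.1), with a Euclidean-distance output trained with the contrastive loss of [11]. The layer arrangement follows the text; the loss margin, optimizer, and other training details are assumptions, not values reported by the authors.

```python
# Sketch of the described Siamese network (tf.keras); the contrastive-loss
# margin and the optimizer are assumptions, not values from the paper.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_embedding(input_dim=256):
    """Shared branch: three 128-unit ReLU layers with two 10% dropout layers."""
    inp = layers.Input(shape=(input_dim,))
    x = layers.Dense(128, activation='relu')(inp)
    x = layers.Dropout(0.1)(x)
    x = layers.Dense(128, activation='relu')(x)
    x = layers.Dropout(0.1)(x)
    x = layers.Dense(128, activation='relu')(x)
    return Model(inp, x, name='G_W')

embed = build_embedding()
feat_a = layers.Input(shape=(256,), name='feature_a')
feat_b = layers.Input(shape=(256,), name='feature_b')
# Euclidean distance between the two embeddings, ||G_W(x1) - G_W(x2)||
distance = layers.Lambda(
    lambda t: tf.sqrt(tf.reduce_sum(tf.square(t[0] - t[1]), axis=1, keepdims=True) + 1e-9)
)([embed(feat_a), embed(feat_b)])
snn = Model([feat_a, feat_b], distance)

def contrastive_loss(y_true, d, margin=1.0):
    """Contrastive loss [11]: y_true = 0 for similar pairs, 1 for dissimilar pairs."""
    y_true = tf.cast(y_true, d.dtype)
    return tf.reduce_mean((1.0 - y_true) * tf.square(d) +
                          y_true * tf.square(tf.maximum(margin - d, 0.0)))

snn.compile(optimizer='adam', loss=contrastive_loss)
# snn.fit([feats1, feats2], labels, ...)  # pairs built from the first 300 frames of each training scene
```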
To make it comparable with the previous two metrics, the distance between bounding box centers needs to be normalized. We normalize it by dividing by a number chosen such that the distance is considered close if the result is less than 0.5. In our case we use 60 as the normalizer, based on observation of scene 04 of the MOT16 dataset.

The original Hungarian algorithm accepts a square cost matrix whose rows and columns indicate tasks and assignees. Later modifications allow a non-square matrix as input, where the number of assignments follows the smaller of the row and column counts. In our case this is a problem when a new person appears in the current frame, since he would be forced to be matched to one of the existing persons found so far. That is why we apply a threshold to the cost matrix: if a detected person in the current frame is far from every person in our existing list, he is considered a new person and is omitted from the input cost matrix of the Hungarian algorithm. In our experiments we use a threshold such that if the sum of all three metrics is greater than 1.8, the detection is considered a new person.
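A compact sketch of how such a cost matrix could be assembled is shown below. The center-distance normalizer (60) and the new-person threshold (1.8) are the values reported above, but the paper does not spell out exactly how the IoU term enters the sum; the sketch assumes it is used as 1 − IoU so that all three terms behave as dissimilarities. That assumption, and all helper names, are illustrative only.

```python
# Sketch of a cost matrix combining normalized center distance, (1 - IoU),
# and the SNN output. The (1 - IoU) form is an assumption; 60 and 1.8 are
# the normalizer and new-person threshold reported in the text.
import numpy as np

def center(box):
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def build_cost_matrix(track_boxes, det_boxes, snn_dist, norm=60.0, new_thr=1.8):
    """Rows = persons tracked so far, columns = detections in the current frame.
    snn_dist[i, j] is the Siamese-network output for the (track i, detection j) pair."""
    cost = np.zeros((len(track_boxes), len(det_boxes)))
    for i, t in enumerate(track_boxes):
        for j, d in enumerate(det_boxes):
            d_center = np.linalg.norm(center(t) - center(d)) / norm
            cost[i, j] = d_center + (1.0 - iou(t, d)) + snn_dist[i, j]
    # Detections whose best cost exceeds the threshold start new tracks instead.
    new_person = cost.min(axis=0) > new_thr
    return cost, new_person
```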
After running this process over all frames of a scene, we apply two further adjustments. The first is to delete all tracks whose length is less than five. This step is important for removing tracks caused by false positives (overdetection, as shown in Fig. 1); we chose five as the threshold because we noticed that in the MOT16 training set, RetinaNet overdetections do not persist over long ranges of frames. The second adjustment is to interpolate the resulting tracks. Some objects may be missed in a certain frame but detected before and after it; we add bounding boxes for those missed objects to decrease the number of false negatives.

4. RESULT AND DISCUSSION
We used the MOT16 dataset as our benchmark and ran the experiments on a computer with an Intel Core i7-7700 CPU @3.69 GHz and 8 GB RAM. Since we only use RetinaNet for testing, not for training or fine-tuning, we did not use a GPU; SNN training was also done in CPU mode. The results on the MOT16 test scenes are shown in Table 3. The overall detection recall is 49.4, which means that less than half of the persons are detected.

Table 3 Tracking Result of Proposed Method
Scene   | Recall | MT     | FP   | FN    | IDs  | MOTA
01      | 44.0   | 26.1 % | 140  | 3579  | 38   | 41.3
03      | 51.1   | 16.9 % | 5271 | 51151 | 495  | 45.6
06      | 69.2   | 30.8 % | 1644 | 3554  | 281  | 52.5
07      | 49.4   | 14.8 % | 731  | 8267  | 138  | 44.0
08      | 39.6   | 20.6 % | 937  | 10108 | 134  | 33.2
12      | 55.3   | 23.2 % | 381  | 3707  | 47   | 50.2
14      | 36.1   | 6.1 %  | 717  | 11805 | 359  | 30.3
Overall | 49.4   | 19.8 % | 9821 | 92171 | 1492 | 43.2

Compared to other methods on the MOT16 benchmark, we have not beaten the state-of-the-art tracker HCC [12], which reaches an overall MOTA of 49.3 in the public-detector category. This is because we did not do any training or fine-tuning of the detector due to the limitations of our computing environment: we use an existing, publicly available model for object detection and methods that do not require high computational cost.

Table 4 Comparison with Other Methods
Method      | MT     | FP   | FN     | IDs  | MOTA
HISP_T [13] | 7.8 %  | 6412 | 107918 | 2594 | 35.9
AM_ADM [14] | 7.1 %  | 8503 | 99891  | 789  | 40.1
Ours        | 19.8 % | 9821 | 92171  | 1492 | 43.2
LMP [6]     | 18.2 % | 6654 | 86245  | 481  | 48.8
HCC [12]    | 17.8 % | 5333 | 86795  | 391  | 49.3

We notice that our low MOTA is caused by the high numbers of FP, FN, and IDs. Since scene 3 contributes the most errors, and its corresponding training scene is scene 4, we show examples of FP and FN in scene 4 in Fig. 5. The blue boxes are the output of RetinaNet, the red boxes surrounding blue ones are detections that are counted as FP, and the green boxes are FNs: detections that should be there but failed to be produced by RetinaNet.
As we can see, RetinaNet fails to detect persons when they are occluded. Persons at the bottom of the frame, such as A and B, are counted as both FP and FN because the ground truth bounding box covers the whole body, while the detection bounding box only covers what is visible in the image, so the heights of the detected and ground truth objects differ. The same issue causes the FP for person C: RetinaNet only sees the upper body, while the ground truth covers the whole body. The persons in D are also not detected because only a small part of their legs is visible. In E, two persons are detected as one because of the severe occlusion.

The high number of ID switches also remains a problem. We explored the reason and found that one cause is the high number of FN and FP. As shown in Fig. 6, one of the detections in Frame 15 is counted as FP (red box) because it fails to satisfy the IoU threshold with the ground truth. Detection (blue) 150019 is matched to the lower person in Frame 15, and detection 151531 is counted as FP. In this case, the evaluation program considers that there is only one detected person in Frame 15, associated with the lower person. However, the two persons in Frame 16 are detected correctly, since they are counted as neither FN nor FP; in Frame 16, detection 150019 is matched to the upper person. This mistake propagates into the tracking evaluation, as shown in Fig. 7: our method actually tracks the upper (pale yellow track) and lower (pale green track) persons correctly, but it is counted as an ID switch by the evaluation program. As we can see, the track remains correct when it is extended to frame 28. Therefore, even though the ID switch count is high for our method, it does not always represent a mistake.

Figure 5 Examples of FP and FN in scene 4, frame 50

Figure 6 Example of detection (blue), ground truth (yellow), FP (red), and FN (green) bounding boxes
Figure 7 Example of a case that is counted as an ID switch (cyan box) although it actually is not

As mentioned in the introduction, there are five problems that an object detector commonly suffers from. Whether our tracking method overcomes these problems is discussed below.

Overdetection
Overdetection is the situation where two persons are detected as three or more. Fortunately, RetinaNet does not overdetect consistently in consecutive frames; for example, in scene 3, the overdetection of the two persons in Fig. 8 occurs only in frames 7, 8, 9, and 33. We overcome this problem by thresholding the track length: if a track is shorter than 5 frames, it is deleted.

Under detection
Under detection is when two persons are detected as one. In our approach, the detection is assigned to the one person whose similarity is closer.

Inconsistent size
Since we use the IoU information, our method is still able to track correctly even when the detected sizes are inconsistent.

Undetected in some frames
Since we keep the list of all persons detected from frame 1 to frame N and compare the detections of frame N + 1 with that list, a person who is undetected in some frames is still recognized after skipping those frames. Interpolation in the last step also helps to recover the missing detections: comparing Table 3 (with interpolation) and Table 5 (without interpolation), interpolation increases the MOTA from 41.3 to 43.2. Although it adds around 2000 unexpected FP, it also decreases the FN by around 6000, as expected.

Undetected in all frames
There is no way to solve this problem other than improving the accuracy of the detector, for example by fine-tuning RetinaNet with new examples specific to the dataset. Fine-tuning the detector (RetinaNet) is out of the scope of this paper.

Table 5 Tracking Result of Proposed Method Without Interpolation
Scene   | Recall | MT     | FP   | FN    | IDs  | MOTA
01      | 40.9   | 17.4 % | 95   | 3779  | 43   | 38.7
03      | 46.9   | 10.8 % | 4193 | 55516 | 578  | 42.3
06      | 66.4   | 26.2 % | 992  | 3878  | 296  | 55.6
07      | 47.0   | 9.3 %  | 462  | 8646  | 145  | 43.3
08      | 38.6   | 19.0 % | 725  | 10283 | 143  | 33.4
12      | 53.5   | 20.9 % | 259  | 3859  | 59   | 49.6
14      | 32.7   | 4.9 %  | 357  | 12444 | 345  | 28.9
Overall | 46.0   | 15.9 % | 7083 | 98405 | 1609 | 41.3
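The interpolation step referred to above can be as simple as linear interpolation of the box coordinates over the frames in which a tracked person is missing. A minimal sketch; the data layout and function name are illustrative, not taken from the authors' code:

```python
# Linear interpolation of missing bounding boxes within a track (sketch).
import numpy as np

def interpolate_track(track):
    """Fill gaps between the first and last detected frame of a track.
    `track` maps frame number -> [x1, y1, x2, y2]."""
    frames = sorted(track)
    coords = np.array([track[f] for f in frames], dtype=float)
    return {f: [float(np.interp(f, frames, coords[:, k])) for k in range(4)]
            for f in range(frames[0], frames[-1] + 1)}

track = {10: [100, 200, 150, 300], 13: [112, 203, 162, 303]}   # frames 11-12 missing
print(interpolate_track(track)[11])   # -> [104.0, 201.0, 154.0, 301.0]
```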
An example of each problem is shown in Fig. 8. The white boxes represent detections that are tracked. The blue box shows a detection that is not included in the final track because it was suspected to be an FP (track length less than 5). The orange boxes are interpolated boxes: RetinaNet did not originally produce these detections, but they were obtained by interpolating bounding boxes in the last step.

The drawback of our method is its strong dependence on the chosen threshold, which is 1.8 in our experiments: if the sum of the bounding box center distance, the IoU term, and the feature similarity is greater than 1.8, we consider the detection a new person that starts a new track. The moving speed of the persons or of the camera greatly affects the bounding box center distance, so the threshold should be adjusted accordingly. This will be the focus of our next research.

Figure 8 Example of tracking result

5. CONCLUSION
A publicly available RetinaNet model without fine-tuning is able to detect persons in the MOT16 dataset with about 46% recall, which means that there are still problems in the detection results. Our proposed method shows that, without the need for a high-performance computing environment, we are able to solve some of the detection problems for tracking. Even though the MOTA of our method does not beat the state-of-the-art benchmark for MOT16, we hope that our result can serve as a baseline for other methods under similar circumstances, i.e., making use of convolutional features from deep learning object detectors while working in a non-GPU computing environment.

REFERENCES
[1] X. Li, K. Wang, W. Wang, and Y. Li, "A multiple object tracking method using Kalman filter," in Proc. 2010 IEEE Int. Conf. Inf. Autom. (ICIA 2010), vol. 1, no. 1, pp. 1862–1866, 2010.
[2] S. Shantaiya, K. Verma, and K. Mehta, "Multiple Object Tracking using Kalman Filter and Optical Flow," Eur. J. Adv. Eng. Technol., vol. 2, no. 2, pp. 34–39, 2015.
[3] A. Kulkarni and E. Rani, "KALMAN Filter Based Multiple Object Tracking System," Int. J. Electron. Commun. Instrum. Eng. Res. Dev., vol. 8, no. 2, pp. 1–6, 2018.
[4] L. Leal-Taixé, C. Canton-Ferrer, and K. Schindler, "Learning by tracking: Siamese CNN for robust target association," in CVPR, 2016, pp. 33–40.
[5] K. Zhang, Q. Liu, Y. Wu, and M. Yang, "Robust Visual Tracking via Convolutional Networks Without Training," IEEE Trans. Image Process., vol. 25, no. 4, pp. 1779–1792, 2016.
[6] S. Tang, M. Andriluka, B. Andres, and B. Schiele, "Multiple People Tracking by Lifted Multicut and Person Re-identification," in CVPR, 2017.
[7] A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler, "MOT16: A Benchmark for Multi-Object Tracking," pp. 1–12, 2016.
[8] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 2999–3007.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in CVPR, 2016, pp. 770–778.
[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1–14, 2017.
[11] R. Hadsell, S. Chopra, and Y. LeCun, "Dimensionality Reduction by Learning an Invariant Mapping," in CVPR, 2006.
[12] L. Ma and S. Tang, "Customized Multi-Person Tracker," in ACCV, 2018, pp. 1–16.
[13] N. L. Baisa, "Online Multi-target Visual Tracking using a HISP Filter," in Int. Conf. on Computer Vision Theory and Applications, 2018, pp. 429–438.
[14] S. H. Lee, M. Y. Kim, and S. H. Bae, "Learning discriminative appearance models for online multi-object tracking with appearance discriminability measures," IEEE Access, vol. 6, pp. 67316–67328, 2018.