Human Detection and Tracking Using Apparent Features under
Multi-cameras with Non-overlapping
Lu Tian, Shengjin Wang, Xiaoqing Ding
Tsinghua University, Department of Electronic Engineering, Beijing, China
tianlu@ocrserv.ee.tsinghua.edu.cn
Abstract
This paper describes a human detection and tracking
system that works under multi-cameras with non-overlapping
views using apparent features only. Our system
first detects people and then performs object matching.
In a distributed intelligent surveillance system, computers
need to detect pedestrians automatically under
multi-cameras, possibly with non-overlapping views, to
provide steady and continuous tracking of the pedestrian
targets. In this paper, we combine Histograms of
Oriented Gradients (HOG) and Local Binary Patterns
(LBP) to detect humans, and segment the human body from
the background using the GrabCut algorithm. We also study
pedestrian feature extraction and appearance-based object
matching. We connect all the modules above in series to
obtain a complete system and test it on samples collected
over three cameras with non-overlapping views to
demonstrate its effectiveness. We believe that our system
will be helpful to the development of public security
systems.
1. Introduction
Nowadays cameras are all around us, in buildings and
on the streets. Using them to support the development
of public security systems is therefore very meaningful.
We expect computers to detect and track pedestrians
automatically under multi-cameras with non-overlapping
views. This task is complex and full of challenges
because human detection, segmentation, feature extraction
and object matching are all needed. Not only do people
have variable appearance and a wide range of poses, but
the scale, visual angle and illumination also change
greatly across cameras.
In the research of object detection and tracking, the
difficult problems can be summarized in the following
points: (a) Changes of visual angle: the imaging of an
object in the image plane changes with the projection
transformation matrix, which varies with the angle
between objects in the real scene and the camera
optical axis. (b) Scale changes: the size of an
object varies when it moves through different cameras.
(c) Illumination: images captured by cameras depend on
the illumination direction, intensity and target surface
reflectivity in the real scene, and this dependence is
difficult to describe and model. (d) Deformation of
objects: in the tracking process the target we are
interested in is always moving, e.g. a pedestrian
walking, running or jumping, so its shape changes over
time. All these factors lead to time-varying observations
of the target we detect and track, especially when the
target passes through multi-cameras with non-overlapping
views.
Our system is composed of four parts including hu-
man detection, pedestrian segmentation, apparent fea-
tures extraction and object matching. An overview of
our system is shown in Figure 1.
First, we study pedestrian detection in video
[1,2,3,5,6]. Combining Histograms of Oriented Gradients
(HOG) and Local Binary Patterns (LBP) as the feature
set, we build a cascade human detector trained by the
AdaBoost algorithm. The results show that a pedestrian
detector based on HOG features has a high hit rate;
by adding LBP we can reduce the false alarm rate while
keeping the hit rate high.
Second, we use the output windows of the human detector,
which contain people, as the input of object segmentation
in order to filter out background information. We segment
objects with the GrabCut algorithm [7,8], creating a
prior mask to initialize the segmentation. Experiments
show that when the edge of the input rectangle contains
rich background information we obtain good results of
the object segmentation.
Third, we extract the apparent features of the
foreground obtained by the object segmentation. The
features we extract include color histograms in different
color spaces and an LBP descriptor histogram, describing
both the color and texture characteristics of the
pedestrians.

978-1-4673-0174-9/12/$31.00 ©2012 IEEE    1082    ICALIP2012

Figure 1. An overview of our system. (a) The system is
composed of four parts. (b) The actual implementation
steps of the system.
Then we perform pedestrian matching, using the apparent
features we extract, to achieve the purpose of tracking.
When matching people we calculate the correlation
coefficients between two histogram descriptors and match
the targets by threshold control and selecting the
maximum correlation coefficient.
At last, all the modules above are connected in series
to obtain a complete system that detects and tracks
humans over different non-overlapping camera views. We
test the system on samples collected over three cameras
with non-overlapping views and demonstrate the
effectiveness of the system.
2. Human Detection
Dalal and Triggs proposed using grids of HOG descriptors
for human detection and obtained good results [1]. In
recent years, HOG descriptors have been widely used and
have proved effective [4]. HOG features focus on edge
orientations; however, this may lose other useful
information such as texture. We therefore combine HOG
and LBP to describe both the edge and texture information
of humans in videos so that appearance is better
captured. Some work has been done on detecting humans by
combining HOG and LBP [2]. It is difficult to balance the
weights of different features when putting the HOG and
LBP features into one augmented vector to train the
detector, so in this paper we train the HOG classifier
and the LBP classifier separately and cascade them to
obtain our final detector.

Figure 2. The flow chart of human detection in our
system. Our detector is a cascade of HOG classifier
and LBP classifier.
We extract HOG features following the procedure in [1].
We choose 2×2-cell blocks and 9 orientation bins,
obtaining 36-dimensional HOG descriptors.
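As an illustration of this descriptor layout, the 9-bin orientation histogram of a single cell can be sketched in pure Python as follows. This is our own minimal sketch, not the authors' implementation: the simple central-difference gradient, unsigned 0-180° orientation range and the `cell_hog` name are assumptions; block normalization, which [1] also applies, is omitted for brevity. Concatenating the histograms of the four cells in a 2×2 block gives the 36-dimensional descriptor.

```python
import math

def cell_hog(cell):
    """9-bin unsigned gradient-orientation histogram of one cell.
    cell: list of rows of grayscale intensities. Gradients are taken
    with central differences on interior pixels only; each pixel votes
    its gradient magnitude into one of nine 20-degree orientation bins."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * 9
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180  # unsigned orientation
            hist[min(int(ang / 20), 8)] += mag            # 180 deg / 9 bins
    return hist
```

A cell containing a vertical edge, for example, puts all of its gradient energy into the first (near-horizontal-gradient) bin.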
To capture texture features we extract an LBP-based
region descriptor. In this paper an LBP histogram
descriptor similar to S-LBP (Semantic-LBP) [5] is used,
but we make the calculation easier. We use the notation
LBP_{P,R} from [4], where P is the number of sampling
points and R is the radius of the circle to be sampled;
LBP_{P,R} for pixel (x, y) is:

LBP_{P,R} = [b_{P-1}, ..., b_1, b_0],  b_i ∈ {0, 1}    (1)
The LBP_{8,1} operator labels the pixels of an image by
thresholding the 3×3 neighbourhood of each pixel with the
center value and considering the result as a binary
number. LBP is often used after decimal coding:

DecimalCode{LBP_{P,R}} = \sum_{i=0}^{P-1} b_i 2^i    (2)

so LBP_{P,R} has 2^P different values. In order to present
the LBP features of a region more simply, instead of
decimal coding we treat an LBP operator as circular data
and calculate the histogram of the region based on the
following interpretation: a run of continuous "1"s in an
operator can be compactly represented by the point where
it starts and its length. We count the occurrences of the
different runs of "1"s and organize the statistics in a
histogram to express texture features. For example, the
operator "10011011" has two runs of "1"s: one starts at
the fourth bit with length 2, and the other starts at the
seventh bit with length 3 (wrapping around). For an 8-bit
binary operator there are 7×8+2 = 58 different cases,
including the two special operators whose bits are all
"0" or all "1". So the histogram of LBP_{8,1} has 58 bins
and we finally obtain 58-dimensional LBP descriptors.

Figure 3. Examples of false alarms passing the HOG
classifier and then removed by the LBP classifier.
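The 58-bin run-length encoding described above can be sketched as follows. This is our own illustrative implementation: the particular bin ordering (start×7 + length−1 for the 56 run bins, then the all-zeros and all-ones bins) is our choice, since the paper does not fix one.

```python
def lbp_run_histogram(pattern):
    """58-bin histogram entry for one circular 8-bit LBP pattern.
    Bins 0..55 index (start, length) pairs of runs of 1s
    (8 starts x 7 lengths); bin 56 = all zeros, bin 57 = all ones."""
    bits = [int(b) for b in pattern]
    P = len(bits)
    hist = [0] * (7 * 8 + 2)
    if all(b == 0 for b in bits):
        hist[56] += 1
        return hist
    if all(b == 1 for b in bits):
        hist[57] += 1
        return hist
    # A run starts where a 1 follows a 0 (circularly); walk it to get its length.
    for start in range(P):
        if bits[start] == 1 and bits[(start - 1) % P] == 0:
            length, i = 0, start
            while bits[i] == 1:
                length += 1
                i = (i + 1) % P
            hist[start * 7 + (length - 1)] += 1
    return hist
```

For the paper's example "10011011", the sketch counts one run starting at the fourth bit with length 2 and one starting at the seventh bit with length 3, wrapping around the circular pattern. A region's descriptor is the sum of these per-pixel histograms.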
After we obtain the histogram descriptors it is easy to
calculate the integral image of the features. We train
our human detector using Fisher's linear discriminant as
the weak classifier and the AdaBoost algorithm to
generate a strong cascade classifier. After training the
HOG classifier and the LBP classifier separately, we
cascade them to obtain the final detector. We use a
sliding window of variable size to detect humans in
videos; the detailed process is shown in Figure 2.
Experiments on our video samples show that the
feature-fusion detector reduces false positive detections
by about fifty percent, with only a slight decline in hit
rate compared to using the HOG detector alone. Some
examples are presented in Figure 3.
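The variable-size sliding-window scan can be sketched as a generator of candidate windows, each of which would then be scored by the HOG classifier and, if it passes, by the LBP classifier. The base window size, 1:2 aspect ratio, scale step and stride below are illustrative assumptions, not values given in the paper.

```python
def sliding_windows(img_w, img_h, base=64, aspect=2.0, scale=1.25, stride=8):
    """Yield (x, y, w, h) detection windows at increasing scales.
    Windows keep a fixed person-like aspect ratio and stay inside the
    image; each would be classified by the HOG-then-LBP cascade."""
    w = base
    while w <= img_w and int(w * aspect) <= img_h:
        h = int(w * aspect)
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)
        w = int(w * scale)  # move to the next scale
```

In practice the integral images of the HOG and LBP features make evaluating each window cheap, which is what makes this exhaustive scan affordable.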
3. Pedestrian Segmentation
After human detection we obtain rectangles that contain
pedestrians. We want to cut the pedestrians out of the
images so as to extract features only on the body we are
interested in and filter out the background information.

Figure 4. Left: the mask we use for initialization;
right: a simple rectangle initialization. Red represents
foreground, black represents background, and the blue
part is the unknown region.
In our system we use the GrabCut algorithm introduced in
[7] to segment pedestrians. GrabCut is an interactive
foreground extraction method using iterated graph cuts;
it models Gaussian Mixture Models (GMMs) in RGB color
space to analyze the input image and separate foreground
people from the background.
Object segmentation with a complex background is
difficult to do accurately, since computers do not know
which part of an image is foreground and which is
background. Good segmentation algorithms tend to use full
or semi-supervision [5]; for example, when we use
Photoshop to segment an object we need to manually draw
the contour to get a better result. Thus initialization
becomes quite important for segmenting objects
automatically. Knowing that the middle part of an input
rectangle is a pedestrian, whose contour looks like an
oval, while the edge part is background, we initialize
the input rectangles with the prior mask in Figure 4;
this turns out to perform better than initializing the
input with a simple rectangle. Figure 5 shows that
segmentation initialized with our mask retains the whole
body well.
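A prior mask of this shape can be sketched as follows. The exact border width and ellipse radii are our assumptions (the paper only states that the central oval is foreground and the rectangle edge is background); in a GrabCut implementation the three labels would map to "background", "foreground" and "probably foreground/unknown" seeds.

```python
def prior_mask(h, w, border=0.15):
    """Build an h x w prior mask for GrabCut initialization:
    0 = background on the rectangle border, 1 = foreground inside a
    central ellipse (the pedestrian's oval contour), 2 = unknown."""
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ry, rx = h * 0.35, w * 0.25  # assumed ellipse radii
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            if (x < w * border or x >= w * (1 - border) or
                    y < h * border or y >= h * (1 - border)):
                row.append(0)  # border: certainly background
            elif ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0:
                row.append(1)  # central oval: certainly foreground
            else:
                row.append(2)  # in between: let GrabCut decide
        mask.append(row)
    return mask
```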
We can take advantage of the segmentation results to
remove some of the false positive detections by analyzing
the size and shape of the foreground. If the number of
foreground pixels is less than a threshold, or the
proportions of the object's contour are unreasonable, the
detection is judged a false alarm.
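This plausibility check can be sketched as follows; the pixel-count threshold and the accepted height/width range are illustrative assumptions, since the paper does not give concrete values.

```python
def plausible_person(mask, min_pixels=500, min_ratio=1.2, max_ratio=4.0):
    """Accept a detection only if its foreground is large enough and its
    bounding box is taller than wide, like a standing person.
    mask: list of rows of 0/1 foreground flags."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    if len(pts) < min_pixels:
        return False  # too few foreground pixels: false alarm
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    w = max(xs) - min(xs) + 1
    h = max(ys) - min(ys) + 1
    return min_ratio <= h / w <= max_ratio
```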
4. Feature Extraction
As we want to track humans under multi-cameras with
non-overlapping views, the movement information of a
person is no longer continuous; hence we extract apparent
features that adapt to the changes of visual angle,
scale, illumination, etc. Consequently we concentrate on
the color and texture features of the body.

Figure 5. Pedestrian segmentation. (a) Input images of
segmentation. (b) GMM images. (c) Results of rectangle
initialization. (d) Results of mask initialization.
The color histogram is a color descriptor widely used in
image and video processing. It describes the proportion
of different colors in an image without regard to the
spatial location of each color, so it can adapt to object
deformation and scale changes to a certain extent. Color
histograms collect statistical color information from
image data and organize the statistics into a series of
predefined bins to present color distributions.
Color histograms can be calculated in different color
spaces such as RGB and HSV. We experimented with
extracting color histograms in both RGB space and HSV
space for tracking. The extraction in HSV performs better
because HSV gives a more direct description of color than
RGB. As illumination has a great influence on the
V-channel (value), we only calculate the histogram of the
H-channel (hue) and S-channel (saturation) to adapt to
variable illumination. We equally divide the H-channel
into 16 bins and the S-channel into 8 bins, obtaining a
128-dimensional histogram to describe the color
characteristics.

Figure 6. Examples of feature extraction. The first
column shows the segmented body parts and the second
column the extracted apparent feature histograms. The
left part of a histogram represents color features and
the right part, in blue, represents texture features.
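The 16×8 H/S quantization can be sketched as follows. This is a minimal pure-Python version; the `hs_histogram` name and the input convention of hue in [0, 360) with saturation in [0, 1] are our assumptions.

```python
def hs_histogram(hsv_pixels):
    """128-bin color histogram over hue (16 bins) and saturation (8 bins).
    The value channel is ignored for robustness to illumination.
    hsv_pixels: iterable of (h, s, v) with h in [0, 360), s, v in [0, 1]."""
    hist = [0] * 128
    for h, s, v in hsv_pixels:
        h_bin = min(int(h / 360.0 * 16), 15)  # 16 equal hue bins
        s_bin = min(int(s * 8), 7)            # 8 equal saturation bins
        hist[h_bin * 8 + s_bin] += 1
    return hist
```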
As for texture features, we still use the 58-dimensional
LBP histogram descriptor introduced in Section 2.
In order to capture only the body we are interested in,
we extract the color histogram and LBP histogram of the
foreground region after pedestrian segmentation. We
combine color and texture features by concatenating the
two histograms after normalizing each individually,
finally obtaining 186-dimensional feature vectors.
Part-based features may seem more accurate for describing
a person [9], but they do not perform better than
features extracted from the whole body when tracking
under multi-cameras: part-based features are not stable
or robust because of the deformation of objects across
cameras, and our experiments also confirm this. Therefore
we use the apparent features of the whole foreground
region to describe a person; some results are shown in
Figure 6.

Figure 7. Correlation coefficients between objects under
multi-cameras. The top row shows some objects after
segmentation; correlation coefficients between them are
listed in the table below. Each triangular gray area
corresponds to the same person under different cameras.
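The normalize-then-concatenate step that builds the 186-dimensional vector (128 color bins + 58 LBP bins) can be sketched as follows; the choice of L1 normalization is our assumption, since the paper does not specify the norm.

```python
def combined_descriptor(color_hist, lbp_hist):
    """Concatenate the 128-D color and 58-D LBP histograms after
    normalizing each one individually, giving a 186-D appearance vector."""
    def l1norm(h):
        s = sum(h)
        return [x / s for x in h] if s else list(h)
    return l1norm(color_hist) + l1norm(lbp_hist)
```

Normalizing each histogram before concatenation keeps the color and texture parts on a comparable scale regardless of the number of foreground pixels.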
5. Object Matching
We track humans under multi-cameras using only the
apparent features we extract, without the help of motion
information, which is discontinuous across non-overlapping
cameras. With the color and texture histogram features
extracted in Section 4, we perform object matching by
calculating correlation coefficients between every pair
of histograms and selecting the maximum. Two histograms
with the maximum correlation coefficient are determined
to be the same person, provided that this maximum exceeds
a predetermined matching threshold. The correlation
coefficient between histograms H_1 and H_2 is calculated
by the following formulas, and some examples are shown in
Figure 7. From the figure we see that, despite changes of
illumination and variable postures, the correlation
coefficients between images of the same person are still
greater than the others.
H'_k(i) = H_k(i) - (1/N) \sum_j H_k(j)    (3)

d_{correl}(H_1, H_2) = \sum_i H'_1(i) H'_2(i) / \sqrt{ \sum_i H'_1(i)^2 \cdot \sum_i H'_2(i)^2 }    (4)

where N is the number of histogram bins.
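Equations (3) and (4) can be sketched directly in Python (our own helper, with a zero-variance guard added for flat histograms, which the formulas leave undefined):

```python
import math

def correl(h1, h2):
    """Correlation coefficient of Eqs. (3)-(4): mean-center each
    histogram, then take the normalized dot product."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    a = [x - m1 for x in h1]   # H'_1, Eq. (3)
    b = [x - m2 for x in h2]   # H'_2, Eq. (3)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0   # Eq. (4)
```

Identical histograms give 1, reversed ones give -1, so the score ranges over [-1, 1] with higher values meaning a better appearance match.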
When a new person appears, his histogram feature is
recorded in a database. As we track pedestrians in
different videos with non-overlapping views, we need to
update the recorded information, since the features of
the same person keep changing. We set two thresholds to
control matching and updating respectively. The lower
threshold, called the matching threshold, determines
whether two histograms match and describe the same
person. The higher threshold is called the updating
threshold: if the correlation of two histograms exceeds
it, we can confidently identify them as the same person
and update the person's recorded information with the
average of the two histograms. This information update
improves matching effectively, and works especially well
for stable tracking of pedestrians under a single camera,
since it can follow the slow changes of pedestrian
features.
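The two-threshold match-and-update logic can be sketched as follows. The threshold values, the dictionary database and the helper names are our assumptions (the paper gives no concrete thresholds); the correlation helper repeats Eqs. (3)-(4) so the sketch is self-contained.

```python
import math

def _correl(h1, h2):
    """Correlation coefficient of Eqs. (3)-(4)."""
    n = len(h1)
    a = [x - sum(h1) / n for x in h1]
    b = [x - sum(h2) / n for x in h2]
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den else 0.0

def match_and_update(db, feat, match_thr=0.5, update_thr=0.8):
    """Match a new feature vector against recorded people (db maps
    label -> histogram). Returns the matched label, or a fresh label
    when no correlation reaches match_thr; when the best correlation
    also reaches update_thr, the stored histogram is replaced by the
    average of the stored and new histograms."""
    best_label, best_c = None, -2.0
    for label, hist in db.items():
        c = _correl(hist, feat)
        if c > best_c:
            best_label, best_c = label, c
    if best_label is not None and best_c >= match_thr:
        if best_c >= update_thr:  # confident match: refresh the record
            db[best_label] = [(a + b) / 2
                              for a, b in zip(db[best_label], feat)]
        return best_label
    new_label = len(db)           # no match: register a new person
    db[new_label] = list(feat)
    return new_label
```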
Figure 8. Sample collection under three cameras with
non-overlapping views. (a) Top view of the sample
collection place, showing the positional relationship of
the cameras. (b)(c)(d) Scenes of channels A, B and C.
6. Experiment Results
Our test samples are collected over three cameras with
non-overlapping views, recording the different channels
at the same time. Figure 8 shows the position, height and
angle of the sample acquisition cameras. When running our
system on these samples according to the time sequence of
the videos, we deal with three frames from different
cameras at the same time. Every time a new person appears
we give the target a new label. Some experiment results
are shown in Figure 9.

Figure 9. Results of human detection and tracking under
three cameras. Green rectangles show outputs of the human
detector. The red rectangles are bounding boxes after
segmentation. Blue numbers represent the labels of people
identified by object matching.
7. Conclusions and Future Work
We have implemented a public security system composed of
four parts: human detection, pedestrian segmentation,
apparent feature extraction and object matching.
Experiments on the samples we collected show the
effectiveness of detecting and tracking humans under
three cameras with non-overlapping views. In the future
we can add information between adjacent frames for stable
tracking under a single camera, and consider spatial and
temporal information for multi-camera tracking.
Acknowledgments
This work is supported by the National High Tech-
nology Research and Development Program of China
(863 program) under Grant No. 2011AA110402 and
the National Natural Science Foundation of China un-
der Grant No. 61071135.
References
[1] N. Dalal and B. Triggs. Histograms of oriented gradients
for human detection. CVPR 2005, volume 1, pp. 886-893, 2005.
[2] Xiaoyu Wang, Tony X. Han, and Shuicheng Yan. An
HOG-LBP human detector with partial occlusion handling.
ICCV 2009, pp. 32-39, 2009.
[3] William Robson Schwartz, Aniruddha Kembhavi, David
Harwood, and Larry S. Davis. Human detection using partial
least squares analysis. ICCV 2009, pp. 24-31, 2009.
[4] T. Ahonen, A. Hadid, and M. Pietikäinen. Face recognition
with local binary patterns. ECCV 2004, volume 3021/2004,
pp. 469-481, 2004.
[5] Y. Mu, S. Yan, Y. Liu, T. Huang, and B. Zhou. Discriminative
local binary patterns for human detection in personal album.
CVPR 2008, 2008.
[6] Pedro Felzenszwalb, David McAllester, and Deva Ramanan.
A discriminatively trained, multiscale, deformable part
model. CVPR 2008, pp. 1-8, 2008.
[7] Carsten Rother, Vladimir Kolmogorov, and Andrew Blake.
"GrabCut" - Interactive foreground extraction using iterated
graph cuts. ACM Transactions on Graphics, volume 23, No. 3,
2004.
[8] Sara Vicente, Vladimir Kolmogorov, and Carsten Rother.
Graph cut based image segmentation with connectivity priors.
CVPR 2008, pp. 1-8, 2008.
[9] M. Farenzena, L. Bazzani, A. Perina, V. Murino, and
M. Cristani. Person re-identification by symmetry-driven
accumulation of local features. CVPR 2010, pp. 2360-2367,
2010.
1087

More Related Content

Similar to Human Detection and Tracking Using Apparent Features under.pdf

Interactive Full-Body Motion Capture Using Infrared Sensor Network
Interactive Full-Body Motion Capture Using Infrared Sensor Network  Interactive Full-Body Motion Capture Using Infrared Sensor Network
Interactive Full-Body Motion Capture Using Infrared Sensor Network
ijcga
 
Interactive full body motion capture using infrared sensor network
Interactive full body motion capture using infrared sensor networkInteractive full body motion capture using infrared sensor network
Interactive full body motion capture using infrared sensor network
ijcga
 

Similar to Human Detection and Tracking Using Apparent Features under.pdf (20)

K-Means Clustering in Moving Objects Extraction with Selective Background
K-Means Clustering in Moving Objects Extraction with Selective BackgroundK-Means Clustering in Moving Objects Extraction with Selective Background
K-Means Clustering in Moving Objects Extraction with Selective Background
 
Final Paper
Final PaperFinal Paper
Final Paper
 
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
 
Extraction of Buildings from Satellite Images
Extraction of Buildings from Satellite ImagesExtraction of Buildings from Satellite Images
Extraction of Buildings from Satellite Images
 
Texture classification of fabric defects using machine learning
Texture classification of fabric defects using machine learning Texture classification of fabric defects using machine learning
Texture classification of fabric defects using machine learning
 
motion and feature based person tracking in survillance videos
motion and feature based person tracking in survillance videosmotion and feature based person tracking in survillance videos
motion and feature based person tracking in survillance videos
 
A Review of Paper Currency Recognition System
A Review of Paper Currency Recognition SystemA Review of Paper Currency Recognition System
A Review of Paper Currency Recognition System
 
3D Reconstruction from Multiple uncalibrated 2D Images of an Object
3D Reconstruction from Multiple uncalibrated 2D Images of an Object3D Reconstruction from Multiple uncalibrated 2D Images of an Object
3D Reconstruction from Multiple uncalibrated 2D Images of an Object
 
ei2106-submit-opt-415
ei2106-submit-opt-415ei2106-submit-opt-415
ei2106-submit-opt-415
 
Symbolic representation and recognition of gait an approach based on lbp of ...
Symbolic representation and recognition of gait  an approach based on lbp of ...Symbolic representation and recognition of gait  an approach based on lbp of ...
Symbolic representation and recognition of gait an approach based on lbp of ...
 
Vehicle Tracking Using Kalman Filter and Features
Vehicle Tracking Using Kalman Filter and FeaturesVehicle Tracking Using Kalman Filter and Features
Vehicle Tracking Using Kalman Filter and Features
 
Oc2423022305
Oc2423022305Oc2423022305
Oc2423022305
 
IMAGE SUBSET SELECTION USING GABOR FILTERS AND NEURAL NETWORKS
IMAGE SUBSET SELECTION USING GABOR FILTERS AND NEURAL NETWORKSIMAGE SUBSET SELECTION USING GABOR FILTERS AND NEURAL NETWORKS
IMAGE SUBSET SELECTION USING GABOR FILTERS AND NEURAL NETWORKS
 
I MAGE S UBSET S ELECTION U SING G ABOR F ILTERS A ND N EURAL N ETWORKS
I MAGE S UBSET S ELECTION U SING G ABOR F ILTERS A ND N EURAL N ETWORKSI MAGE S UBSET S ELECTION U SING G ABOR F ILTERS A ND N EURAL N ETWORKS
I MAGE S UBSET S ELECTION U SING G ABOR F ILTERS A ND N EURAL N ETWORKS
 
Interactive Full-Body Motion Capture Using Infrared Sensor Network
Interactive Full-Body Motion Capture Using Infrared Sensor Network  Interactive Full-Body Motion Capture Using Infrared Sensor Network
Interactive Full-Body Motion Capture Using Infrared Sensor Network
 
Leader follower formation control of ground vehicles using camshift based gui...
Leader follower formation control of ground vehicles using camshift based gui...Leader follower formation control of ground vehicles using camshift based gui...
Leader follower formation control of ground vehicles using camshift based gui...
 
B018110915
B018110915B018110915
B018110915
 
Trajectory Based Unusual Human Movement Identification for ATM System
	 Trajectory Based Unusual Human Movement Identification for ATM System	 Trajectory Based Unusual Human Movement Identification for ATM System
Trajectory Based Unusual Human Movement Identification for ATM System
 
Interactive full body motion capture using infrared sensor network
Interactive full body motion capture using infrared sensor networkInteractive full body motion capture using infrared sensor network
Interactive full body motion capture using infrared sensor network
 
Color Tracking Robot
Color Tracking RobotColor Tracking Robot
Color Tracking Robot
 

More from vasuhisrinivasan (14)

2. Dispersion Understanding the effects of dispersion in optical fibers is qu...
2. Dispersion Understanding the effects of dispersion in optical fibers is qu...2. Dispersion Understanding the effects of dispersion in optical fibers is qu...
2. Dispersion Understanding the effects of dispersion in optical fibers is qu...
 
AntBrief123A12-6-07.pptMaxwell’s Equations & EM Waves
AntBrief123A12-6-07.pptMaxwell’s Equations & EM WavesAntBrief123A12-6-07.pptMaxwell’s Equations & EM Waves
AntBrief123A12-6-07.pptMaxwell’s Equations & EM Waves
 
Helical.pptan antenna consisting of a conducting wire wound in the form of a ...
Helical.pptan antenna consisting of a conducting wire wound in the form of a ...Helical.pptan antenna consisting of a conducting wire wound in the form of a ...
Helical.pptan antenna consisting of a conducting wire wound in the form of a ...
 
surveillance.ppt
surveillance.pptsurveillance.ppt
surveillance.ppt
 
Aerial photo.ppt
Aerial photo.pptAerial photo.ppt
Aerial photo.ppt
 
cis595_03_IMAGE_FUNDAMENTALS.ppt
cis595_03_IMAGE_FUNDAMENTALS.pptcis595_03_IMAGE_FUNDAMENTALS.ppt
cis595_03_IMAGE_FUNDAMENTALS.ppt
 
rmsip98.ppt
rmsip98.pptrmsip98.ppt
rmsip98.ppt
 
IP_Fundamentals.ppt
IP_Fundamentals.pptIP_Fundamentals.ppt
IP_Fundamentals.ppt
 
defenseTalk.ppt
defenseTalk.pptdefenseTalk.ppt
defenseTalk.ppt
 
Ch24 fiber optics.pptx
Ch24 fiber optics.pptxCh24 fiber optics.pptx
Ch24 fiber optics.pptx
 
Radiation from an Oscillating Electric Dipole.ppt
Radiation from an Oscillating Electric Dipole.pptRadiation from an Oscillating Electric Dipole.ppt
Radiation from an Oscillating Electric Dipole.ppt
 
Aperture ant.ppt
Aperture ant.pptAperture ant.ppt
Aperture ant.ppt
 
Spiral antenna.pptx
Spiral antenna.pptxSpiral antenna.pptx
Spiral antenna.pptx
 
Antennas-p-3.ppt
Antennas-p-3.pptAntennas-p-3.ppt
Antennas-p-3.ppt
 

Recently uploaded

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
meharikiros2
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 

Recently uploaded (20)

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Ground Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementGround Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth Reinforcement
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .ppt
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 

Human Detection and Tracking Using Apparent Features under.pdf

  • 1. Human Detection and Tracking Using Apparent Features under Multi-cameras with Non-overlapping Lu Tian, Shengjin Wang, Xiaoqing Ding Tsinghua University, Department of Electronic Engineering, Beijing, China tianlu@ocrserv.ee.tsinghua.edu.cn Abstract This paper describes a human detection and track- ing system under multi-cameras with non-overlapping views using apparent features only. Our system is able to first detect people and then perform object matching. In the distributed intelligent surveillance system, com- puters need to detect pedestrians automatically under multi-cameras probably with non-overlapping views for providing a steady and continuous tracking of the pedes- trian targets. In this paper, we combine Histograms of Oriented Gradients (HOG) and Local Binary Pattern (LBP) to detect human and segment human body from the background using GrabCut algorithm. We also study the method of pedestrian feature extraction and object matching based on appearance. We connect all the mod- ules above in series to obtain a complete system and test it on samples we collect over three cameras with non- overlapping views to prove the effectiveness. We believe that our system will be helpful to the development of the public security system. 1. Introduction Nowadays cameras are all around us and they are everywhere in the buildings and on the streets. How to use them to help us with the development of the public security system is very meaningful. We expect comput- ers to detect and track pedestrians automatically under multi-cameras with non-overlapping views. This task is complex and full of challenges because human detec- tion, segmentation, feature extraction and object match- ing are all needed. Not only people have variable ap- pearance and wide range of poses, but also the scale, visual angle and illumination change a lot under multi- cameras. 
In the research of object detection and tracking, the difficult problems can be summarized in the following points:

(a) Changes of visual angle: the imaging of an object in the image plane changes with the projection transformation matrix, which varies with the angle between the object in the real scene and the camera's optical axis.

(b) Scale changes: the size of an object varies as it moves through different cameras.

(c) Illumination: images captured by cameras depend on the illumination direction, the illumination intensity and the surface reflectivity of the target in the real scene, and this dependence is difficult to describe and model.

(d) Deformation of objects: during tracking the target of interest is always moving, e.g. a pedestrian walking, running or jumping, so its shape changes over time.

All of the above factors lead to time-varying observations of the target we detect and track, especially when the target moves across multiple cameras with non-overlapping views.

Our system is composed of four parts: human detection, pedestrian segmentation, apparent feature extraction and object matching. An overview of the system is shown in Figure 1.

First, we study pedestrian detection in video [1,2,3,5,6]. Combining Histograms of Oriented Gradients (HOG) and Local Binary Patterns (LBP) as the feature set, we build a cascade human detector trained with the AdaBoost algorithm. The results show that a pedestrian detector based on HOG features alone has a high hit rate; by combining HOG and LBP we reduce the false alarm rate while keeping the hit rate high.

Second, we use the detector outputs that contain people as the input to object segmentation, in order to filter out background information. We segment objects with the GrabCut algorithm [7,8], and we create a prior mask to initialize the segmentation.
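The prior-mask idea can be sketched as follows. This is a minimal illustration, assuming OpenCV-style GrabCut labels and an elliptical foreground prior; the exact mask shape and label assignments used by the paper (its Figure 4) are not reproduced here:

```python
import numpy as np

# OpenCV GrabCut label convention (cv2.GC_BGD=0, GC_FGD=1,
# GC_PR_BGD=2, GC_PR_FGD=3)
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

def prior_mask(h, w, border=0.08):
    """Trimap-style prior for a detection rectangle: border pixels
    -> certain background, a central ellipse -> probable foreground,
    everything else -> probable background."""
    mask = np.full((h, w), GC_PR_BGD, dtype=np.uint8)
    by, bx = max(1, int(h * border)), max(1, int(w * border))
    # the rectangle border is assumed to be background
    mask[:by, :] = GC_BGD
    mask[-by:, :] = GC_BGD
    mask[:, :bx] = GC_BGD
    mask[:, -bx:] = GC_BGD
    # an ellipse roughly covering the body is probable foreground
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx, ry, rx = h / 2.0, w / 2.0, h * 0.38, w * 0.28
    inside = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    mask[inside] = GC_PR_FGD
    return mask

mask = prior_mask(128, 64)
# such a mask would then seed cv2.grabCut(img, mask, None,
# bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)
```

The border fraction (0.08) and the ellipse radii are placeholder values chosen for illustration, not parameters reported by the paper.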
Experiments show that when the border of the input rectangle contains rich background information, the object segmentation gives good results.

Third, we extract apparent features from the foreground obtained by the object segmentation. The features include color histograms in different color spaces and an LBP descriptor histogram, describing both the color and texture characteristics of the pedestrians.

Then we perform pedestrian matching, using the extracted apparent features, to achieve tracking. When matching people we compute the correlation coefficients between pairs of histogram descriptors and match the targets by threshold control, selecting the maximum correlation coefficient.

Finally, all the modules above are connected in series to obtain a complete system that detects and tracks people across different non-overlapping camera views. We test the system on samples collected over three cameras with non-overlapping views and demonstrate its effectiveness.

978-1-4673-0174-9/12/$31.00 ©2012 IEEE, ICALIP 2012

Figure 1. An overview of our system. (a) The system is composed of four parts. (b) The actual implementation steps of the system.

2. Human Detection

Dalal and Triggs proposed using grids of HOG descriptors for human detection and obtained good results [1]. In recent years HOG descriptors have been widely used and proven effective [4]. HOG features focus on edge orientations, which may lose other useful information such as texture. We therefore combine HOG and LBP to describe both the edge and texture information of people in videos, so that appearance is better captured. Some work has been done on detecting humans by combining HOG and LBP [2]. It is difficult to balance the weights of the different features when concatenating HOG and LBP into one augmented vector to train a detector, so in this paper we train a HOG classifier and an LBP classifier separately and cascade them to obtain our final detector.

Figure 2. The flow chart of human detection in our system. Our detector is a cascade of a HOG classifier and an LBP classifier.

We extract HOG features following the procedure in [1]. With 2×2-cell blocks and 9 orientation bins we obtain 36-dimensional HOG descriptors.
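As a rough illustration, one such 36-dimensional block descriptor (2×2 cells × 9 orientation bins) could be computed as below. The 8×8-pixel cell size, magnitude weighting and unsigned 0–180° binning are assumptions following the defaults in [1], and the trilinear interpolation of the original method is omitted for brevity:

```python
import numpy as np

def block_hog(patch, cell=8, bins=9):
    """Simplified 36-D HOG descriptor for one 2x2-cell block of a
    grayscale patch (no trilinear interpolation)."""
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)                 # image gradients
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    hist = np.zeros((2, 2, bins))
    bw = 180.0 / bins                           # bin width in degrees
    for cy in range(2):
        for cx in range(2):
            ys = slice(cy * cell, (cy + 1) * cell)
            xs = slice(cx * cell, (cx + 1) * cell)
            b = np.minimum((ang[ys, xs] / bw).astype(int), bins - 1)
            # vote into orientation bins, weighted by magnitude
            np.add.at(hist[cy, cx], b.ravel(), mag[ys, xs].ravel())
    v = hist.ravel()
    return v / (np.linalg.norm(v) + 1e-6)       # L2 block normalization

desc = block_hog(np.random.rand(16, 16))        # one 16x16 block
```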
To capture texture features we extract an LBP-based region descriptor. In this paper an LBP histogram descriptor similar to S-LBP (Semantic LBP) [5] is used, but we make the calculation simpler. We use the notation LBP_{P,R} from [4], where P is the number of sampling points and R is the radius of the sampled circle; LBP_{P,R} for a pixel (x, y) is

LBP_{P,R} = [b_{P-1}, ..., b_1, b_0],  b_i ∈ {0, 1}    (1)

The LBP_{8,1} operator labels the pixels of an image by thresholding the 3×3 neighbourhood of each pixel against the center value and treating the result as a binary number. LBP is often used after decimal coding:

DecimalCode{LBP_{P,R}} = Σ_{i=0}^{P-1} b_i 2^i    (2)

so LBP_{P,R} takes 2^P different values. To represent the LBP features of a region more simply, instead of decimal coding we treat an LBP operator as circular data and compute the region histogram under the following interpretation: each run of continuous '1's in an operator can be compactly represented by the point where it starts and its length. We count the occurrences of the different runs of '1's and organize the statistics into a histogram that expresses the texture features. For example, the operator '10011011' has two runs of '1's: one starts at the fourth bit with length 2, and the other starts at the seventh bit with length 3 (wrapping around circularly). For an 8-bit operator there are 7×8+2 = 58 different cases, including the two operators that are all '0's or all '1's, so the histogram of LBP_{8,1} has 58 bins and we obtain 58-dimensional LBP descriptors.

After we obtain the histogram descriptors it is easy to compute the integral image of the features. We train our human detector using Fisher's linear discriminant as the weak classifier and the AdaBoost algorithm to generate a strong cascade classifier. After training the HOG classifier and the LBP classifier separately, we cascade them to get our final detector. We use a sliding window of variable size to detect humans in videos; the detailed process is shown in Figure 2. Experiments on our video samples show that the feature-fusion detector removes about fifty percent of the false positive detections, with only a slight decline in hit rate compared to using the HOG detector alone. Some examples are presented in Figure 3.

Figure 3. Examples of false alarms passing the HOG classifier and then removed by the LBP classifier.

3. Pedestrian Segmentation

Human detection yields rectangles containing pedestrians. We want to cut the pedestrians out of the images so that features are extracted only on the body we are interested in, filtering out the background information. In our system we use the GrabCut algorithm introduced in [7] to segment pedestrians. GrabCut is a form of interactive foreground extraction using iterated graph cuts: it models the input image with Gaussian Mixture Models (GMMs) in RGB color space and separates the foreground person from the background.

Object segmentation against a complex background is hard to make accurate, since the computer does not know which part of an image is foreground and which is background. Good segmentation algorithms tend to rely on full or semi-supervision [5]; for example, when we segment an object in Photoshop we manually draw the contour to get a better result. Initialization is therefore crucial for segmenting objects automatically. Knowing that the middle part of an input rectangle is a pedestrian, whose contour is roughly an oval, while the border is background, we initialize the input rectangles with the prior mask shown in Figure 4, which turns out to perform better than initializing with a simple rectangle. Figure 5 shows that segmentation initialized with our mask retains the whole body well.

Figure 4. Left: the mask we use for initialization; right: a simple rectangle initialization. Red represents foreground, black represents background, and the blue part is the unknown region.

We can also take advantage of the segmentation results to remove some false positive detections by analyzing the size and shape of the foreground. If the number of foreground pixels is below a threshold, or the proportions of the object's contour are unreasonable, the detection must be a false alarm.

4. Feature Extraction

Since we track people under multiple cameras with non-overlapping views, the motion information of a person is no longer continuous; hence we want to extract apparent features to adapt to the changes of visual
angle, scale, illumination, etc. Consequently we concentrate on the color and texture features of the body.

Figure 5. Pedestrian segmentation. (a) Input images of segmentation. (b) GMM images. (c) Results of rectangle initialization. (d) Results of mask initialization.

The color histogram is a color characteristic widely used in image and video processing. It describes the proportion of each color in an image and ignores the spatial location of the colors, so it adapts to object deformation and scale changes to a certain extent. Color histograms collect statistical color information from the image data and organize the statistics into a series of predefined bins that represent the color distribution. A color histogram can be computed in different color spaces such as RGB and HSV. We experimented with color histogram extraction in both RGB and HSV space for tracking; extraction in HSV performs better because HSV describes color more directly than RGB. As illumination has a great influence on the V (value) channel, we compute the histogram only over the H (hue) and S (saturation) channels to adapt to variable illumination. We equally divide the H channel into 16 bins and the S channel into 8 bins, obtaining a joint 16×8 = 128-dimensional histogram that describes the color characteristics.

Figure 6. Examples of feature extraction. The first column shows the body parts we segment; the second column shows the extracted apparent feature histograms. The left part of each histogram represents color features and the right part, in blue, represents texture features.

For texture features we again use the 58-dimensional LBP histogram descriptor introduced in Section 2. In order to use only the information of the body we are interested in, we extract the color histogram and the LBP histogram from the foreground region obtained by pedestrian segmentation.
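A minimal sketch of this masked joint H-S histogram (16×8 = 128 bins over foreground pixels only); the channel ranges (H in [0, 180), S in [0, 256), as in OpenCV's HSV representation) are an assumption:

```python
import numpy as np

def hs_histogram(h_chan, s_chan, fg_mask, h_bins=16, s_bins=8):
    """Joint H-S color histogram (16x8 = 128 bins) computed only on
    foreground pixels selected by a boolean mask."""
    h = h_chan[fg_mask].astype(np.float64)
    s = s_chan[fg_mask].astype(np.float64)
    # quantize each channel into its bins
    hi = np.minimum((h / 180.0 * h_bins).astype(int), h_bins - 1)
    si = np.minimum((s / 256.0 * s_bins).astype(int), s_bins - 1)
    hist = np.zeros((h_bins, s_bins))
    np.add.at(hist, (hi, si), 1.0)              # count joint occurrences
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-12)          # normalize to sum 1

# usage on a dummy 4x4 foreground region
rng = np.random.default_rng(0)
H = rng.integers(0, 180, (4, 4))
S = rng.integers(0, 256, (4, 4))
hist = hs_histogram(H, S, np.ones((4, 4), dtype=bool))
```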
We combine color and texture features by concatenating the two histograms after normalizing each individually, so we finally obtain 186-dimensional feature vectors. Part-based features seem to describe a person more precisely [9], but they do not perform better than features extracted from the whole body in
tracking under multi-cameras: part-based features are not stable or robust under the deformation of objects across cameras, and our experiments also confirm this. We therefore use the apparent features of the whole foreground region to describe a person; some results are shown in Figure 6.

5. Object Matching

We track people under multiple cameras using only the apparent features we extract, without the help of motion information, which is discontinuous across cameras. With the color and texture histogram features extracted in Section 4, we perform object matching by computing the correlation coefficients between every pair of histograms and selecting the maximum. Two histograms are determined to describe the same person when they have the maximum correlation coefficient and that coefficient exceeds a predetermined matching threshold. The correlation coefficient between histograms H1 and H2 is computed as in the following formulas, and some examples are shown in Figure 7. From the figure we see that, despite changes of illumination and pose, the correlation coefficients between images of the same person remain greater than those between different people.

H'_k(i) = H_k(i) - (1/N) Σ_j H_k(j)    (3)

d_correl(H1, H2) = Σ_i H'_1(i) H'_2(i) / sqrt( Σ_i H'_1(i)^2 · Σ_i H'_2(i)^2 )    (4)

where N is the number of histogram bins.

Figure 7. Correlation coefficients between objects under multi-cameras. The top row shows some objects after segmentation; the correlation coefficients between them are listed in the table below. Each triangular gray area corresponds to the same person under different cameras.

When a new person appears, his histogram feature is recorded in a database. As we track pedestrians across videos with non-overlapping views, we need to update the recorded information, since the features of the same person keep changing. We set two thresholds to control matching and updating respectively. The lower threshold, called the matching threshold, determines whether two histograms match and describe the same person.
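Equations (3) and (4), together with the matching rule, can be sketched as follows; the threshold value 0.6 is a placeholder, since the paper does not report its threshold values:

```python
import numpy as np

def d_correl(h1, h2):
    """Correlation coefficient between two histograms, Eqs. (3)-(4):
    mean-center each histogram, then normalized cross-correlation."""
    a = h1 - h1.mean()                  # Eq. (3) for H1
    b = h2 - h2.mean()                  # Eq. (3) for H2
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match(query, gallery, t_match=0.6):
    """Return the index of the best-matching gallery histogram, or -1
    when even the best score falls below the matching threshold."""
    scores = [d_correl(query, g) for g in gallery]
    best = int(np.argmax(scores))
    return best if scores[best] >= t_match else -1
```

Note that because each histogram is mean-centered in Eq. (3), d_correl is invariant to a constant offset and lies in [-1, 1].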
The higher threshold, called the updating threshold, is set so that when the correlation of two histograms exceeds it we can safely identify them as the same person and update the recorded information of that person with the average of the two histograms. This information update improves matching effectively, and works especially well for stable tracking of pedestrians under the same camera, since it captures the slow changes of pedestrian features.

Figure 8. Sample collection under three cameras with non-overlapping views. (a) Top view of the sample collection place, showing the positional relationship of the cameras. (b)(c)(d) Scenes of channels A, B and C.

6. Experiment Results

Our test samples are collected over three cameras with non-overlapping views, recording the different channels at the same time. Figure 8 shows the position, height and angle information of the sample acquisition cameras. When running our system on these samples according to the time sequence of the videos, we deal with three
frames from different cameras at the same time. Every time a new person appears we give the target a new label. Some experiment results are shown in Figure 9.

Figure 9. Results of human detection and tracking under three cameras. Green rectangles show outputs of the human detector; red rectangles are bounding boxes after segmentation; blue numbers are the labels of people identified by object matching.

7. Conclusions and Future Work

We implement a public security system composed of four parts: human detection, pedestrian segmentation, apparent feature extraction and object matching. Experiments on the samples we collected show the effectiveness of detecting and tracking people under three cameras with non-overlapping views. In the future we can add information between adjacent frames for stable tracking under a single camera, and consider spatial and temporal information for multi-camera tracking.

Acknowledgments

This work is supported by the National High Technology Research and Development Program of China (863 Program) under Grant No. 2011AA110402 and the National Natural Science Foundation of China under Grant No. 61071135.

References

[1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR 2005, volume 1, pp. 886-893, 2005.
[2] Xiaoyu Wang, Tony X. Han, and Shuicheng Yan. An HOG-LBP human detector with partial occlusion handling. ICCV 2009, pp. 32-39, 2009.
[3] William Robson Schwartz, Aniruddha Kembhavi, David Harwood, and Larry S. Davis. Human detection using partial least squares analysis. ICCV 2009, pp. 24-31, 2009.
[4] T. Ahonen, A. Hadid, and M. Pietikäinen. Face recognition with local binary patterns. ECCV 2004, volume 3021/2004, pp. 469-481, 2004.
[5] Y. Mu, S. Yan, Y. Liu, T. Huang, and B. Zhou. Discriminative local binary patterns for human detection in personal album. CVPR 2008, 2008.
[6] Pedro Felzenszwalb, David McAllester, and Deva Ramanan. A discriminatively trained, multiscale, deformable part model. CVPR 2008, pp. 1-8, 2008.
[7] Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. "GrabCut" - interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, volume 23, no. 3, 2004.
[8] Sara Vicente, Vladimir Kolmogorov, and Carsten Rother. Graph cut based image segmentation with connectivity priors. CVPR 2008, pp. 1-8, 2008.
[9] M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani. Person re-identification by symmetry-driven accumulation of local features. CVPR 2010, pp. 2360-2367, 2010.