Robustly Tracking People with LIDARs in a Crowded Museum for Behavioral
Analysis
Article in IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences · November 2017
DOI: 10.1587/transfun.E100.A.2458
6 authors, including: Md. Golam Rashed (University of Rajshahi), Takuya Yonezawa (Saitama University), Antony Lam (Mercari Inc.), and Yoshinori Kobayashi (Saitama University).
RASHED et al.: ROBUSTLY TRACKING PEOPLE WITH LIDARS IN A CROWDED MUSEUM FOR BEHAVIORAL ANALYSIS
because it presents important advantages over other sensing devices (e.g., vision), such as high accuracy, effective sensing distance, wide view angles, high scanning rates, low sensitivity to illumination changes, and ease of use [11]. However, most early LRF-based human-tracking systems have generally been limited to estimating people's positions in small, controlled spaces, without the ability to recognize each person's identity. Recently, in [4], [5], the authors proposed systems for simultaneously tracking the positions and body orientations of many people using a network of LRFs mounted at torso height and distributed in the environment. However, their systems were sensitive to occluded views of people and so could only work reliably when the number of persons was small and there were line-of-sight views from the LRF sensors to the target persons.
Several works [12]–[20] detect and track humans in public spaces and/or laboratory environments by combining different sensing modalities, such as various kinds of cameras and LRFs. The authors in [13], [14] used RGB-D cameras, which combine 3D range and camera measurements, to track humans. Kobayashi and Kuno proposed integrated sensors composed of an LRF and an omni-directional camera for precise tracking of human behaviors for HRI [15]. Blanco et al. presented an approach that combines LRF and vision data to detect people robustly and efficiently even in cluttered environments [20]. From a practical implementation point of view, however, combining multiple sensor types makes the system more complex to install and maintain, especially in large-scale environments (e.g., shopping mall arcades, train stations, museums) [1].
In addition, most early studies of human tracking using various sensor modalities either simply ignored occlusion problems or claimed to handle only partial occlusion owing to their robust designs [21]. However, dealing with occlusions of people and/or objects in the sensors' views is a critical issue in human behavior tracking research, as we need to continuously track people's positions and body orientations to ultimately infer their interests and intentions in public spaces.
The state of the art in handling occlusions for visual object tracking is presented in [21]. Carballo et al. dealt with partial occlusion in people detection and position estimation using multiple layers of LRFs on a mobile robot [11]. Their approach is well suited to tracking people who are relatively close to the robot; however, their tracker is not suited to larger spaces where a significant number of people may gather. Brscic et al. introduced multiple three-dimensional (3D) range sensors mounted above human height to provide good coverage and minimize the influence of occlusions between people [1]. However, this results in a significant and prohibitive increase in the cost to buy, install, and maintain the tracking equipment in public spaces.
Thus, we intend to develop a human-tracking system with networks of relatively low-cost LIDARs, as opposed to a 3D range sensor platform, to achieve better spatial coverage, robustness, and modularity, and to overcome occlusion constraints. Due to the limitations (stated in Sect. 3.2) of our previous system [4], we extended that system using LIDARs to track a relatively large number of people, even in occluded situations, in large-scale social environments. The sensing accuracy of our proposed approach can be improved by overlapping the fields of view of widely separated sensors deployed in real-world environments. Our aim is that the information provided by the set of LIDARs, which can detect and track people in public spaces, will allow users to classify people's behaviors and ultimately estimate their degree of interest towards their surroundings. It may also help in adapting new services for people in the near future.
We note that visitor behavioral observation in museums is recognized as an important research topic [22]. Currently, many emerging multimedia technologies (for example, [23]–[25]) and robotic technologies (for example, [26]–[30]) are utilized in real museums. However, most of these technologies are not employed to identify people's overall behaviors within the museum space. What differentiates our technology is that we are the first to set up such a sensor system framework in a real museum, which may make it possible for MPs to seamlessly automate the observation of visitors' behaviors inside the museum and ultimately obtain various kinds of valuable information, including visitors' degrees of interest and intentions regarding the exhibits and knowledge of their overall experiences.
3. Our Human Tracking Method
Before presenting our proposed human tracking method (detailed in Sect. 3.3), we first briefly describe the principles of our previously proposed human tracking method [4], as well as its limitations and drawbacks.
3.1 Our Previous Human Tracking Method
In the following section, we describe our previous human
tracking method based on the particle filter framework.
3.1.1 Modeling Human as Tracking Target
The human target is modeled as shown in Fig. 1(a). We assume that the LIDARs on top of the sensor poles are at shoulder level and placed horizontally, so that the contour of the human's shoulders can be observed. The observed outline of the human's shoulders obtained by the LIDARs can be considered part of an ellipse; thus, we use an ellipse as the model of the tracking target. The system then tracks the positions and body orientations of humans by applying a particle filter framework [31]. We assume a coordinate system in which the X- and Y-axes are aligned on the ground plane and the Z-axis represents the vertical direction from the ground plane. The model of
Fig. 1 Human body model; (a) human model, (b) body contour model.
the tracking target is represented by the center coordinates of the ellipse [u, v] and the rotation of the ellipse φ. The state variables of the tracking target are denoted by {u, v, φ}. These parameters are estimated in each frame by the particle filter (described in Sect. 3.1.2). Using regular particle filters, we represent the posterior distribution by a set of weighted samples that are propagated by a motion model and evaluated by an observation model. Here, we apply a simple random walk model for state propagation, and we evaluate samples based on the observations of the LIDARs.
When the distance data captured with the LIDARs are mapped onto the 2D image plane (what we call a "laser image"), the contour of the human's shoulders is partially observed, as shown in Fig. 1(b). The points on the contour (see Fig. 1(b)) indicate evaluation points, which are used in the likelihood evaluation step described in Sect. 3.1.3.
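As a sketch of how a raw scan becomes the laser image, each beam's range reading can be projected onto the ground plane. This is a minimal illustration, not the authors' implementation; the parameter names `angle_min` and `angle_increment` are our own assumptions about how a scan is parameterized:

```python
import numpy as np

def scan_to_points(ranges, angle_min, angle_increment):
    """Project one LIDAR scan (one range per beam) onto the 2D ground plane,
    producing the point set that forms the 'laser image'."""
    angles = angle_min + angle_increment * np.arange(len(ranges))
    return np.stack([ranges * np.cos(angles), ranges * np.sin(angles)], axis=1)
```

The resulting (x, y) points are what the shoulder-contour model is matched against.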
3.1.2 Particle Filtering
The particle filter is a Bayesian sequential importance sampling technique that recursively approximates the posterior distribution using a finite set of weighted samples. The set of samples can approximate non-Gaussian distributions, and the samples are propagated by a state transition model at each recursion. This allows robust tracking against observation noise and abrupt changes in the target's motion.
Suppose that the state of a target at time t is denoted by the vector x_t and that the observation of the 2D image at time t is denoted by the vector z_t. Then all the observations up to time t are denoted Z_t = (z_1, z_2, ..., z_t). Assuming a Markov process enables us to describe the prior probability P(x_t | Z_{t−1}) at time t by

P(x_t | Z_{t−1}) = ∫ P(x_t | x_{t−1}) P(x_{t−1} | Z_{t−1}) dx_{t−1}   (1)

where P(x_{t−1} | Z_{t−1}) is the posterior probability at time t − 1, and P(x_t | x_{t−1}) is the state transition probability from t − 1 to t. Assuming that P(z_t | Z_{t−1}) remains constant, the posterior probability P(x_t | Z_t) at time t is described by

P(x_t | Z_t) ∝ P(z_t | x_t) P(x_t | Z_{t−1})   (2)

where P(z_t | x_t) is the likelihood and P(x_t | Z_{t−1}) is the prior probability at time t. Tracking is then achieved by calculating the expectation of the posterior probability P(x_t | Z_t) at each time step.
In the particle filter framework, the probability distribution is approximated by a set of samples {s_t^(1), ..., s_t^(N)}. Each sample s_t^(n), representing a hypothesis, has a weight π_t^(n) representing the corresponding discrete sampling probability. The hypothesis evaluation, also called the sample evaluation, computes the weight π_t^(n) by considering the observation likelihood corresponding to the sample s_t^(n). The set of samples is then updated by the following procedures at each time step.
• Sampling: Select samples {ś_{t−1}^(1), ..., ś_{t−1}^(N)} in proportion to the weights {π_{t−1}^(1), ..., π_{t−1}^(N)} corresponding to the samples {s_{t−1}^(1), ..., s_{t−1}^(N)}.
• Propagation: Propagate the samples {ś_{t−1}^(1), ..., ś_{t−1}^(N)} with the state transition probability P(x_t | x_{t−1} = ś_{t−1}^(n)) and generate new samples {s_t^(1), ..., s_t^(N)} at time t.
• Weight Computation: Compute the weight π_t^(n) ≈ P(z_t | x_t = s_t^(n)) corresponding to sample s_t^(n) by evaluating the likelihood through the 2D images (n = 1, 2, ..., N).
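The sampling–propagation–weighting cycle above can be sketched as one recursion step in Python. This is a minimal sketch under our own assumptions, not the paper's code: the state is the {u, v, φ} vector, the random-walk noise scale `sigma` is a placeholder, and `observe_likelihood` stands in for the LIDAR-based evaluation described in Sect. 3.1.3:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(samples, weights, observe_likelihood, sigma=0.05):
    """One particle filter recursion: resample, propagate, reweight, estimate.

    samples: (N, 3) array of state hypotheses {u, v, phi}.
    weights: (N,) normalized weights pi_{t-1}.
    observe_likelihood: callable mapping a state to P(z_t | x_t).
    """
    n = len(samples)
    # Sampling: select samples in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    resampled = samples[idx]
    # Propagation: simple random-walk state transition model.
    propagated = resampled + rng.normal(0.0, sigma, size=resampled.shape)
    # Weight computation: evaluate the observation likelihood per sample.
    new_weights = np.array([observe_likelihood(s) for s in propagated])
    new_weights /= new_weights.sum()
    # State estimate: expectation over the weighted particles.
    estimate = (propagated * new_weights[:, None]).sum(axis=0)
    return propagated, new_weights, estimate
```

The returned expectation corresponds to the state estimate computed at each time step.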
3.1.3 Likelihood Evaluation
In the previous method, we evaluated particles based on the
LIDAR observations. The likelihood is determined by as-
sessing the contour similarity between the model (described
in Sect. 3.1.1) and the human’s shoulder as partially observed
by the LIDARs. At first, an ellipse model of the body contour
is sampled as evaluation candidates. Then we select evalua-
tion points from candidates by calculating the inner product
between the normal vector at the point and the vector from
the LIDAR to the point. This process is executed by each
LIDAR. Finally, the likelihood of each sample is evaluated
by the maximum distance between the evaluation points and
the nearest distance data by:
Wt
i
= exp(
−d2
max
σd
) (3)
where Wt
i
is the likelihood score based on the laser image.
The dmax term is the maximum distance between evaluation
points and the nearest distance data. At each time instance,
once the distance image is generated from the LIDAR image,
each distance (dn) is easily obtained. The σd is the variance
derived from dn. A conceptual image of the evaluation pro-
cess is shown in Fig. 2 where the likelihood of each particle
is evaluated. The likelihood score is maximized when the
model completely matches the LIDAR image. This likeli-
hood evaluation procedure is repeated for each particle. The
estimation of the state at time t is calculated as the expecta-
tion of the weights over particles.
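Eq. (3) can be sketched as follows: score a particle by its worst-matching evaluation point against the scan. This is a minimal sketch under our own assumptions (2D point arrays, a brute-force nearest-neighbor search), not the authors' implementation:

```python
import numpy as np

def likelihood_from_evaluation_points(eval_points, scan_points, sigma_d):
    """Eq. (3): score a particle by its worst-matching evaluation point.

    eval_points: (M, 2) evaluation points on the ellipse model.
    scan_points: (K, 2) LIDAR returns in the laser-image plane.
    sigma_d: variance term derived from the observed distances.
    """
    # For each evaluation point, distance to the nearest LIDAR return.
    dists = np.linalg.norm(
        eval_points[:, None, :] - scan_points[None, :, :], axis=2
    ).min(axis=1)
    d_max = dists.max()
    return np.exp(-d_max ** 2 / sigma_d)
```

A perfect model/scan match gives a score of 1, and the score decays as the worst evaluation point drifts from the observed contour.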
The human tracking method described above deals with the case of a single LIDAR to
Fig. 2 Likelihood evaluation based on maximum distance.
explain the basic tracking process, which allows us to track a human within a small area. Later, to track humans within a relatively larger area, our proposed method in [4] uses multiple LIDARs. All the LIDARs' data are integrated into one laser image, and the single-LIDAR method is applied to that image.
We use a simple manual operation for this integration. We consider an image covering the entire area and roughly place each LIDAR's data on the image. We then translate and rotate each LIDAR's data manually so that corresponding features in each image, such as line segments showing the walls or corners of the room, coincide. The data from each LIDAR are sent to a PC, which assigns a time stamp to the data upon receipt. Data whose time differences are less than 1/40 second (the LIDAR's scan period) are integrated into one image.
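The manual alignment step amounts to a 2D rigid transform per LIDAR. A minimal sketch, with the rotation angle and translation offsets assumed to have been found by hand as described above:

```python
import numpy as np

def to_common_frame(points, theta, tx, ty):
    """Rotate one LIDAR's 2D points by theta, then translate by (tx, ty),
    mapping them into the shared laser-image frame."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return points @ rotation.T + np.array([tx, ty])
```

Applying each LIDAR's transform and concatenating the results yields the single integrated laser image.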
This can be called raw-data-level integration. Its advantage is that we can easily integrate data from a real-life situation and use the same method as with a single LIDAR. However, it has some drawbacks and limitations, which we describe in the next subsection.
3.2 Drawbacks of Our Previous Human Tracking Method
As mentioned in Sect. 2, our previous system [4], under a setting with multiple LIDAR poles distributed in a laboratory environment, can only work reliably when there is a small number of people and there are line-of-sight views from all the deployed LIDARs to the humans. This is because detection and tracking were based on the observations of all the deployed LIDARs. Furthermore, the system cannot robustly handle partial occlusions, and it fails under the full occlusions that occur frequently between the LIDARs and the humans in crowded large-scale environments. The basic reasons for these drawbacks are briefly discussed here.
The previous system can effectively track humans with unique IDs when all the LIDARs are accurately calibrated on the poles deployed in the observed environment and there are no occlusions between the target human and any of the deployed LIDARs, even when the humans are moving about (see Fig. 3(a)). Our goal is to apply the proposed method to real situations, where calibrations among LIDARs are
Fig. 3 (a) Ideal case: evaluation model formed by fitting an ellipse to the shoulder outline obtained by LIDAR-1 and LIDAR-2; (b) defective observed body outline from LIDAR-1 and LIDAR-2 as compared to the ideal case; (c) fitted evaluation model is quite different from the shoulder outline obtained only from LIDAR-1, without LIDAR-2, due to occlusion.
not tightly maintained and occlusions may often happen among humans. The lack of accurate calibration among all of the deployed LIDARs, as well as the LIDARs' internal processing delays, causes problems. For example, the partially observed body outlines from all the deployed LIDARs do not always fit the contour observation model in the image mapped from the distance data (see Fig. 3(b)). This results in inaccurate computation of the weights of the evaluation points of the target person in the image mapped from the LIDARs and thereby degrades the performance of human body position and orientation measurement.
In crowded environments, moreover, it is not always possible to maintain line-of-sight views from all the LIDARs to the target people. Hence, partial occlusion may occur frequently between any of the LIDARs and the target person. It is then quite difficult to assess the contour similarity between the observation model and the contour of the human body observed from all the deployed LIDARs, due to insufficient observable evaluation points (see Fig. 3(c)). The weights computed from such a small number of evaluation points of a given person are not enough to compute the expectation value for the target person over time. For this reason, this system cannot accurately track a partially occluded person's position and body orientation with his/her unique ID.
In addition, our previous system cannot reassign the same unique-ID to persons who were temporarily lost by the tracker due to full occlusion from the LIDARs. Thus, our previous system is not robust enough to track humans with their proper identities in crowded public spaces. For these reasons, we propose a new, modified system that is usable not only in laboratory environments but also in crowded large-scale social environments, to robustly track human positions, body orientations, and walking trajectories even under partial and/or full occlusion, and thereby observe people's behaviors. Details of our
Fig. 4 Likelihood evaluation; (a) conceptual image, (b) evaluation model
formed by fitting an ellipse to the shoulder outline obtained by the LIDAR.
extended system are presented in the next section.
3.3 Our Extended Human Tracking System: Proposed Approach
To estimate the positions and body orientations of humans as tracking targets in large-scale environments such as museum art galleries, our proposed method uses our previously proposed human model, as described in Sect. 3.1.1. The method is designed to employ a set of LIDAR poles, where the LIDAR on top of each pole is at the shoulder level of a human. The LIDAR poles can be placed in the gallery without damaging walls or the floor. We then arrange them to ensure coverage and to track humans with less occluded observations within a typical large area.
3.3.1 Proposed Likelihood Computing Model
To compute the likelihoods of the samples from the deployed LIDARs, we evaluate particles based on the observations of the LIDARs. The likelihood evaluation procedure of our proposed system consists of two steps: the fundamental step and the integration step.
In the fundamental step, for each of the deployed LIDARs, the ellipse model of the visitor's shoulder contour is sampled to obtain evaluation candidates. The weights of the samples are evaluated individually by assessing the contour similarity between the observation model and the body outline partially observed by the individual LIDARs (see Fig. 4(a)). To do so, we consider the data from each individual LIDAR as providing qualitative information useful for estimating human positions in the observation area under the following three scenarios: (a) most probably exists, indicating a contour that may correspond to the edge of a detected human and/or object; (b) must not exist, indicating that certain points are empty, i.e., there are no observable entities there; (c) undefined, indicating that a certain area lies beyond some observable entity, so the existence of any person and/or object there is unclear. Fig. 5 (left and center) illustrates the distinction between these three types of scenarios, obtained from the information the LIDARs provide.
Now, it is important to design a weighting scheme for
each of the samples. The design of our weighting function is
Fig. 5 A typical LIDAR scanning for people. (Left) The positions of people relative to the LIDAR; (center) the three types of scenarios in LIDAR scanning; (right) representation of their weight functions.
illustrated in Fig. 5 (right) for the three scenarios mentioned above. We assign the weights C_HIGH, C_LOW, and C_MEDIUM to the data provided by the LIDARs for the scenarios most probably exists, must not exist, and undefined, respectively. These weights are not related to the distance of the tracking target from the LIDARs; rather, they identify the state of the observation area, i.e., whether any object/person exists there or not.
Specifically, the i-th sample is generated at time t for each contour obtained from the image mapped from the distance data. The normal vectors at the contour points (the black points in Fig. 4(b)), such as 'b' and 'd', are as shown in Fig. 4(b). The vectors from the position of the LIDAR to the points are 'a' and 'c'. The system then calculates the inner product of the vectors for each point. If the inner product is negative (a · b < 0), the point can be observed by the LIDAR; these observable points are treated as evaluation points (the dark black points in Fig. 4(b)). Conversely, a positive inner product (c · d > 0) indicates that the point cannot be observed by the LIDAR. Next, the distance d_n between each evaluation point and the observed contour is calculated. Then, the weight of the i-th sample at time t for a single LIDAR is calculated by:
w_{l,i}^t = Σ_n V_n   (4)

Here, V_n = f(d_n), where d_n is the distance between the n-th evaluation point and the observed contour, and l is the LIDAR ID.
More specifically, V_n can be expressed as follows:

V_n = f(d_n) = C_LOW                 if d_n < T1
               C_HIGH · exp(−d_n²)   if T1 ≤ d_n ≤ T2
               C_MEDIUM              if T2 < d_n        (5)
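The visibility test and the per-LIDAR weighting of Eqs. (4) and (5) can be sketched as follows. This is a minimal sketch, not the authors' implementation; the constants `c_low`/`c_high`/`c_medium` and the thresholds `t1`/`t2` are placeholders for the paper's unspecified values:

```python
import numpy as np

def is_visible(ray_to_point, normal):
    """Inner-product test: a contour point faces the LIDAR iff the ray to
    the point and the outward normal point in opposite directions (a . b < 0)."""
    return np.dot(ray_to_point, normal) < 0

def v_weight(d_n, t1, t2, c_low=0.0, c_high=1.0, c_medium=0.5):
    """Eq. (5): piecewise weight for one evaluation point."""
    if d_n < t1:
        return c_low                       # "must not exist" region
    if d_n <= t2:
        return c_high * np.exp(-d_n ** 2)  # "most probably exists" region
    return c_medium                        # "undefined" region

def sample_weight_single_lidar(distances, t1, t2):
    """Eq. (4): weight of one sample for one LIDAR, summed over the
    distances of its visible evaluation points."""
    return sum(v_weight(d, t1, t2) for d in distances)
```

Only points passing `is_visible` contribute distances to `sample_weight_single_lidar`.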
In the integration step, we simply add the computed weights of the individually observed samples from the different LIDARs to obtain the aggregated weight for each sample. Thus, the aggregated weight of the i-th sample at time t over all the LIDARs is obtained by the following simple calculation:
W_i^t = Σ_{l=0}^{n−1} w_{l,i}^t   (6)
By calculating the weight values across the samples, the system can accurately estimate and track each person's position with a unique ID. The higher the aggregated weight value for a sample, the more stably the tracker can detect and track the position and body orientation of the human. This likelihood evaluation procedure is repeated for each particle. The estimate of the state at time t is calculated as the expectation over the weighted particles.
In our extended system, the weights of the observed samples are computed from each LIDAR individually in the fundamental step, and in the integration step the LIDAR-specific weights are combined to finally calculate the expectation value across the samples and estimate each person's position and orientation. As a result, our extended system can track humans even when they are partially occluded in a crowded large-scale environment, because it can track a human as long as s/he is detected by at least one LIDAR.
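The per-LIDAR weights of Eq. (6) combine by simple summation, which is what makes a single unoccluded LIDAR sufficient. A minimal sketch of that aggregation:

```python
def aggregate_weights(per_lidar_weights):
    """Eq. (6): aggregate each sample's weight by summing over all LIDARs.

    per_lidar_weights: one list of per-sample weights per LIDAR.
    A sample seen by only one unoccluded LIDAR still receives a usable weight.
    """
    return [sum(ws) for ws in zip(*per_lidar_weights)]
```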
3.3.2 Reassigning Unique-ID to a Temporarily Lost Person
If our system cannot estimate the location of a person from any of its LIDARs due to temporary full occlusion by surrounding people, then the weight computation is no longer applicable for maintaining his/her track, and the likelihood falls below a threshold. In such a case, the person's tracker is immediately removed, because the particle filter is assumed to no longer be tracking anything. In our proposed system, we overcome such situations with the following simple but effective strategy.
Once a person is fully occluded or not visible to all the LIDARs of the sensor system for a while at some location in the observation area, the resulting likelihoods tend to zero and the person's tracker is removed by the system, but the following valuable information is preserved for a later attempt to reassign the lost person's identity:
• the lost unique-ID,
• the coordinate position where the tracker failed, (x1, y1),
• the frame number when the tracker failed, FL.
A few frames later, if the fully occluded person becomes visible to any of the LIDARs of our sensor system, the system will try to assign the previous tracker's unique-ID to that person if the following two conditions are satisfied:
• Condition-1: The distance D between the lost coordinate (x1, y1) and the coordinate of the newly visible position of the lost person (x2, y2) is less than some threshold distance Dth.
• Condition-2: The frame difference FD between the frame where tracking failed, FL, and the frame where the person appeared, FA, is less than some reasonable threshold Fth.
Thus, if D < Dth and FD (= FA − FL) < Fth are satisfied for a person, the system reassigns the same unique-ID to the person who previously lost it due to full occlusion.
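The two reassignment conditions can be sketched as a single predicate. This is a minimal illustration under our own assumptions; the tuple layouts and function name are not from the paper:

```python
import math

def should_reassign(lost, appeared, d_th, f_th):
    """Reassign the lost unique-ID iff the reappearance is close enough in
    space (Condition-1) and time (Condition-2).

    lost:     (x1, y1, F_L)  position and frame where the tracker failed.
    appeared: (x2, y2, F_A)  position and frame where a person reappeared.
    """
    x1, y1, f_lost = lost
    x2, y2, f_appeared = appeared
    d = math.hypot(x2 - x1, y2 - y1)   # Condition-1: spatial proximity
    fd = f_appeared - f_lost           # Condition-2: temporal proximity
    return d < d_th and fd < f_th
```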
People cannot disappear in the middle of the room, nor can they appear from the middle of the room. If the method loses some people and later finds some people, it is reasonable, when only LIDAR data are available, to match those that are closest in distance and time, as in Condition-1 and Condition-2. This heuristic may not work if the method fails to track multiple nearby people within a short time period, which may happen if the room is crowded with people. We evaluate how visitor density affects the performance of our method through experiments in Sect. 4.
In our present system, we use GPGPU computing via CUDA (Compute Unified Device Architecture) [32] to parallelize the calculations for each evaluation point in each sample. Consequently, the system's performance is not adversely affected by the number of visitors being tracked, even in large-scale environments, and the system can track multiple persons robustly in real time, even through brief full occlusions.
4. Art Gallery Installation
In the following, we present the experimental setup and results of the human tracking system implemented in an art gallery of the Ohara Museum of Art in the Kurashiki area of Okayama, Japan. We first describe the system design for the real-world environment and analyze the obtained performance in terms of tracking accuracy. Second, we provide an example of how the large amount of tracking data obtained can provide useful knowledge about the visitors to the MPs.
4.1 Tracking System Setup
Figure 6 (top) illustrates the location map of the art gallery (the red dashed line, 20 m × 11 m) in the museum. The gallery exhibits 21 masterpieces of Western art from the 19th and 20th centuries. Among these, 19 paintings hang on the walls, and the two remaining large paintings hang over the doors on the north and south sides of the gallery. There are also four sofas and one piano permanently placed in the middle of the gallery; see Fig. 6 (middle). Additionally, there is a charming window view of the outside through the door on the north side of the gallery. All the valuable paintings and the piano are permanently protected against physical contact by visitors using a physical barrier. Visitors in this gallery typically view the paintings according to different patterns of behavior: some visitors view all the paintings and exit, some sit on a sofa to rest between viewing paintings, and a few go through the door on the north side of the gallery to look at the window view before leaving.
Our goal was to make the sensor arrangement inside the
art gallery as simple and visitor friendly as possible so that
Fig. 6 Visitor tracking area and sensor setup in an art gallery of Ohara
Museum of Art; (top) the red dashed line shows the border of the area
covered by the sensors; (middle) experimental space covered by the red
dashed line; (bottom) illustration of the experimental setup.
the sensors would be inconspicuous and not detract from the visitors' actual interest in the paintings. To this end, the tracking system is realized as a combination of six LIDAR poles set up at the standard shoulder level of an adult inside the gallery; see Fig. 6 (bottom). Each LIDAR pole is equipped with a laser range finder (models UTM-30LX/LN and UTM-30LX-EW by Hokuyo). As mentioned earlier, the height of each LIDAR sensor pole corresponds to the shoulder level of a typical adult. In museum-like social spaces, people usually pay attention to the exhibits while standing. Our observations in Japanese society indicate that the standard shoulder height of a typical adult is about 120 cm. Although heights vary, in our experiments we placed the LIDARs on poles about 120 cm in height and found that this worked well in most cases. This setup is well suited to installation in a real large-scale environment, such as a museum art gallery, where visitors' shoulders can be tracked simply by placing LIDAR poles in different locations to ensure coverage. This is because our sensors have a relatively large usable range: correct and stable measurements can be obtained from about 0.1 m to 30 m in distance over a 270-degree field of view [33]. The measurements are accurate with low noise, especially at close range; the number of missing measurements increases with distance, especially for dark and transparent objects. To overcome such technical constraints, as well as to compensate for tracking errors due to occlusion among visitors, we use six LIDARs in the wide art gallery under consideration. This setup ultimately helps our human tracking system to robustly track most of the visitors with their unique IDs from the beginning to the end of their painting-viewing time inside the gallery.
Our current setup can also continue tracking visitors with their unique IDs in cases where they sit on a sofa for a while and/or briefly go through the door on the north side of the gallery to look at the window view. In general, our system detects and tracks people only inside the gallery by observing their shoulder positions. To continue tracking such visitors, we use the method described in Sect. 3.3.2 to reassign the same unique-ID to visitors when they stand up from the sofa, or return to the gallery through the north-side door, and continue viewing the paintings. In this case, we relaxed Condition-2 for reassigning the lost unique-ID to the lost person. Hence, this method compensates for tracking errors so that visitors who view the paintings in different ways can be tracked continuously inside the gallery.
The LIDARs are connected via USB extensions to the sensor control unit, which was located outside the art gallery; see Fig. 6 (bottom). For simplicity, we use a single PC (Intel Core i7 CPU, 3.60 GHz, 8 GB RAM) to receive and store the sensor data. In addition to the sensor system, four cameras (two handycams and two SP360 action cameras) were installed inside the gallery to capture video of the whole tracking area; see Fig. 6 (bottom).
We conducted our experiment on 28 August 2015 with the above setup, gathering visitors' tracking data for 7 hours (09:00 to 16:00). The stored data were then processed with our proposed method to analyze its performance in terms of tracking accuracy. Additionally, we provide an example in which the large amount of continuous human tracking data obtained can provide very useful knowledge about visitors' behaviors to the MPs.
4.2 Visual Illustrations of Visitor Detection and Body Orientation as well as Movement Trajectory Recognition
Our proposed system worked as expected in a real social
environment (art gallery of a real museum) and we show
Fig. 7 Visualizing the visitors; Illustration of (a) visitor detection and
body orientation recognition, (b) movement trajectories recognition.
Fig. 7 as an example. It is revealed that our system can judge
correctly, the frontal direction of the visitors as illustrated
in Fig. 7(a). The ellipses of visitors are detected inside the
art gallery and the line segments indicate the body’s frontal
direction that our proposed system recognizes. Fig. 7(b)
illustrates the visual presentation of the typical movement
trajectories of visitors from 11:00 to 12:00 during our ex-
periment. Details can be found in Sect. 4.3. The movement
trajectories are obtained by extracting the positional data of
57 visitors who were completely tracked by our system within
the above mentioned 1 hour time duration. In Sect. 4.4 we
demonstrated the value of extracted positional data for ob-
taining vital information about the visitors’ behaviors inside
the art gallery of a real museum.
4.3 Tracking Accuracy Evaluation
We manually obtained the ground truth to evaluate the accuracy of our system: we viewed the images from the four cameras taken during the experiment and gave each visitor a label by comparing the images with the tracking results. Table 1 shows the evaluation results. The tracking accuracy may depend on how crowded the room is, so we examined the number of visitors that entered the art gallery in each one-hour period and chose three such periods (11:00 to 12:00, 13:00 to 14:00, and 15:00 to 16:00). As shown in Table 1, we call these periods light-density, high-density, and moderate-density, respectively.
As we would like to observe each individual’s behaviors inside the art gallery using our proposed system, it is necessary to evaluate accuracy by how long a visitor is tracked under the assigned unique-ID relative to his/her total time in the art gallery. For every labelled visitor, we compute the Accuracy Index (AI) using Eq. (7). If the AI for a visitor is 100, we denote it as AI100; that visitor was fully tracked by our system with his/her assigned unique-ID. However, in some applications AI100 may not be necessary, and we would like to know how the performance changes if we reduce the required tracking accuracy. Therefore, we also choose AI80 and examine the rate of visitors that the system successfully tracks for more than 80% of the time. Table 1 shows the rate of visitors with AI100 and the rate with more than AI80, relative to the number of labelled visitors.
AI = (Tracking time with assigned unique-ID / Total time spent inside the art gallery) × 100    (7)
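As a sketch, Eq. (7) and the AI100/AI80 rates reported in Table 1 could be computed as follows; the function names and data layout are our own, not the paper’s.

```python
def accuracy_index(tracked_time, total_time):
    """Accuracy Index (AI) of Eq. (7): percentage of a visitor's stay
    during which the initially assigned unique-ID was kept."""
    return 100.0 * tracked_time / total_time

def ai_rates(visitors):
    """Rate of fully tracked visitors (AI100) and of visitors tracked
    for more than 80% of their stay (AI80).

    visitors: list of (tracked_time, total_time) pairs, one per
    labelled visitor, in any consistent time unit.
    """
    ais = [accuracy_index(t, total) for t, total in visitors]
    n = len(ais)
    ai100 = 100.0 * sum(a >= 100.0 for a in ais) / n
    ai80 = 100.0 * sum(a > 80.0 for a in ais) / n
    return ai100, ai80
```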
Our visitor tracking system performs quite satisfactorily when the visitor density in the art gallery is low, but an increase in density leads to a decrease in tracking performance. Although we utilize the method described in Sect. 3.3.2 to combat full occlusion of visitors under high visitor density, errors remained. The main source of errors is lost tracks of visitors who were part of a group or family. We observed that groups and families often move closely together from one painting to another. In such cases, the visitor density in the small region in front of a painting becomes very high, causing the system to fail to identify and track each group member under the initially assigned unique-ID. Besides that, the method described in Sect. 3.3.2 also failed to reassign unique-IDs to visitors who stood up from the sofas together with their group, or who returned to the art gallery with a group through the north door. ID switching has a great impact on the performance evaluation of any tracking system, so in evaluating our proposed tracking system we also consider unique-ID switching: if a visitor switched his/her unique-ID with another’s during tracking, we did not consider him/her tracked in the subsequent parts of his/her visit to the art gallery. In most cases the system recovered the track correctly, but there were also a number of unique-ID changes. We observed that unique-ID switching was a major cause of lower tracking performance (cases with less than AI80).
We note that, in general, our system applied a tracker with a unique-ID to each visitor one by one at the entrance/exit of the art gallery when their shoulders were detected by the sensory system, and automatically removed the tracker when the visitor left the art gallery through the entrance/exit gate. In addition, using shoulder-height LIDAR poles prevented a major cause of false positives, because most objects in the gallery, such as the piano, were below the shoulder height of a typical visitor. But false negatives did occur, as we observed from our
Table 1 Tracking accuracy evaluation of our proposed system under different visitor densities.

Tracking scenario                        Light-density    Moderate-density   High-density
                                         (11:00∼12:00)    (15:00∼16:00)      (13:00∼14:00)
Number of labelled visitors                    59               107               178
Average density (visitors)                    9.2              14.2                20
Maximum gathering at a time                    19                25                31
% of visitors securing AI80                  96.6              78.5              71.3
% of fully tracked visitors (AI100)          89.8              72.0              61.2
captured video data that among the labelled visitors there were children, toddlers, and wheelchair users who were not observable by our system, since their shoulder heights were below the height of the LIDAR poles. This was another source of errors.
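The tracker lifecycle described above (apply a tracker with a fresh unique-ID when a shoulder pair first appears at the entrance/exit, and remove it when the person leaves through the gate) can be sketched as follows; the class, the gate-region coordinates, and the method names are all illustrative assumptions, not the paper’s implementation.

```python
def in_region(pos, region):
    """Axis-aligned region test: region = (xmin, ymin, xmax, ymax)."""
    x, y = pos
    xmin, ymin, xmax, ymax = region
    return xmin <= x <= xmax and ymin <= y <= ymax

class TrackerManager:
    """Creates a tracker (with a fresh unique-ID) when shoulders are
    first detected inside the entrance/exit region, and removes the
    tracker when the tracked person disappears at that region.
    Region coordinates are illustrative, in metres."""
    def __init__(self, gate_region=(0.0, 0.0, 2.0, 1.0)):
        self.gate = gate_region
        self.trackers = {}      # uid -> last known position
        self._next_uid = 0

    def on_new_detection(self, pos):
        # Visitors enter one by one through the entrance/exit gate.
        if in_region(pos, self.gate):
            uid = self._next_uid
            self._next_uid += 1
            self.trackers[uid] = pos
            return uid
        return None

    def on_track_lost(self, uid):
        # Only drop the tracker if the person was last seen at the gate;
        # losses elsewhere are handled by the Sect. 3.3.2 reassignment.
        if uid in self.trackers and in_region(self.trackers[uid], self.gate):
            del self.trackers[uid]
```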
4.4 Application of Our System for the MPs: Statistical
Analysis
Our human tracking system permits us to observe the undisturbed behaviors of visitors inside the art gallery for extended periods of time. Thus, it is possible to gather knowledge on how many visitors visited, in which direction (clockwise or counter-clockwise) they usually moved to view the paintings, how long they stayed inside the art gallery, which paintings the visitors liked most, how many people sat on the sofa to rest, and how many visitors were in particular areas of the room during a particular time period. With our system, we can thus naturally observe the interests and intentions of the visitors inside the art gallery, which will certainly be valuable information for MPs to take into account in any further decisions to make the art gallery more attractive.
Here, we illustrate some example statistics that we were able to extract from the visitor data aggregated by our sensor system. Figure 8 shows the variation in the total number of visitors inside the art gallery over the experimental day. The number of visitors after lunch was remarkably high; before and after the lunch period, the average number of visitors was almost the same. It is also observed from the recorded data that 88.8% of all visitors moved counter-clockwise inside the art gallery, which is typical (>75%) of art galleries, as claimed by Melton [34], [35]. We further observed that 24.6% of the visitors sat on the sofa, while 6.58% went through the north gate of the art gallery to view the natural scenery outside the window. This is the type of information that the museum curator asked of us while we conducted the experiment in that art gallery.
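The clockwise/counter-clockwise statistic can be derived from the stored trajectories; one simple approach (our illustration, not necessarily the authors’ method) classifies a visitor’s loop through the gallery by the sign of its enclosed area, via the shoelace formula.

```python
def walking_direction(trajectory):
    """Classify a roughly closed-loop trajectory as 'clockwise' or
    'counter-clockwise' using the signed area (shoelace formula).

    trajectory: list of (x, y) positions in visit order.  A positive
    signed area means counter-clockwise in a standard x-right, y-up
    floor coordinate frame.
    """
    area2 = 0.0
    for (x1, y1), (x2, y2) in zip(trajectory, trajectory[1:] + trajectory[:1]):
        area2 += x1 * y2 - x2 * y1
    return 'counter-clockwise' if area2 > 0 else 'clockwise'
```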
To analyze the visitors’ behaviors in more detail, we chose the data for the period from 15:00 to 16:00 (moderate-density). Our system completely tracked 77 visitors in this period. We can categorize these 77 visitors’ movements into the four visiting styles proposed in the ethnographic studies of Veron and Levasseur [36]. They classified visitors into Ant, Fish, Grasshopper and Butterfly
Fig. 8 Total number of visitors to the art gallery on an hourly basis.
Table 2 Different categories of visitors based on their visiting style.

Visiting style          [%]
Ant visitor            20.8
Fish visitor            6.5
Grasshopper visitor    39.0
Butterfly visitor      33.7
visitors, as suggested by the behaviors of these animals. Ant visitors spend quite a long time observing all the exhibits, walking close to them and avoiding empty spaces. Fish visitors prefer to move through and stop in empty spaces, avoiding areas near the exhibits. Grasshopper visitors spend long periods viewing selected exhibits but ignore the rest. Butterfly visitors observe almost all the exhibits but spend varying amounts of time on each. Identifying patrons’ visiting styles can be advantageous for MPs in setting up an effective guide system in museums, as mentioned in [37]–[39]. Table 2 shows the categorization results for the 77 visitors. In this study we categorized the visitors manually by examining their walking trajectory patterns. Fig. 9 shows heat map images in which, for each style, the cumulative time visitors stayed at each position is shown using colors. The figure clearly shows the differences in movement among the four visiting styles.
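A cumulative-time heat map in the style of Fig. 9 can be built from the positional data by simple grid accumulation; the cell size and sampling interval below are illustrative assumptions, not the paper’s parameters.

```python
from collections import defaultdict

def dwell_heatmap(trajectory, cell=0.5, dt=0.1):
    """Accumulate the time a visitor spends in each grid cell of the
    gallery floor, the raw data behind a Fig. 9-style heat map.

    trajectory: list of (x, y) positions sampled every dt seconds;
    cell is the grid resolution in metres.
    """
    heat = defaultdict(float)
    for x, y in trajectory:
        key = (int(x // cell), int(y // cell))
        heat[key] += dt          # each sample adds one time step
    return dict(heat)
```

Summing the per-cell maps of all visitors of one style, then mapping seconds to colors, yields the per-style images.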
The results show that our system is applicable as a useful tool for museum studies: it can analyze the behaviors of the visitors that it tracks. However, if we need to know the exact rate of each of the four visiting
Fig. 9 Heat map image of different types of visitors’ movements in the
art gallery.
styles, we need to improve the system to track almost all of
the visitors.
Other art-gallery-specific information can also be obtained from the data provided by our tracking system. The heat map in Fig. 10 shows the positions in the art gallery where visitors of all categories often stopped. The most frequent stopping positions indicate that there are interesting paintings nearby, from which visitors view their most-liked paintings. We note that, in the outgoing corridor of the art gallery, we often asked randomly selected visitors about their most-liked paintings; the top-ranked paintings from these surveys matched the information provided by our proposed system. It is also observed from Fig. 10 that, because visitors usually move in an anti-clockwise pattern through the paintings in the gallery, the first paintings along this anti-clockwise path received relatively more attention than those at the end of the path. Melton [34] called this “exit gradient” a special case of how visitors move through an art gallery. We can thus say that our system is highly effective in gathering such valuable information for the MPs autonomously. These types of information indicate whether certain people are interested in selected exhibits and may help the MPs rearrange the exhibits to make the art gallery more attractive and/or to adapt attractive services that provide visitors additional details about those exhibits.
Fig. 10 Heat map of the stopping positions of all categories of visitors inside the art gallery; the photos show the most-liked paintings.
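Stopping positions like those visualized in Fig. 10 can be recovered from a trajectory by detecting low-speed runs; this sketch uses illustrative thresholds rather than the paper’s actual procedure.

```python
import math

def stop_positions(trajectory, dt=0.1, speed_thresh=0.2, min_stop=1.0):
    """Extract positions where a visitor stopped: runs of samples whose
    speed stays below speed_thresh (m/s) for at least min_stop seconds.
    Returns the mean position of each detected stop."""
    def mean_pos(run):
        return (sum(x for x, _ in run) / len(run),
                sum(y for _, y in run) / len(run))

    stops, run = [], []
    for p, q in zip(trajectory, trajectory[1:]):
        speed = math.dist(p, q) / dt
        if speed < speed_thresh:
            run.append(p)          # still standing near this spot
        else:
            if len(run) * dt >= min_stop:
                stops.append(mean_pos(run))
            run = []
    if len(run) * dt >= min_stop:  # flush a stop ending the trajectory
        stops.append(mean_pos(run))
    return stops
```

Aggregating the stops of all visitors into a grid then gives the Fig. 10-style stopping-position heat map.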
4.5 Discussion
Tracking in public spaces has been considered difficult in practice because the arrangement of static objects in such places varies considerably over time. However, the layout of the contents of art galleries is quite stable over long periods, so the setup presented here poses no major problem. Our aim was to make the sensor arrangement inside the art gallery as simple and unobtrusive as possible, so that tracking the visitors would not detract from the natural experience of attending the gallery. Thus, we chose a LIDAR-pole-based sensor system. Owing to its portability, such a setup can be arranged in other art galleries for the same purposes within a very short time and with few technical professionals. LIDARs are also cost-effective and commercially available, whereas 3D range sensors are prohibitively expensive and require maintenance by highly skilled professionals.
5. Conclusion
In this study, we proposed a LIDAR-based people tracking system usable in crowded, large-scale social environments such as museums. The system allows us to estimate and track the position, body orientation, and walking trajectory of every person, each under a unique-ID, inside an observation area, even in crowded situations. The evaluation of our proposed method showed that the system achieves acceptable human tracking performance and provides a wealth of valuable behavioral information about patrons in popular social public spaces such as art galleries. The art gallery installation of our human tracking system demonstrated the
ability to infer a great deal of knowledge about visitors’ interests and intentions inside the art gallery, which is very important for MPs to take into account when making decisions on improving the attractiveness of an art gallery by introducing more services for the visitors. In the future, we would like to introduce automatic categorization of the various types of visitors instead of requiring manual categorization. Furthermore, we would like to build a museum guide robot system for visitors based on the behavioral knowledge obtained from these experiments. We leave this for future research.
Acknowledgments
The authors would like to thank the authority of the Ohara
Museum of Art, Kurashiki, Japan. This work was supported
in part by JSPS KAKENHI Grant Number 26330186.
References
[1] D. Brscic, T. Kanda, T. Ikeda, and T. Miyashita, “Person tracking in
large public spaces using 3-D range sensors,” IEEE Trans. Human-
Mach. Syst., vol.43, no.6, pp.522–534, Nov. 2013.
[2] M. Rashed, R. Suzuki, T. Kikugawa, A. Lam, Y. Kobayashi, and Y.
Kuno, “Network guide robot system proactively initiating interaction
with humans based on their local and global behaviors,” Lecture
Notes in Computer Science (LNCS), vol.9226, pp.283–294, Springer
International Publishing, 2015.
[3] T. Kanda, D. Glas, M. Shiomi, and N. Hagita, “Abstracting peo-
ple’s trajectories for social robots to proactively approach customers,”
IEEE Trans. Robot., vol.25, no.6, pp.1382–1396, Dec. 2009.
[4] T. Oyama, E. Yoshida, Y. Kobayashi, and Y. Kuno, “Tracking visitors
with sensor poles for robot’s museum guide tour,” Proc. 6th Interna-
tional Conference on Human System Interaction, pp.645–650, June
2013.
[5] D. Glas, Y. Morales, T. Kanda, H. Ishiguro, and N. Hagita, “Si-
multaneous people tracking and robot localization in dynamic social
spaces,” Auton. Robot., vol.39, no.1, pp.43–63, 2015.
[6] D.F. Glas, T. Miyashita, H. Ishiguro, and N. Hagita, “Laser-based
tracking of human position and orientation using parametric shape
modeling,” Adv. Robot., vol.23, no.4, pp.405–428, 2009.
[7] P. Duan, G. Tian, and W. Zhang, “Human localization based on
distributed laser range finders,” International Journal of Hybrid In-
formation Technology, vol.7, no.3, pp.311–324, 2014.
[8] E.J. Jung, J.H. Lee, B.J. Yi, J. Park, S. Yuta, and S.T. Noh, “De-
velopment of a laser-range-finder-based human tracking and con-
trol algorithm for a marathoner service robot,” IEEE/ASME Trans.
Mechatronics, vol.19, no.6, pp.1963–1976, 2014.
[9] H. Zhao and R. Shibasaki, “A novel system for tracking pedestrians
using multiple single-row laser-range scanners,” IEEE Trans. Syst.,
Man, Cybern. A, Systems and Humans, vol.35, no.2, pp.283–291,
March 2005.
[10] M. Mucientes and W. Burgard, “Multiple hypothesis tracking of
clusters of people,” Proc. IEEE/RSJ International Conference on
Intelligent Robots and Systems, pp.692–697, Oct. 2006.
[11] A. Ohya, A. Carballo, and S. Yuta, “Reliable people detection using
range and intensity data from multiple layers of laser range finders
on a mobile robot,” International Journal of Social Robotics, vol.3,
no.2, pp.167–186, April 2011.
[12] Y. Matsumoto, T. Wada, S. Nishio, T. Miyashita, and N. Hagita,
“Scalable and robust multi-people head tracking by combining dis-
tributed multiple sensors,” Intelligent Service Robotics, vol.3, no.1,
pp.29–36, 2010.
[13] M. Luber, L. Spinello, and K.O. Arras, “People tracking in RGB-D
data with on-line boosted target models,” Proc. International Confer-
ence on Intelligent Robots and Systems (IROS), 2011.
[14] W. Choi, C. Pantofaru, and S. Savarese, “Detecting and tracking
people using an rgb-d camera via multiple detector fusion,” Com-
puter Vision Workshops (ICCV Workshops), 2011 IEEE Interna-
tional Conference on, pp.1076–1083, Nov. 2011.
[15] Y. Kobayashi and Y. Kuno, “People tracking using integrated sensors
for human robot interaction,” Proc. IEEE International Conference
on Industrial Technology, pp.1617–1622, March 2010.
[16] J. Cui, H. Zha, H. Zhao, and R. Shibasaki, “Tracking multiple people
using laser and vision,” Proc. IEEE/RSJ International Conference on
Intelligent Robots and Systems, pp.2116–2121, Aug. 2005.
[17] J. Cui, H. Zha, H. Zhao, and R. Shibasaki, “Multi-modal tracking
of people using laser scanners and video camera,” Image and Vision
Computing, vol.26, no.2, pp.240–252, 2008.
[18] C. Premebida, O. Ludwig, and U. Nunes, “LIDAR and vision-based
pedestrian detection system,” J. Field Robotics, vol.26, no.9, pp.696–
711, 2009.
[19] X. Song, H. Zhao, J. Cui, X. Shao, R. Shibasaki, and H. Zha, “Fusion
of laser and vision for multiple targets tracking via on-line learning,”
Proc. Int. Conf. Robotics and Automation (ICRA), pp.406–411, May
2010.
[20] J. Blanco, W. Burgard, R. Sanz, and J. Fernandez, “Fast face detection
for mobile robots by integrating laser range data with vision,” Proc.
of the ICAR, 2003.
[21] S. Ishii and K. Meshgi, “The state-of-the-art in handling occlusions
for visual object tracking,” IEICE Trans. Inf. & Syst., vol.E98-D,
no.7, pp.1–14, 2015.
[22] S. Bitgood, “An analysis of visitor circulation: Movement patterns
and the general value principle,” Curator: The Museum Journal,
vol.49, no.4, pp.463–475, 2006.
[23] F. le Coz and F. Lemessier, “Multimedia technology at the orsay
museum: Institutional experience and future prospects,” ICHIM,
pp.377–383, 1993.
[24] M. Economou, “The evaluation of museum multimedia applications: Lessons from research,” Museum Management and Curatorship, vol.17, no.2, pp.173–187, 1998.
[25] F. Liarokapis, P. Petridis, I. Dunwell, G. Constantinou, S. Arnab,
S. de Freitas, and M. Hendrix, “The herbert virtual museum,” J.
Electrical and Computer Engineering, pp.1–8, 2013.
[26] M. Shiomi, T. Kanda, H. Ishiguro, and N. Hagita, “Interac-
tive humanoid robots for a science museum,” Proc. 1st ACM
SIGCHI/SIGART Conference on Human-robot Interaction, HRI ’06,
pp.305–312, ACM, New York, NY, USA, 2006.
[27] K. Yamazaki, A. Yamazaki, M. Okada, Y. Kuno, Y. Kobayashi, Y.
Hoshi, K. Pitsch, P. Luff, D. vom Lehn, and C. Heath, “Revealing
gauguin: Engaging visitors in robot guide’s explanation in an art
museum,” Proc. SIGCHI Conf. on Human Factors in Computing
Systems, CHI ’09, pp.1437–1446, 2009.
[28] W. Burgard, A.B. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D.
Schulz, W. Steiner, and S. Thrun, “Experiences with an interactive
museum tour-guide robot,” Artificial Intelligence, vol.114, no.12,
pp.3–55, 1999.
[29] S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Hähnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz, “Probabilistic algorithms and the interactive museum tour-guide robot Minerva,” The International Journal of Robotics Research, vol.19, no.11, 2000.
[30] W. Burgard, A.B. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D.
Schulz, W. Steiner, and S. Thrun, “The interactive museum tour-
guide robot,” Proc. of the Fifteenth National Conference on Artificial
Intelligence (AAAI-98), 1998.
[31] M. Isard and A. Blake, “Condensation—conditional density propa-
gation for visual tracking,” International Journal of Computer Vision,
vol.29, no.1, pp.5–28, 1998.
[32] O.M. Lozano and K. Otsuka, “Real-time visual tracker by stream
processing,” J. Signal Process. Syst., vol.57, no.2, pp.285–295, 2009.
13. RASHED et al.: ROBUSTLY TRACKING PEOPLE WITH LIDARS IN A CROWDED MUSEUM FOR BEHAVIORAL ANALYSIS
2469
[33] “Scanning laser range finder UXM-30LX-EW specification,” Hokuyo Automatic Co., Ltd., 2014.
[34] A. Melton, “Some behavior characteristics of museum visitors,” Psychological Bulletin, pp.720–721, 1933.
[35] A. Melton, “Visitor behavior in museums: Some early research in environmental design,” Human Factors, pp.393–403, 1972.
[36] E. Véron and M. Levasseur, Ethnographie de l’Exposition., Biblio-
thèque publique d’information, Paris, 1983.
[37] A. Bianchi and M. Zancanaro, “Tracking users’ movements in an
artistic physical space,” Proc. The i3 Annual Conference, pp.103–
106, Siena, Italy, Oct. 1999.
[38] F. Gabrielli, P. Marti, and L. Petroni, “The environment as interface,”
Proc. The i3 Annual Conference, pp.44–47, Siena, Italy, Oct. 1999.
[39] M. Zancanaro, T. Kuflik, Z. Boger, D. Goren-Bar, and D. Goldwasser,
“Analyzing museum visitors’ behavior patterns,” in User Modeling
2007, ed. C. Conati, K. McCoy, and G. Paliouras, Lecture Notes in
Computer Science, vol.4511, pp.238–246, Springer Berlin Heidel-
berg, 2007.
Md. Golam Rashed received the B.Sc.
and M.Sc. degrees in Information and Commu-
nication Engineering from the University of Ra-
jshahi, Rajshahi, Bangladesh, in 2006 and 2007,
respectively, and the Ph.D. degree in Science
and Engineering from Saitama University, Sai-
tama, Japan in 2016. After working at Prime
University, Dhaka, Bangladesh, he joined the In-
formation and Communication Engineering De-
partment, University of Rajshahi as a Lecturer
in 2012. He is currently an Assistant Professor
with the Department of Information and Communication Engineering, Uni-
versity of Rajshahi. His research interests include human behavior sensing,
robotics, and human–robot interaction.
Ryota Suzuki received the Ph.D. degree
from the Graduate School of Science and Engi-
neering, Saitama University, Saitama, Japan, in
2016. He is currently working at the National Institute of Advanced Industrial Science and Technology, Japan, as a postdoctoral researcher. His research interests
include computer vision for human-robot inter-
action.
Takuya Yonezawa received the B.E. de-
gree in Information and Computer Sciences from
Saitama University, in 2016. Currently, he is a
M.E. student at the Graduate School of Science
and Engineering, Saitama University. His re-
search interests include computer vision for hu-
man sensing.
Antony Lam received the B.S. degree in
Computer Science at the California State Poly-
technic University, Pomona in 2004 and the
Ph.D. in Computer Science at the University of
California, Riverside in 2010. After working at
the National Institute of Informatics, Japan, he
joined the Graduate School of Science and Engi-
neering at Saitama University as an assistant pro-
fessor in 2014. His research interests are mainly
in computer vision with emphasis on the areas
of physics-based vision and pattern recognition.
Yoshinori Kobayashi received the Ph.D.
degree from the Graduate School of Information
Science and Technology, the University of To-
kyo, Tokyo, Japan, in 2007. He is currently an
Associate Professor with the Graduate School
of Science and Engineering, Saitama University,
Saitama, Japan. His research interests include
computer vision for human sensing and its ap-
plication to human–robot interaction.
Yoshinori Kuno received the B.S., M.S., and
Ph.D. degrees in electrical and electronics en-
gineering from the University of Tokyo, Tokyo,
Japan, in 1977, 1979, and 1982, respectively. Af-
ter working with Toshiba Corporation and Osaka
University, since 2000, he has been a Professor
in the Department of Information and Computer
Sciences, Saitama University, Saitama, Japan.
His research interests include computer vision
and human–robot interaction.