Robust Optical Eye Detection During Head Movement
Jeffrey B. Mulligan, NASA Ames Research Center (jeffrey.b.mulligan@nasa.gov)
Kevin N. Gabayan, Stanford University (kevingabayan@gmail.com)

Abstract

Finding the eye(s) in an image is a critical first step in a remote gaze-tracking system with a working volume large enough to encompass head movements occurring during normal user behavior. We briefly review an optical method which exploits the retroreflective properties of the eye, and present a novel method for combining difference images to reject motion artifacts. Best performance is obtained when a curvature operator is used to enhance punctate features, and search is restricted to a neighborhood about the last known location. Optimal setting of the size of this neighborhood is aided by a statistical model of naturally-occurring head movements; we present head-movement statistics mined from a corpus of around 800 hours of video, collected in a team-performance experiment.

Keywords: eye finding, pupil detection, active illumination, head movement

1 Introduction

To obtain high accuracy, a video-based eye-tracker must acquire a high-resolution image of the eye. For any given camera, the best image will be one in which the eye occupies the entire frame. For a fixed (non-steerable) camera, however, the camera must maintain a constant position relative to the head, to prevent the eye from leaving the frame when the subject makes a head movement. Some systems solve this problem by attaching the camera(s) to the head; this provides good eye image quality, but requires the subject to wear a head-mount, and requires tracking of the head in order to refer gaze to objects in the world. Alternatively, a remote camera may be used, with the subject's head stabilized with a chin-rest or bite-bar. Neither of these approaches provides the subject with a particularly natural experience, however, and both are inappropriate when the goal is to covertly observe behavior "in the wild."

To maintain high accuracy while still allowing a range of natural movement, the image acquisition system must be able to follow the eye as it moves in space. This can be accomplished by mechanical steering of the viewing axis (e.g., by using a pan-tilt-zoom camera, or steering the line-of-sight with galvanometer mirrors), or by moving a region-of-interest inset in a large, high-resolution image from a fixed camera with a wide field-of-view. Currently, commercial offerings are available using both approaches.

Regardless of how repositioning the area of analysis is accomplished, the system must determine where to reposition it. Additionally, when the initial position of the head is undefined, the eye(s) must be located to begin tracking, and it is desirable for this to be accomplished automatically. A number of techniques are available for following the eyes using conventional imagery [Tong and Ji 2008]; in the following section we consider the special case of optical pupil detection, and how it is affected by head motion.

2 Optical eye detection

The optical properties of the eye itself enable a unique approach to eye detection which exploits the fact that the eye is a natural retroreflector [Ebisawa and Satoh 1993; Morimoto et al. 2000]. This is commonly observed as the "red eye" effect seen in flash photography, occurring when the light source is sufficiently close to the camera lens that light reflected by its image on the retina returns and enters the camera lens. By rapidly alternating on- and off-axis illumination, a pair of images can be obtained which are nearly identical except for the pupil region; subtracting these images results in an image in which the pupils are prominent, greatly simplifying the machine vision problem.

There are a variety of ways in which the two illumination channels can be multiplexed. One early system [Grace et al. 2001] used a pair of cameras, a beam splitter and narrow-band filters to implement wavelength multiplexing, allowing simultaneous acquisition of bright- and dark-pupil images. A single-camera solution using wavelength multiplexing has been demonstrated [Fouquet et al. 2004], using a custom sensor chip equipped with a checkerboard array of filters. While this approach is elegant, the sensors are not commercially available, and so it is unavailable to most would-be users. A more common approach that is amenable to a single-camera system is temporal multiplexing, in which the illuminator channels are energized in alternation, synchronized with the camera's frame rate. Temporal multiplexing requires additional circuitry for dynamic control of the illuminators, and more complicated software to correctly reject artifacts arising from head motion, which we will explore in this section.

We have previously incorporated a stereo pair of wide-field cameras equipped with such an active illumination system into a multi-camera platform for monitoring workstation use [Brolly and Mulligan 2004]. Our system employs standard analog cameras outputting interlaced video, with field-rate alternation of illuminator channels, as in Ebisawa and Satoh [1993]. This system works quite well for eye acquisition when the head is still. Because the two illuminator channels are alternated in time, however, the system suffers from motion artifacts (see figure 1). One approach that has been employed to reduce the deleterious effects of head motion in this situation is to measure the motion in the field and shift one of the images prior to computing the difference image [Ebisawa 2009]. Here we present an alternative approach that does not depend on measurement of image motion. The key to our method is the observation that the motion artifacts differ in polarity for within-frame and across-frame difference images.

Figure 1 shows a series of images obtained with this system during a lateral head movement. The top two rows show the raw images, with bright pupil illumination delivered during the odd fields, and dark pupil illumination delivered during the even fields. The third and fourth rows show the difference images: the third row shows the "within frame" difference, in which the even field of the current frame is subtracted from the odd field of the current frame, while the fourth row shows the "across frame" differences, in which the second (even) field of the previous frame is subtracted from the first (odd) field of the current frame.
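As a concrete illustration of the deinterlacing step (a minimal sketch in numpy, not the authors' code; the function name and the assignment of odd/even rows to bright/dark illumination are our assumptions), each captured frame can be split into its two fields before differencing:

    import numpy as np

    def split_fields(frame):
        # Separate an interlaced frame into its two fields: rows 0, 2, 4, ...
        # form one field and rows 1, 3, 5, ... the other. Which field carries
        # the bright-pupil illumination depends on how the illuminators are
        # synchronized to the camera, so the names here are nominal.
        odd_field = frame[0::2, :].astype(np.float32)
        even_field = frame[1::2, :].astype(np.float32)
        return odd_field, even_field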
Figure 1: A series of images from a wide-field camera equipped with an active illumination system. Bright pupil illumination is provided
during odd fields, while dark pupil illumination is provided during even fields. Within-frame difference images are obtained by subtracting
the two fields from a single frame, while across-frame difference images are obtained by subtracting fields from different frames. These two
types of difference images show different types of motion artifacts, which may be in large part suppressed by taking their product, as shown
in the bottom row. To enhance visibility, all difference images have been inverted, blurred, and normalized.


The difference images are clipped from below at zero, in order to suppress regions where the dark-pupil image is brighter than the bright-pupil image. The difference images have been inverted and slightly blurred to enhance visibility in the present reproduction.

To get the most up-to-date estimate of the pupil location, we should use the most recently acquired pair of bright- and dark-pupil images. But, as illustrated in figure 1, the temporal ordering of the bright- and dark-pupil fields alternates, and so the polarity of motion-induced artifacts alternates as well. The half-wave rectification performed after the differencing suppresses roughly half of the motion-induced artifacts; in figure 1 the head is moving to the left, and so strong motion-induced artifacts are seen at the left edge of the head in the within-frame differences, and at the right edge of the head in the across-frame differences. Because a particular head-edge artifact is in general present in only one of the two types of difference image, we are able to eliminate most of them (and reduce the strength of the remainder relative to the eye signals) by pixel-wise multiplication of the pair of difference images derived from a single bright-pupil image. We can express this more formally as follows: let o_i and e_i represent the odd and even fields (respectively) from frame i. We further let w_i and a_i represent the within- and across-frame differences, respectively, and let c_i represent the combined difference image. Then

    w_i = max(o_i - e_i, 0),          (1)
    a_i = max(o_i - e_{i-1}, 0), and  (2)
    c_i = a_i * w_i,                  (3)

where all operations are understood to represent the corresponding pixel-wise operations on the images.

It can be seen in figure 1 that the combined difference image has a strong signal in the eye regions, and has reduced the strength of the motion artifacts. The source of the eye signal in this case can be quite different than in the case of a stationary head: when the head and eyes are still, the only difference between the two images will be the brightness of the pupil region; the glints (often the brightest part of the image) will cancel. When the head and/or eye is in motion, on the other hand, the glints will not cancel each other; but when our goal is simply to find the eye (as opposed to accurately segmenting the pupil region), this is not a problem: because the glint in the bright-pupil image can be expected to be brighter than anything else with which it might be (mis)aligned, we are likely to get a strong signal in the difference image even when the pupil signal is degraded.

We have performed a preliminary evaluation of the technique using a video sequence captured while a subject executed voluntary, side-to-side head movements, ranging from slow to as fast as possible.
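A minimal numpy sketch of equations (1)-(3), assuming the fields have already been extracted as floating-point arrays of equal size (the function and variable names are ours, not from the original system):

    import numpy as np

    def combined_difference(o_cur, e_cur, e_prev):
        # Within-frame difference, clipped from below at zero (eq. 1).
        w = np.maximum(o_cur - e_cur, 0.0)
        # Across-frame difference against the previous frame's even field,
        # also clipped at zero (eq. 2).
        a = np.maximum(o_cur - e_prev, 0.0)
        # Pixel-wise product: a head-edge artifact present in only one of
        # the two difference images is strongly attenuated (eq. 3).
        return a * w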
Figure 2: Proportion of frames in which both eyes were correctly localized as a function of average head speed, for four algorithms, based on: single inter-frame differences (inverted triangles); difference images with a restricted search region (upright triangles); Gaussian curvature of difference image (squares); and Gaussian curvature with a restricted search region (diamonds).

Figure 3: Similar to figure 2, for four algorithms in which within- and across-frame difference images were combined as illustrated in figure 1: combined difference images (inverted triangles); combined difference images with a restricted search area (upright triangles); combined curvature images (squares); and combined curvature images with a restricted search area (diamonds).
The imaging configuration was similar to that shown in figure 1, with the head taking up roughly half the width of the frame (about 300 pixels), and an effective frame rate of 60 Hz after deinterlacing. Difference images were computed, and blurred slightly. Our baseline algorithm was to search for the first two local maxima in the blurred difference image. While this simple procedure performs satisfactorily when the head is still, it breaks down when significant head motion is encountered (see figure 2). A couple of simple enhancements can improve performance somewhat: first, we can restrict the search area for each eye to a small neighborhood centered on the previous position (in our example we used a radius of 7 pixels, larger than the per-field displacement at the highest head speed). Second, we can apply a transformation to the difference image which accentuates punctate features while suppressing linear features; this is motivated by the observation that many of the most severe motion artifacts occur around the edges of the face and are elongated. The transformation we chose was Gaussian curvature [Zetzsche and Barth 1990; Barth et al. 1998], a combination of second derivatives which is zero for linear features but responds strongly to spots and corners. We can see in figure 2 that, while application of the curvature operator does produce a performance benefit, the greatest benefit comes from assuming the eyes are not far from their previous locations. This too breaks down, however, at moderate head speeds.
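A sketch of these two enhancements, assuming scipy is available (we use the determinant-of-Hessian combination of second derivatives as the curvature measure; the function names, the Gaussian-derivative estimation, and the default parameters are our assumptions, not the original implementation):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def curvature(img, sigma=1.0):
        # Gaussian-derivative estimates of the second partial derivatives.
        ixx = gaussian_filter(img, sigma, order=(0, 2))
        iyy = gaussian_filter(img, sigma, order=(2, 0))
        ixy = gaussian_filter(img, sigma, order=(1, 1))
        # Determinant of the Hessian: near zero along elongated (linear)
        # features, large in magnitude at spots and corners.
        return ixx * iyy - ixy ** 2

    def find_peak_near(feature_img, prev_xy, radius=7):
        # Restrict the search to a small window centered on the
        # previously detected eye location.
        x0, y0 = prev_xy
        h, w = feature_img.shape
        ys = slice(max(0, y0 - radius), min(h, y0 + radius + 1))
        xs = slice(max(0, x0 - radius), min(w, x0 + radius + 1))
        window = feature_img[ys, xs]
        dy, dx = np.unravel_index(np.argmax(window), window.shape)
        return xs.start + dx, ys.start + dy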
Figure 3 shows similar results for various combinations of within- and across-frame differences. Simple combination of difference images (as illustrated in figure 1) produces a significant improvement over single difference images, but still fails frequently. The within- and across-frame combination can be applied to curvature images as well, and some improvement is seen. But as was the case for the simple differences, restriction of the search area produces the largest single improvement; for our test sequence, we obtain near-perfect performance for the combination of curvatures of within- and across-frame difference images, with a search area restricted to a neighborhood about the previous location.
3 Modeling natural head movements
In the previous section, we saw that the eye regions can be tracked much more robustly when they are assumed to be relatively near their previous locations. In order to intelligently determine the size of the search window, it is therefore useful to know something about the expected movements of the subjects. Such data are also useful for determining parameters of a remote gaze-tracking system, such as field-of-view and steering speed. To this end, we analyzed a large corpus of video, provided to us by a research group studying team decision-making [Fischer et al. 2007]. In this study, 5-member teams participated in a (simulated) tele-robotic search mission, by manning console stations in each of 5 separate experimental chambers. Video footage of each subject's work session was recorded by a video camera mounted above their workstation, with a horizontal field-of-view of about 40 degrees. Although the subjects in the videos assume a variety of seated poses while at work, their detected faces typically occupied roughly one-quarter to one-third of a frame width (80 to 106 pixels). The corpus made available to us consisted of 258 separate recordings, comprising a total of 793 hours (42.8 million images). The recordings had already been subjected to some preprocessing; as received by us, each movie had a resolution of 320 x 240, at a temporal sampling rate of 15 frames per second.

To detect the location of the face in each frame, we used a boosted cascade of classifiers [Viola and Jones 2001]. We used the implementation provided in the OpenCV library [Lienhart and Maydt 2002]. Exactly 1 face was found in approximately 50.9% of the frames analyzed. The recordings included periods that were not part of the study; typically a recording began with an empty booth, depicted a brief flurry of activity as the subject and experimenter entered, and then settled into a relatively long period with the subject seated at the console. So, some fraction of the frames in which no faces were detected are indeed correct rejections. However, the face detector as implemented is trained to detect frontal faces, and fails when the head is turned, presenting an oblique or profile view. Similarly, it fails when the subject tilts his or her head down (for example, to read a clipboard held in the lap). We feel that these misses do not greatly diminish the value of the dataset, because in these cases a gaze-tracking system would similarly be hard-put to make sense of the image.

The outputs of the face detector are the location and size of a square window containing the face. From the raw data, we computed the inter-frame displacement of the center of the face-box, for all pairs of temporally contiguous frames in which exactly one face was detected in each frame of the pair.
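For illustration, a comparable detection pass can be written against the modern OpenCV Python API (the specific cascade file is not stated in the text, so the choice below is an assumption, as are the detection parameters):

    import cv2

    # A standard frontal-face cascade shipped with OpenCV.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_single_face(gray):
        # Return the face box (x, y, w, h) only when exactly one face is
        # found, mirroring the "exactly 1 face" criterion applied to the
        # corpus; frames with zero or multiple detections are skipped.
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5)
        return tuple(faces[0]) if len(faces) == 1 else None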
Figure 4: Plot showing the histogram of inter-frame face displacement, computed from a large corpus of video data using the Viola-Jones face detector, as implemented in OpenCV.
Because the detector missed the face on some frames when the subject was present, there are gaps in the records, and the number of transitions available for analysis is somewhat less than the number of frames. The number of transitions analyzed was approximately 47.1% of all transitions, corresponding to 92.4% of the frames in which one face was detected. The results are shown as a histogram of displacements in figure 4. Face displacements in pixels have been converted to face-widths by dividing each displacement by the average of the box sizes in the two frames making up the pair. Note that the vertical axis is logarithmic; the bulk of the displacements are at or near zero, reflecting the fact that the subjects sat still more than they moved.
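A sketch of this normalization, assuming one entry per frame of either a face-box or None (the data layout and names are illustrative, not the original analysis code):

    import numpy as np

    def face_speeds(boxes, fps=15.0):
        # boxes: per-frame (cx, cy, size) for a detected face-box,
        # or None when no single face was found in that frame.
        speeds = []
        for prev, cur in zip(boxes[:-1], boxes[1:]):
            if prev is None or cur is None:
                continue  # a gap in the record: no transition to analyze
            dx, dy = cur[0] - prev[0], cur[1] - prev[1]
            # Normalize by the mean box size to express the displacement
            # in face-widths, then scale by the frame rate to obtain
            # face-widths per second.
            mean_size = 0.5 * (prev[2] + cur[2])
            speeds.append(np.hypot(dx, dy) / mean_size * fps)
        return np.array(speeds)

    # A log-scaled histogram like figure 4 could then be formed with,
    # e.g., counts, edges = np.histogram(face_speeds(boxes), bins=100).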
It can be seen that the central portion of the distribution in figure 4 is within ±1 head-width per second. (Because the voluntary head movements shown in figures 2 and 3 do not exceed 2 head-widths per second, we assume that the tails of the distribution shown in figure 4, which extend beyond the range plotted, represent large jumps of the face detector to false targets.) Assuming that an eye-width is about one fifth of a head-width, this corresponds to 5 eye-widths per second, or 0.083 eye-widths per field of 60 Hz video. For a high-magnification image in which the eye fills a frame having an angular extent of 4 x 3 degrees (corresponding to a camera located at arm's length from the subject), we see that this corresponds to an angular rate at the camera of 0.33 degrees per field, or 20 degrees per second. This provides a performance goal for robotic cameras used for remote gaze tracking.
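The chain of unit conversions behind the 20 degrees-per-second goal can be made explicit; a few lines reproducing the numbers in the text (all constants taken from the preceding paragraph):

    head_widths_per_sec = 1.0   # bulk of the observed movements (figure 4)
    eye_widths_per_head = 5.0   # an eye-width is ~1/5 of a head-width
    field_rate = 60.0           # fields per second of deinterlaced video
    frame_width_deg = 4.0       # eye fills a frame ~4 degrees wide

    eye_widths_per_sec = head_widths_per_sec * eye_widths_per_head  # 5.0
    eye_widths_per_field = eye_widths_per_sec / field_rate          # ~0.083
    deg_per_field = eye_widths_per_field * frame_width_deg          # ~0.33
    deg_per_sec = deg_per_field * field_rate                        # 20.0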
4 Conclusion

In this paper we have demonstrated a novel approach to active-illumination eye detection that combines pairs of difference images to reduce the effects of head movements. We have also examined the movement statistics of users working in a naturalistic setting at a computer console, and found that the majority of movements do not exceed a speed of 1 head-width per second. This is therefore a sensible target for a head-tracking system designed to keep the eye region within the area of analysis in a remote gaze-tracking system. By considering the camera field-of-view, and average distance from the subject, we can appropriately size the search region when repositioning the region-of-interest in a high-resolution image.


References

Barth, E., Zetzsche, C., and Krieger, G. 1998. Curvature measures in visual information processing. Open Systems and Information Dynamics 5, 25-39.

Brolly, X. L. C., and Mulligan, J. B. 2004. Implicit calibration of a remote gaze tracker. In IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), vol. 8, 134.

Ebisawa, Y., and Satoh, S. 1993. Effectiveness of pupil area detection technique using two light sources and image difference method. In Proceedings of the 15th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, A. Y. J. Szeto and R. M. Rangayan, Eds., 1268-1269.

Ebisawa, Y. 2009. Robust pupil detection by image difference with positional compensation. In Proceedings of the 2009 IEEE International Conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems, 143-148.

Fischer, U., McDonnel, L., and Orasanu, J. 2007. Linguistic correlates of team performance: toward a tool for monitoring team functioning during space missions. Aviation, Space, and Environmental Medicine 78, 5 Suppl., B86-B95.

Fouquet, J. E., Haven, R. E., Venkatesh, S., and Wenstrand, J. S. 2004. A new optical imaging approach for detecting drowsy drivers. In IEEE Intelligent Transportation Systems Conference, WeA6.

Grace, R., Byrne, V. E., Bierman, D. M., Legrand, J., Gricourt, D., Davis, B. K., Staszewski, J. J., and Carnahan, B. 2001. A drowsy driver detection system for heavy vehicles. In Proceedings of the 17th Digital Avionics Systems Conference, vol. 2, I36/1-I36/8.

Lienhart, R., and Maydt, J. 2002. An extended set of Haar-like features for rapid object detection. In IEEE ICIP 2002, 900-903.

Morimoto, C. H., Koons, D., Amir, A., and Flickner, M. 2000. Pupil detection and tracking using multiple light sources. Image and Vision Computing 18, 4, 331-335.

Tong, Y., and Ji, Q. 2008. Automatic eye position detection and tracking under natural facial movement. In Passive Eye Monitoring, R. I. Hammoud, Ed., Springer, 83-107.

Viola, P., and Jones, M. 2001. Rapid object detection using a boosted cascade of simple features. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 511-518.

Zetzsche, C., and Barth, E. 1990. Fundamental limits of linear filters in the visual processing of two-dimensional signals. Vision Research 30, 1111-1117.

More Related Content

What's hot

Visionbf 1h-fin
Visionbf 1h-finVisionbf 1h-fin
Visionbf 1h-fin
MUBOSScz
 
173v1 wp prostate row electronic
173v1 wp prostate row electronic173v1 wp prostate row electronic
173v1 wp prostate row electronic
Nasos Papapostolou
 
Night vision
Night visionNight vision
Night vision
sowmimano
 
INTERACTIVE ANALYTICAL TOOL FOR CORNEAL CONFOCAL IMAGING
INTERACTIVE ANALYTICAL TOOL FOR CORNEAL CONFOCAL IMAGINGINTERACTIVE ANALYTICAL TOOL FOR CORNEAL CONFOCAL IMAGING
INTERACTIVE ANALYTICAL TOOL FOR CORNEAL CONFOCAL IMAGING
Madhavi Tippani
 
Guillet mottin 2005 spie_ 5964
Guillet mottin 2005 spie_ 5964Guillet mottin 2005 spie_ 5964
Guillet mottin 2005 spie_ 5964
Stéphane MOTTIN
 
Sj 공학심리학 Virtural Environments
Sj 공학심리학  Virtural EnvironmentsSj 공학심리학  Virtural Environments
Sj 공학심리학 Virtural Environments
Seojung Ko
 

What's hot (20)

Design of Superlens in the visible range using Metamaterials
Design of Superlens in the visible range using MetamaterialsDesign of Superlens in the visible range using Metamaterials
Design of Superlens in the visible range using Metamaterials
 
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
Takemura Estimating 3 D Point Of Regard And Visualizing Gaze Trajectories Und...
 
Visionbf 1h-fin
Visionbf 1h-finVisionbf 1h-fin
Visionbf 1h-fin
 
P-iris
P-irisP-iris
P-iris
 
Motion hot topics_broc
Motion hot topics_brocMotion hot topics_broc
Motion hot topics_broc
 
173v1 wp prostate row electronic
173v1 wp prostate row electronic173v1 wp prostate row electronic
173v1 wp prostate row electronic
 
Diffractive Optics Manufacturing
Diffractive Optics ManufacturingDiffractive Optics Manufacturing
Diffractive Optics Manufacturing
 
Droege Pupil Center Detection In Low Resolution Images
Droege Pupil Center Detection In Low Resolution ImagesDroege Pupil Center Detection In Low Resolution Images
Droege Pupil Center Detection In Low Resolution Images
 
Image Fusion of Video Images and Geo-localization for UAV Applications
Image Fusion of Video Images and Geo-localization for UAV ApplicationsImage Fusion of Video Images and Geo-localization for UAV Applications
Image Fusion of Video Images and Geo-localization for UAV Applications
 
Purkinje imaging for crystalline lens density measurement
Purkinje imaging for crystalline lens density measurementPurkinje imaging for crystalline lens density measurement
Purkinje imaging for crystalline lens density measurement
 
Night vision
Night visionNight vision
Night vision
 
INTERACTIVE ANALYTICAL TOOL FOR CORNEAL CONFOCAL IMAGING
INTERACTIVE ANALYTICAL TOOL FOR CORNEAL CONFOCAL IMAGINGINTERACTIVE ANALYTICAL TOOL FOR CORNEAL CONFOCAL IMAGING
INTERACTIVE ANALYTICAL TOOL FOR CORNEAL CONFOCAL IMAGING
 
Guillet mottin 2005 spie_ 5964
Guillet mottin 2005 spie_ 5964Guillet mottin 2005 spie_ 5964
Guillet mottin 2005 spie_ 5964
 
New refractive optical MIOL concepts _ 
Comparison of different optical systems
New refractive optical MIOL concepts _ 
Comparison of different optical systemsNew refractive optical MIOL concepts _ 
Comparison of different optical systems
New refractive optical MIOL concepts _ 
Comparison of different optical systems
 
Principle of presbyopia correcting iols
Principle of presbyopia correcting iols Principle of presbyopia correcting iols
Principle of presbyopia correcting iols
 
MIOL Treatment – Quo Vadis?
MIOL Treatment – Quo Vadis?MIOL Treatment – Quo Vadis?
MIOL Treatment – Quo Vadis?
 
Sj 공학심리학 Virtural Environments
Sj 공학심리학  Virtural EnvironmentsSj 공학심리학  Virtural Environments
Sj 공학심리학 Virtural Environments
 
Sd optics introduction
Sd optics introductionSd optics introduction
Sd optics introduction
 
A maskless exposure device for rapid photolithographic prototyping of sensor ...
A maskless exposure device for rapid photolithographic prototyping of sensor ...A maskless exposure device for rapid photolithographic prototyping of sensor ...
A maskless exposure device for rapid photolithographic prototyping of sensor ...
 
LLTECH SKIN ATLAS 2012
LLTECH SKIN ATLAS 2012LLTECH SKIN ATLAS 2012
LLTECH SKIN ATLAS 2012
 

Viewers also liked

Acta agrupamento hostelería vigo (emenda)
Acta agrupamento hostelería vigo (emenda)Acta agrupamento hostelería vigo (emenda)
Acta agrupamento hostelería vigo (emenda)
oscargaliza
 
Zena acrta _cnc_30092011
Zena acrta _cnc_30092011Zena acrta _cnc_30092011
Zena acrta _cnc_30092011
oscargaliza
 
תכירו את שולה הישנה והחדשה
תכירו את שולה הישנה והחדשהתכירו את שולה הישנה והחדשה
תכירו את שולה הישנה והחדשה
haimkarel
 
C++ Efficient medicine transfer
C++ Efficient medicine transfer C++ Efficient medicine transfer
C++ Efficient medicine transfer
cheeyuan
 
An impugnacion convenio 2011 2012
An impugnacion convenio 2011 2012An impugnacion convenio 2011 2012
An impugnacion convenio 2011 2012
oscargaliza
 

Viewers also liked (20)

TEMA 3B SER and ESTAR
TEMA 3B SER and ESTARTEMA 3B SER and ESTAR
TEMA 3B SER and ESTAR
 
Madness
MadnessMadness
Madness
 
Acta agrupamento hostelería vigo (emenda)
Acta agrupamento hostelería vigo (emenda)Acta agrupamento hostelería vigo (emenda)
Acta agrupamento hostelería vigo (emenda)
 
Zena acrta _cnc_30092011
Zena acrta _cnc_30092011Zena acrta _cnc_30092011
Zena acrta _cnc_30092011
 
ลักษณะภูมิประเทศ2.1
ลักษณะภูมิประเทศ2.1ลักษณะภูมิประเทศ2.1
ลักษณะภูมิประเทศ2.1
 
Don't Trust, And Verify - Mobile Application Attacks
Don't Trust, And Verify - Mobile Application AttacksDon't Trust, And Verify - Mobile Application Attacks
Don't Trust, And Verify - Mobile Application Attacks
 
Kahvakuulaharjoittelun Perusteet 2010
Kahvakuulaharjoittelun Perusteet 2010Kahvakuulaharjoittelun Perusteet 2010
Kahvakuulaharjoittelun Perusteet 2010
 
ศาสนาอิสลาม 402
ศาสนาอิสลาม 402ศาสนาอิสลาม 402
ศาสนาอิสลาม 402
 
Exchange server
Exchange   serverExchange   server
Exchange server
 
ZFConf 2012: Проектирование архитектуры, внедрение и организация процесса раз...
ZFConf 2012: Проектирование архитектуры, внедрение и организация процесса раз...ZFConf 2012: Проектирование архитектуры, внедрение и организация процесса раз...
ZFConf 2012: Проектирование архитектуры, внедрение и организация процесса раз...
 
1
11
1
 
TEMA 5B Vocabulario
TEMA 5B VocabularioTEMA 5B Vocabulario
TEMA 5B Vocabulario
 
Pajaros
PajarosPajaros
Pajaros
 
תכירו את שולה הישנה והחדשה
תכירו את שולה הישנה והחדשהתכירו את שולה הישנה והחדשה
תכירו את שולה הישנה והחדשה
 
Eidea_SEMCOM
Eidea_SEMCOMEidea_SEMCOM
Eidea_SEMCOM
 
C++ Efficient medicine transfer
C++ Efficient medicine transfer C++ Efficient medicine transfer
C++ Efficient medicine transfer
 
TEMA 5A Tener
TEMA 5A TenerTEMA 5A Tener
TEMA 5A Tener
 
An impugnacion convenio 2011 2012
An impugnacion convenio 2011 2012An impugnacion convenio 2011 2012
An impugnacion convenio 2011 2012
 
ภูมิศาสตร์ของทวีปยุโรป 2.4
ภูมิศาสตร์ของทวีปยุโรป 2.4ภูมิศาสตร์ของทวีปยุโรป 2.4
ภูมิศาสตร์ของทวีปยุโรป 2.4
 
Windows xp services
Windows xp servicesWindows xp services
Windows xp services
 

Similar to Mulligan Robust Optical Eye Detection During Head Movement

Eye Gaze Tracking With a Web Camera in a Desktop Environment
Eye Gaze Tracking With a Web Camera in a Desktop EnvironmentEye Gaze Tracking With a Web Camera in a Desktop Environment
Eye Gaze Tracking With a Web Camera in a Desktop Environment
1crore projects
 
26.motion and feature based person tracking
26.motion and feature based person tracking26.motion and feature based person tracking
26.motion and feature based person tracking
sajit1975
 
Iaetsd deblurring of noisy or blurred
Iaetsd deblurring of noisy or blurredIaetsd deblurring of noisy or blurred
Iaetsd deblurring of noisy or blurred
Iaetsd Iaetsd
 
Effective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IEffective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.I
IJMTST Journal
 

Similar to Mulligan Robust Optical Eye Detection During Head Movement (20)

Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural TasksRyan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
Ryan Match Moving For Area Based Analysis Of Eye Movements In Natural Tasks
 
Eye Gaze Tracking With a Web Camera in a Desktop Environment
Eye Gaze Tracking With a Web Camera in a Desktop EnvironmentEye Gaze Tracking With a Web Camera in a Desktop Environment
Eye Gaze Tracking With a Web Camera in a Desktop Environment
 
26.motion and feature based person tracking
26.motion and feature based person tracking26.motion and feature based person tracking
26.motion and feature based person tracking
 
Eye gaze tracking with a web camera
Eye gaze tracking with a web cameraEye gaze tracking with a web camera
Eye gaze tracking with a web camera
 
AR Glasses with Gaze-locked projection
AR Glasses with Gaze-locked projection AR Glasses with Gaze-locked projection
AR Glasses with Gaze-locked projection
 
Model User Calibration Free Remote Gaze Estimation System
Model User Calibration Free Remote Gaze Estimation SystemModel User Calibration Free Remote Gaze Estimation System
Model User Calibration Free Remote Gaze Estimation System
 
Deep vo and slam iii
Deep vo and slam iiiDeep vo and slam iii
Deep vo and slam iii
 
IRJET-Motion Segmentation
IRJET-Motion SegmentationIRJET-Motion Segmentation
IRJET-Motion Segmentation
 
ICCE 2016 NCTU Courses Papers
ICCE 2016 NCTU Courses PapersICCE 2016 NCTU Courses Papers
ICCE 2016 NCTU Courses Papers
 
G01114650
G01114650G01114650
G01114650
 
A Literature Review on Iris Segmentation Techniques for Iris Recognition Systems
A Literature Review on Iris Segmentation Techniques for Iris Recognition SystemsA Literature Review on Iris Segmentation Techniques for Iris Recognition Systems
A Literature Review on Iris Segmentation Techniques for Iris Recognition Systems
 
Automatic identification of animal using visual and motion saliency
Automatic identification of animal using visual and motion saliencyAutomatic identification of animal using visual and motion saliency
Automatic identification of animal using visual and motion saliency
 
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...
Nagamatsu User Calibration Free Gaze Tracking With Estimation Of The Horizont...
 
Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)
 
An Object Detection, Tracking And Parametric Classification– A Review
An Object Detection, Tracking And Parametric Classification– A ReviewAn Object Detection, Tracking And Parametric Classification– A Review
An Object Detection, Tracking And Parametric Classification– A Review
 
Iaetsd deblurring of noisy or blurred
Iaetsd deblurring of noisy or blurredIaetsd deblurring of noisy or blurred
Iaetsd deblurring of noisy or blurred
 
Effective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IEffective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.I
 
Face recognition across non uniform motion blur, illumination, and pose
Face recognition across non uniform motion blur, illumination, and poseFace recognition across non uniform motion blur, illumination, and pose
Face recognition across non uniform motion blur, illumination, and pose
 
Gait analysis report
Gait analysis reportGait analysis report
Gait analysis report
 
X36141145
X36141145X36141145
X36141145
 

More from Kalle

More from Kalle (20)

Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of HeatmapsBlignaut Visual Span And Other Parameters For The Generation Of Heatmaps
Blignaut Visual Span And Other Parameters For The Generation Of Heatmaps
 
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
 
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
Yamamoto Development Of Eye Tracking Pen Display Based On Stereo Bright Pupil...
 
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
Wastlund What You See Is Where You Go Testing A Gaze Driven Power Wheelchair ...
 
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
Vinnikov Contingency Evaluation Of Gaze Contingent Displays For Real Time Vis...
 
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze ControlUrbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
Urbina Pies With Ey Es The Limits Of Hierarchical Pie Menus In Gaze Control
 
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
Urbina Alternatives To Single Character Entry And Dwell Time Selection On Eye...
 
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic TrainingTien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
Tien Measuring Situation Awareness Of Surgeons In Laparoscopic Training
 
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
Stellmach Advanced Gaze Visualizations For Three Dimensional Virtual Environm...
 
Skovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze AloneSkovsgaard Small Target Selection With Gaze Alone
Skovsgaard Small Target Selection With Gaze Alone
 
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze TrackerSan Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
San Agustin Evaluation Of A Low Cost Open Source Gaze Tracker
 
Rosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem SolvingRosengrant Gaze Scribing In Physics Problem Solving
Rosengrant Gaze Scribing In Physics Problem Solving
 
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual SearchQvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search
 
Prats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement StudyPrats Interpretation Of Geometric Shapes An Eye Movement Study
Prats Interpretation Of Geometric Shapes An Eye Movement Study
 
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
Porta Ce Cursor A Contextual Eye Cursor For General Pointing In Windows Envir...
 
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
Pontillo Semanti Code Using Content Similarity And Database Driven Matching T...
 
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
Park Quantification Of Aesthetic Viewing Using Eye Tracking Technology The In...
 
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...
Palinko Estimating Cognitive Load Using Remote Eye Tracking In A Driving Simu...
 
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
 
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...
Nagamatsu Gaze Estimation Method Based On An Aspherical Model Of The Cornea S...
 

Mulligan Robust Optical Eye Detection During Head Movement

  • 1. Robust Optical Eye Detection During Head Movement Jeffrey B. Mulligan∗ Kevin N. Gabayan† NASA Ames Research Center Stanford University Abstract be accomplished automatically. A number of techniques are avail- able for following the eyes using conventional imagery [Tong and Finding the eye(s) in an image is a critical first step in a remote Ji 2008]; in the following section we consider the special case of gaze-tracking system with a working volume large enough to en- optical pupil detection, and how it is affected by head motion. compass head movements occurring during normal user behavior. We briefly review an optical method which exploits the retroreflec- 2 Optical eye detection tive properties of the eye, and present a novel method for combin- ing difference images to reject motion artifacts. Best performance The optical properties of the eye itself enable a unique approach to is obtained when a curvature operator is used to enhance punctate eye detection which exploits the fact that the eye is a natural retrore- features, and search is restricted to a neighborhood about the last flector [Ebisawa and Satoh 1993; Morimoto et al. 2000]. This is known location. Optimal setting of the size of this neighborhood commonly observed as the ”red eye” effect seen in flash photog- is aided by a statistical model of naturally-occurring head move- raphy, occurring when the light source is sufficiently close to the ments; we present head-movement statistics mined from a corpus camera lens that light reflected by its image on the retina returns and of around 800 hours of video, collected in a team-performance ex- enters the camera lens. By rapidly alternating on- and off-axis illu- periment. mination, a pair of images can be obtained which are nearly iden- tical except for the pupil region; subtracting these images results in Keywords: eye finding, pupil detection, active illumination, head an image in which the pupils are prominent, greatly simplifying the movement machine vision problem. There are a variety of ways in which the two illumination chan- 1 Introduction nels can be multiplexed. One early system [Grace et al. 2001] used a pair of cameras, a beam splitter and narrow-band filters to im- To obtain high-accuracy, a video-based eye-tracker must acquire a plement wavelength multiplexing, allowing simultaneous acquisi- high-resolution image of the eye. For any given camera, the best tion of bright- and dark-pupil images. A single-camera solution image will be one in which the eye occupies the entire frame. For a using wavelength multiplexing has been demonstrated [Fouquet fixed (non-steerable) camera, however, the camera must maintain a et al. 2004], using a custom sensor chip equipped with a checker- constant position relative to the head, to prevent the eye from leav- board array of filters. While this approach is elegant, the sensors ing the frame when the subject makes a head movement. Some sys- are not commercially available, and so it is unavailable to most tems solve this problem by attaching the camera(s) to the head; this would-be users. A more common approach that is amenable to a provides good eye image quality, but requires the subject to wear single-camera system is temporal multiplexing, in which the illu- a head-mount, and requires tracking of the head in order to refer minator channels are energized in alternation, synchronized with gaze to objects in the world. Alternatively, a remote camera may be the camera’s frame rate. 
Temporal multiplexing requires additional used, with the subject’s head stabilized with a chin-rest or bite-bar. circuitry for dynamic control of the illuminators, and more compli- Neither of these approaches provides the subject with a particularly cated software to correctly reject artifacts arising from head motion, natural experience, however, and are inappropriate when the goal is which we will explore in this section. to covertly observe behavior ”in the wild.” We have previously incorporated a stereo pair of wide-field cam- To maintain high accuracy while still allowing a range of natural eras equipped with such an active illumination system into a multi- movement, the image acquisition system must be able to follow the camera platform for monitoring workstation use [Brolly and Mul- eye as it moves in space. This can be accomplished by mechanical ligan 2004]. Our system employs standard analog cameras out- steering of the viewing axis (e.g., by using a pan-tilt-zoom camera, putting interlaced video, with field-rate alternation of illuminator or steering the line-of-sight with galvonometer mirrors), or by mov- channels, as in Ebisawa and Satoh [1993]. This system works quite ing a region-of-interest inset in a large, high-resolution image from well for eye acquisition when the head is still. Because the two illu- a fixed camera with a wide field-of-view. Currently, commercial minator channels are alternated in time, however, the system suffers offerings are available using both approaches. from motion artifacts (see figure 1). One approach that has been employed to reduce the deleterious effects of head motion in this Regardless of how repositioning the area of analysis is accom- situation is to measure the motion in the field and shift one of the plished, the system must determine where to reposition it. Addi- images prior to computing the difference image [Ebisawa 2009]. tionally, when the initial position of the head is undefined, the eye(s) Here we present an alternative approach that does not depend on must be located to begin tracking, and it is desirable for this to measurement of image motion. The key to our method is the ob- servation that the motion artifacts differ in polarity for within-frame ∗ e-mail: jeffrey.b.mulligan@nasa.gov and across-frame difference images. † e-mail: kevingabayan@gmail.com Figure 1 shows a series of images obtained with this system during a lateral head movement. The top two rows show the raw images, with bright pupil illumination delivered during the odd fields, and dark pupil illumination delivered during the even fields. The third and fourth rows show the difference images: the third row shows © 2010 Association for Computing Machinery. ACM acknowledges that this contribution the ”within frame” difference, in which the even field of the current was authored by an employee, contractor or affiliate of the U.S. Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, frame is subtracted from the odd field of the current frame, while or to allow others to do so, for Government purposes only. the fourth row shows the ”across frame” differences, in which the ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00 129
Figure 1: A series of images from a wide-field camera equipped with an active illumination system. Bright pupil illumination is provided during odd fields, while dark pupil illumination is provided during even fields. Within-frame difference images are obtained by subtracting the two fields of a single frame, while across-frame difference images are obtained by subtracting fields from different frames. These two types of difference images show different types of motion artifacts, which may be in large part suppressed by taking their product, as shown in the bottom row. To enhance visibility, all difference images have been inverted, blurred, and normalized.

To get the most up-to-date estimate of the pupil location, we should use the most recently acquired pair of bright- and dark-pupil images. But, as illustrated in figure 1, the temporal ordering of the bright- and dark-pupil fields alternates, and so the polarity of motion-induced artifacts alternates as well. The half-wave rectification performed after the differencing suppresses roughly half of the motion-induced artifacts; in figure 1 the head is moving to the left, and so strong motion-induced artifacts are seen at the left edge of the head in the within-frame differences, and at the right edge of the head in the across-frame differences. Because a particular head-edge artifact is in general present in only one of the two types of difference image, we are able to eliminate most of them (and reduce the strength of the remainder relative to the eye signals) by pixel-wise multiplication of the pair of difference images derived from a single bright-pupil image. We can express this more formally as follows: let $o_i$ and $e_i$ represent the odd and even fields (respectively) from frame $i$. We further let $w_i$ and $a_i$ represent the within- and across-frame differences, respectively, and let $c_i$ represent the combined difference image. Then

$$w_i = \max(o_i - e_i,\ 0), \qquad (1)$$
$$a_i = \max(o_i - e_{i-1},\ 0), \qquad (2)$$
$$c_i = a_i\, w_i, \qquad (3)$$

where all vector operations are understood to represent the corresponding pixel-wise operations on the images.
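Equations (1)-(3) translate directly into array operations; a minimal sketch (our illustration, assuming the fields have already been extracted as floating-point arrays of equal size):

```python
import numpy as np

def combined_difference(odd_cur, even_cur, even_prev):
    """Combine within- and across-frame differences per eqs. (1)-(3).

    odd_cur   -- o_i, the odd (bright-pupil) field of the current frame
    even_cur  -- e_i, the even (dark-pupil) field of the current frame
    even_prev -- e_{i-1}, the even (dark-pupil) field of the previous frame
    """
    w = np.maximum(odd_cur - even_cur, 0.0)   # within-frame difference, eq. (1)
    a = np.maximum(odd_cur - even_prev, 0.0)  # across-frame difference, eq. (2)
    return a * w                              # pixel-wise product, eq. (3)
```

Because a given head-edge artifact has positive polarity in only one of $w_i$ and $a_i$, the half-wave rectification zeroes it in the other, and the product then suppresses it.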
It can be seen in figure 1 that the combined difference image has a strong signal in the eye regions, and has reduced the strength of the motion artifacts. The source of the eye signal in this case can be quite different than in the case of a stationary head: when the head and eyes are still, the only difference between the two images will be the brightness of the pupil region, and the glints (often the brightest part of the image) will cancel. When the head and/or eye is in motion, on the other hand, the glints will not cancel each other; but when our goal is simply to find the eye (as opposed to accurately segmenting the pupil region), this is not a problem: because the glint in the bright pupil image can be expected to be brighter than anything else with which it might be (mis)aligned, we are likely to get a strong signal in the difference image even when the pupil signal is degraded.

We have performed a preliminary evaluation of the technique using a video sequence captured while a subject executed voluntary, side-to-side head movements, ranging from slow to as fast as possible. The imaging configuration was similar to that shown in figure 1, with the head taking up roughly half the width of the frame (about 300 pixels), and an effective frame rate of 60 Hz after deinterlacing. Difference images were computed and blurred slightly. Our baseline algorithm was to search for the first two local maxima in the blurred difference image. While this simple procedure performs satisfactorily when the head is still, it breaks down as significant head motion is encountered (see figure 2). A couple of simple enhancements can improve performance somewhat: first, we can restrict the search area for each eye to a small neighborhood centered on the previous position (in our example we used a radius of 7 pixels, larger than the per-field displacement produced by the highest head speed). Second, we can apply a transformation to the difference image which accentuates punctate features while suppressing linear features; this is motivated by the observation that many of the most severe motion artifacts occur around the edges of the face and are elongated. The transformation we chose was Gaussian curvature [Zetzsche and Barth 1990; Barth et al. 1998], a combination of second derivatives which is zero for linear features but responds strongly to spots and corners. We can see in figure 2 that, while application of the curvature operator does produce a performance benefit, the greatest benefit comes from assuming the eyes are not far from their previous locations. This too breaks down, however, at moderate head speeds.

Figure 2: Proportion of frames in which both eyes were correctly localized as a function of average head speed, for four algorithms, based on: single inter-frame differences (inverted triangles); difference images with a restricted search region (upright triangles); Gaussian curvature of difference images (squares); and Gaussian curvature with a restricted search region (diamonds).

Figure 3 shows similar results for various combinations of within- and across-frame differences. Simple combination of difference images (as illustrated in figure 1) produces a significant improvement over single difference images, but still fails frequently. The within- and across-frame combination can be applied to curvature images as well, and some improvement is seen. But as was the case for the simple differences, restriction of the search area produces the largest single improvement; for our test sequence, we obtain near-perfect performance for the combination of curvatures of within- and across-frame difference images, with a search area restricted to a neighborhood about the previous location.

Figure 3: Similar to figure 2, for four algorithms in which within- and across-frame difference images were combined as illustrated in figure 1: combined difference images (inverted triangles); combined difference images with a restricted search area (upright triangles); combined curvature images (squares); and combined curvature images with a restricted search area (diamonds).
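The two enhancements above (punctate-feature enhancement followed by a search restricted to a neighborhood of the last known position) can be sketched as follows. This is our illustration, not the authors' code; the smoothing scale `sigma` is an assumed parameter, and we read "Gaussian curvature" here as the determinant-of-Hessian combination of second derivatives:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_curvature(img, sigma=2.0):
    """Second-derivative combination that is zero for straight edges
    but responds strongly to spots and corners (det of the Hessian)."""
    d_rr = gaussian_filter(img, sigma, order=(2, 0))  # second derivative along rows
    d_cc = gaussian_filter(img, sigma, order=(0, 2))  # second derivative along columns
    d_rc = gaussian_filter(img, sigma, order=(1, 1))  # mixed derivative
    return d_rr * d_cc - d_rc ** 2

def find_eye(feature_img, prev_rc, radius=7):
    """Peak of `feature_img` within `radius` pixels of the previous
    eye location (row, col); 7 pixels matches the radius in the text."""
    rows, cols = np.ogrid[:feature_img.shape[0], :feature_img.shape[1]]
    r0, c0 = prev_rc
    near = (rows - r0) ** 2 + (cols - c0) ** 2 <= radius ** 2
    masked = np.where(near, feature_img, -np.inf)
    return np.unravel_index(np.argmax(masked), masked.shape)
```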
3 Modeling natural head movements

In the previous section, we saw that the eye regions can be tracked much more robustly when they are assumed to be relatively near to their previous locations. In order to intelligently determine the size of the search window, it is therefore useful to know something about the expected movements of the subjects. Such data are also useful for determining parameters of a remote gaze tracking system, such as field-of-view and steering speed. To this end, we analyzed a large corpus of video, provided to us by a research group studying team decision-making [Fischer et al. 2007]. In this study, 5-member teams participated in a (simulated) tele-robotic search mission, by manning console stations in each of 5 separate experimental chambers. Video footage of each subject's work session was recorded by a video camera mounted above their workstation, with a horizontal field-of-view of about 40 degrees. Although the subjects in the videos assume a variety of seated poses while at work, their detected faces typically occupied between roughly one-quarter and one-third of a frame width (80-106 pixels). The corpus made available to us consisted of 258 separate recordings, comprising a total of 793 hours (42.8 million images). The recordings had already been subjected to some preprocessing; as received by us, each movie had a resolution of 320 x 240, at a temporal sampling rate of 15 frames per second.

To detect the location of the face in each frame, we used a boosted cascade of classifiers [Viola and Jones 2001], as implemented in the OpenCV library [Lienhart and Maydt 2002]. Exactly one face was found in approximately 50.9% of the frames analyzed. The recordings included periods that were not part of the study; typically a recording began with an empty booth, depicted a brief flurry of activity as the subject and experimenter entered, and then showed a relatively long period with the subject seated at the console. So, some fraction of the frames in which no faces were detected are indeed correct rejections. However, the face detector as implemented is trained to detect frontal faces, and fails when the head is turned, presenting an oblique or profile view. Similarly, it fails when the subject tilts his or her head down (for example, to read a clipboard held in the lap). We feel that these misses do not greatly diminish the value of the dataset, because in these cases a gaze tracking system would similarly be hard put to make sense of the image.

The outputs of the face detector are the location and size of a square window containing the face. From the raw data, we computed the inter-frame displacement of the center of the face-box, for all pairs of temporally contiguous frames in which exactly one face was detected in each frame of the pair. Because the detector missed the face on some frames when the subject was present, there are gaps in the records, and the number of transitions available for analysis is somewhat less than the number of frames. The number of transitions analyzed was approximately 47.1% of all transitions, corresponding to 92.4% of the frames in which one face was detected. The results are shown as a histogram of displacements in figure 4. Face displacements in pixels have been converted to face-widths by dividing each displacement by the average of the box sizes in the two frames making up the pair. Note that the vertical axis is logarithmic; the bulk of the displacements are at or near zero, reflecting the fact that the subjects sat still more than they moved.

Figure 4: Plot showing the histogram of inter-frame face displacement, computed from a large corpus of video data using the Viola-Jones face detector, as implemented in OpenCV.

It can be seen that the central portion of the distribution in figure 4 is within ±1 head-width per second. (Because the voluntary head movements shown in figures 2 and 3 do not exceed 2 head-widths per second, we assume that the tails of the distribution shown in figure 4, which extend beyond the range plotted, represent large jumps of the face detector to false targets.) Assuming that an eye-width is about one fifth of a head-width, this corresponds to 5 eye-widths per second, or 0.083 eye-widths per field of 60 Hz video. For a high-magnification image in which the eye fills a frame having an angular extent of 4 x 3 degrees (corresponding to a camera located at arm's length from the subject), this corresponds to an angular rate at the camera of 0.33 degrees per field, or 20 degrees per second. This provides a performance goal for robotic cameras used for remote gaze tracking.
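Written out, the unit conversion in the preceding paragraph (one head-width per second, five eye-widths per head-width, 60 fields per second, and an eye spanning a frame 4 degrees wide) is:

$$1\ \tfrac{\text{head-width}}{\text{s}} \times 5\ \tfrac{\text{eye-widths}}{\text{head-width}} = 5\ \tfrac{\text{eye-widths}}{\text{s}}, \qquad \frac{5\ \text{eye-widths/s}}{60\ \text{fields/s}} \approx 0.083\ \tfrac{\text{eye-widths}}{\text{field}}$$

$$0.083\ \tfrac{\text{eye-widths}}{\text{field}} \times 4\ \tfrac{\text{deg}}{\text{eye-width}} \approx 0.33\ \tfrac{\text{deg}}{\text{field}}, \qquad 0.33\ \tfrac{\text{deg}}{\text{field}} \times 60\ \tfrac{\text{fields}}{\text{s}} = 20\ \tfrac{\text{deg}}{\text{s}}$$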
4 Conclusion

In this paper we have demonstrated a novel approach to active-illumination eye detection that combines pairs of difference images to reduce the effects of head movements. We have also examined the movement statistics of users working in a naturalistic setting at a computer console, and found that the majority of movements do not exceed a speed of 1 head-width per second. This is therefore a sensible performance target for a head-tracking system designed to keep the eye region within the area of analysis of a remote gaze-tracking system. By considering the camera field-of-view and average distance from the subject, we can appropriately size the search region when repositioning the region-of-interest in a high-resolution image.

References

Barth, E., Zetzsche, C., and Krieger, G. 1998. Curvature measures in visual information processing. Open Systems and Information Dynamics 5, 25-39.

Brolly, X. L. C., and Mulligan, J. B. 2004. Implicit calibration of a remote gaze tracker. In IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), vol. 8, 134.

Ebisawa, Y., and Satoh, S. 1993. Effectiveness of pupil area detection technique using two light sources and image difference method. In Proceedings of the 15th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, A. Y. J. Szeto and R. M. Rangayan, Eds., 1268-1269.

Ebisawa, Y. 2009. Robust pupil detection by image difference with positional compensation. In Proceedings of the 2009 IEEE International Conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems, 143-148.

Fischer, U., McDonnel, L., and Orasanu, J. 2007. Linguistic correlates of team performance: toward a tool for monitoring team functioning during space missions. Aviation, Space, and Environmental Medicine 78, 5 Suppl., B86-B95.

Fouquet, J. E., Haven, R. E., Venkatesh, S., and Wenstrand, J. S. 2004. A new optical imaging approach for detecting drowsy drivers. In IEEE Intelligent Transportation Systems Conference, IEEE, WeA6.

Grace, R., Byrne, V. E., Bierman, D. M., Legrand, J., Gricourt, D., Davis, B. K., Staszewski, J. J., and Carnahan, B. 2001. A drowsy driver detection system for heavy vehicles. In Proceedings of the 17th Digital Avionics Systems Conference, vol. 2, I36/1-I36/8.

Lienhart, R., and Maydt, J. 2002. An extended set of Haar-like features for rapid object detection. In IEEE ICIP 2002, 900-903.

Morimoto, C. H., Koons, D., Amir, A., and Flickner, M. 2000. Pupil detection and tracking using multiple light sources. Image and Vision Computing 18, 4, 331-335.

Tong, Y., and Ji, Q. 2008. Automatic eye position detection and tracking under natural facial movement. In Passive Eye Monitoring, R. I. Hammoud, Ed., Springer, 83-107.

Viola, P., and Jones, M. 2001. Rapid object detection using a boosted cascade of simple features. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 511-518.

Zetzsche, C., and Barth, E. 1990. Fundamental limits of linear filters in the visual processing of two-dimensional signals. Vision Research 30, 1111-1117.