Hansen Homography Normalization For Robust Gaze Estimation In Uncalibrated Setups



Homography Normalization for Robust Gaze Estimation in Uncalibrated Setups

Dan Witzner Hansen, IT University, Copenhagen (witzner@itu.dk); Javier San Agustin, IT University, Copenhagen (javier@itu.dk); Arantxa Villanueva, Public University of Navarra (avilla@unavarra.es)

ETRA 2010, Austin, TX, March 22-24, 2010.

Abstract

Homography normalization is presented as a novel gaze estimation method for uncalibrated setups. The method applies when head movements are present but without any requirements to camera calibration or geometric calibration. The method is geometrically and empirically demonstrated to be robust to head pose changes, and despite being less constrained than cross-ratio methods, it consistently performs favorably by several degrees on both simulated data and data from physical setups. The physical setups include the use of off-the-shelf web cameras with infrared light (night vision) and standard cameras with and without infrared light. The benefits of homography normalization and uncalibrated setups in general are also demonstrated by obtaining gaze estimates (in the visible spectrum) using only the screen reflections on the cornea.

Keywords: Eye tracking, Gaze estimation, Homography normalization, Gaussian process, Uncalibrated setup, HCI

1 Introduction

Eye and gaze tracking have a long history, but only recently have gaze trackers become robust enough for use outside laboratories. The precision of current gaze trackers is sufficient for many types of applications, but are we really satisfied with their current capabilities?

Both research and commercial gaze trackers have been driven by the urge to obtain high-accuracy gaze position data while simplifying user calibration, often by reducing the number of points necessary for calibrating an individual user to the system. Both high accuracy and few calibration points are desirable properties of a gaze tracker, but they are not necessarily the only parameters that should be optimized [Scott and Findlay 1993]. Price is obviously an issue, but may be partially resolved with technological developments. Today even cheap web cameras are of sufficient quality for reliable gaze tracking. In some situations, however, it would be convenient if light sources, cameras and monitors could be placed according to particular needs rather than being constrained by manufacturer specifications. Avoiding external light sources or allowing the user to change the zoom of the camera to suit their particular needs would be desirable. Gaze models that support flexible setups eliminate the need for rigid frames that keep individual components in place and allow for more compact, lightweight, adaptable and perhaps cheaper eye trackers. If the models employed in the gaze trackers only required a few calibration targets and could maintain accuracy while avoiding the need for light sources, then eye tracking technology would take an important step towards being flexible, ubiquitous, and convenient for the general public. So far, it has not been possible to meet these constraints concurrently.

Many gaze models require a fully calibrated setup and detailed eye models (a strong prior model) to be able to minimize user calibration and maintain high accuracy. A major limitation of fully calibrated setups is that they require exact knowledge of the relative positions of the camera, light sources and monitor. Geometric calibration is usually tedious and time-consuming to perform, and automated techniques are sparse [Brolly and Mulligan 2004]. Slight unintentional movement of a system part or a change in focal length may result in a significant drop in accuracy when relying on a calibrated setup. The accuracy is therefore difficult to maintain unless the hardware is placed in a rigid setup. Such requirements add to the cost of the system. Gaze models may alternatively use multiple calibration points in order to be less dependent on prior assumptions (e.g. using polynomial approximations [Hansen and Ji 2010]). Models employing a weak prior model have not been able to demonstrate head pose invariance to date.

This paper will both geometrically and empirically demonstrate that it is possible to obtain robust gaze estimation in the presence of head movements when using a weak prior model of the geometric setup. The model relies on homography normalization and does not require any direct measurements of the relative position of the screen, camera and light source, nor does it need camera calibration. This means that it is possible to obtain a highly flexible eye tracker that can be made compact, mobile and suited to individual needs. Besides, the method is very simple to implement. Homography normalization is shown to consistently provide higher accuracies than cross-ratio-based methods on both simulated data (section 4) and data recorded from a physical setup (section 5). One reason for considering uncalibrated setups is to provide the general public with affordable and flexible gaze trackers that are robust with regard to head movements. In section 5.2 this is shown to be achievable with purely off-the-shelf components. It is additionally shown possible to use screen reflections on the cornea as an alternative to IR glints (section 5.3). Through this paper we intend to show that flexible, mobile and low-cost gaze trackers are indeed feasible without sacrificing significant accuracy.

2 Related Work

The primary task of a gaze tracker is to determine gaze, where gaze may either be a gaze direction or the point of regard (PoR). Gaze modeling consequently focuses on the relations between the image data and gaze. A comprehensive review of eye and gaze models is provided in Hansen and Ji [2010].

All gaze estimation methods need to determine a set of parameters through calibration. Some parameters may be estimated for each session by letting the user look at a set of predefined targets on the screen, others need only be calculated once (e.g. human-specific parameters), and yet other parameters are estimated prior to use (e.g. camera parameters, and geometric and physical parameters such as angles and locations between camera and monitor). A system where the camera parameters and the geometry are known a priori is termed fully calibrated [Hansen and Ji 2010].

This paper focuses primarily on feature-based methods, but alternative methods based on appearance also exist [Hansen and Ji 2010].
Feature-based methods explore the characteristics of the human eye to identify a set of distinctive and informative features around the eyes that are less sensitive to variations in illumination and viewpoint. Ensuring head pose invariance is a common problem, often solved through the use of external light sources and their reflections (glints) on the cornea. Besides the glints, the pupil is the most common feature to use, since it is easy to extract in IR spectrum images. The image measurements (e.g. the pupil), however, are influenced by refraction [Guestrin and Eizenman 2006]. The limbus is less influenced by refraction, but since its boundary may be partially occluded, it may be more difficult to obtain reliable measurements.

Two types of feature-based gaze estimation approaches exist: the interpolation-based (regression-based) and the model-based (geometric). Using a single camera, the 2D regression methods model the optical properties, geometry and eye physiology indirectly and may, therefore, be considered approximate models which may not strictly guarantee head pose invariance. They are, however, simple to implement, do not require camera or geometric calibration (a.k.a. a weak prior model) and may still provide good results under conditions of small head movements. More recent 2D regression-based methods attempt to improve performance under larger head movements through compensation, or by adding additional cameras [Hansen and Ji 2010]. The 3D model-based methods, on the other hand, directly compute the gaze direction from the eye features based on a geometric model of the eye. Most 3D model-based (or geometric) approaches rely on metric information and thus require camera calibration and a global geometric model (external to the eye) of light source, camera and monitor positions and orientations. Gaze direction is modeled either as the optical axis or the visual axis. The optical axis is the line connecting the pupil center, cornea center and eyeball center. The line connecting the fovea and the center of the cornea is the visual axis. The visual axis is presumably the true direction of gaze. The visual and optical axes intersect at the cornea center with subject-dependent angular offsets. In a typical adult, the fovea is located about 4-5° horizontally and about 1.5° below the point where the optical axis intersects the retina, and its location may vary up to 3° vertically between subjects. Much of the theory behind geometric models using fully calibrated setups has been formalized by Guestrin and Eizenman [2006]. Their model covers a variable number of light sources and cameras, human-specific parameters, light source positions, refraction and camera parameters, but is limited by only applying to fully calibrated setups. Methods relying on fully calibrated setups are the most common in commercial and research-based systems but are limited for public use unless placed in a rigid setup. Any change (e.g. placing the camera differently or changing the zoom of the camera) requires a tedious recalibration.

An alternative to fully calibrated systems that still allows for head movements is to use projective invariants and multiple light sources [Yoo and Chung 2005; Coutinho and Morimoto 2006]. Contrary to the previous methods, Yoo et al. [2005] describe a method which is capable of determining the point of regard based solely on the availability of light source position information (e.g. no camera calibration or prior knowledge of rigid transformations between hardware units) by exploiting the cross-ratio of four points (light sources) in projective space. Yoo et al. [2005] use two cameras and four IR light sources placed around the screen to project these corners on the corneal surface, but only one camera is needed for gaze estimation. When looking at the screen, the pupil center should ideally be within the four-glint area. A fifth IR light emitter is placed on-axis to produce bright pupil images and to account for non-linear displacements (modeled by four αi parameters) of the glints. The method of Yoo et al. [2005] was shown to be prone to large person-specific errors [Coutinho and Morimoto 2006] and can only use the light sources for calibration (e.g. not other on-screen positions). Coutinho and Morimoto [2006] extend the model of Yoo et al. [2005] by using the offset between visual and optical axes as an argument to learn a constant on-screen offset. They additionally perform an elaborate evaluation of the consequences of changing the calibration of the virtual calibration parameter (α). Based on this, they argue that a simpler model can be made by learning a single α value rather than four different values as originally proposed. Where calibration in [Yoo and Chung 2005] can only be done by looking at the light sources in the screen corners, the method of [Coutinho and Morimoto 2006] may use multiple on-screen targets.

Since the cross-ratio is defined on projective planes and is invariant to any projective transformation, scale changes will not influence the cross-ratio. The method is therefore not directly applicable to depth translations. Coutinho and Morimoto [2006] show significant accuracy improvements compared to the original paper, provided the user does not change their distance to the camera and monitor. The advantage of the method, compared to methods based on calibrated setups, is that full hardware calibration is unnecessary. The method only requires light source positions relative to the screen. One limitation is that the light sources should be placed right at the corners of the screen. In practice the method is highly sensitive to the individual eye, and a formal analysis of the method is presented by Kang et al. [2008]. They identified two main sources of errors: (1) the angular offset between visual and optical axes and (2) the offset between pupil and glint planes. Depending on the point configuration, the cross-ratio is also known for not being particularly robust to noise, since small changes in point positions can result in large variations in the cross-ratio.
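To make these properties concrete, here is a minimal numeric sketch (our own illustration, not code from the cited papers) of the invariance that cross-ratio methods exploit and of the noise sensitivity noted by Kang et al. [2008]: the cross-ratio of four collinear points survives an arbitrary projectivity unchanged, while a small perturbation of a single point shifts it.

```python
# Minimal illustration of cross-ratio invariance and noise sensitivity.
# A sketch for intuition only, not code from the cited methods.
import numpy as np

def cross_ratio(x1, x2, x3, x4):
    """Cross-ratio of four collinear points: ((x1-x3)(x2-x4)) / ((x1-x4)(x2-x3))."""
    return ((x1 - x3) * (x2 - x4)) / ((x1 - x4) * (x2 - x3))

def projectivity(x, a=2.0, b=1.0, c=0.3, d=1.0):
    """A 1D projective map x -> (ax + b) / (cx + d); the coefficients are arbitrary."""
    return (a * x + b) / (c * x + d)

pts = np.array([0.0, 1.0, 2.5, 4.0])
print(cross_ratio(*pts))                # 1.25
print(cross_ratio(*projectivity(pts)))  # 1.25 again: invariant under the map

# Perturbing one point by 2% of the span already shifts the cross-ratio,
# which is the noise sensitivity discussed above.
noisy = pts + np.array([0.0, 0.0, 0.08, 0.0])
print(cross_ratio(*noisy))              # ~1.225
```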
3 Homography Normalization for Gaze Estimation

This section presents the fundamental model for a robust point of regard estimation method in uncalibrated setups (a priori unknown geometry and camera parameters). The components of the model are illustrated in figure 1.

Figure 1: Geometric model of the human eye, light sources, screen, camera and projections (dashed lines). The pupil is depicted as an ellipse with center p_c and the cornea as a hemisphere with center C. The corneal-reflection plane, Π_c, and its projection in the image are shown by quadrilaterals. Both Π_c and the cornea focal point, f_c, are displaced relative to each other and to the pupil center for illustration purposes.

The cornea is approximately spherical and has a radius, R_c, of about 7.8 mm. The cornea reflects light similarly to a convex mirror and has a focal point, f_c, located halfway between the corneal surface and the center of corneal curvature (f_c = R_c/2 ≈ 3.9 mm). Reflections on the cornea consequently appear further away than the corneal surface (a.k.a. virtual reflections).
Denote the screen plane Π_s and four (virtual) reflections on the cornea (g_1^c ... g_4^c). The reflections may come from any point in 3D space, for example external light sources (L_i) or the corners of a screen reflected on the cornea. The issue of screen projections will be addressed in section 5.3. For the sake of simplicity and without loss of generality, the following description assumes (g_1^c ... g_4^c) come from point light sources. Provided the eye is stationary, any location of a light source, L_i, along the line l_i (i.e. with the same direction) produces the same point of reflection on the cornea. The light sources can therefore interchangeably be assumed located on e.g. the screen plane Π_s or at infinity, as depicted in figure 1. Projected points at infinity lie in the focal plane of the convex mirror. With four light sources there will exist a plane Π_c (in fact a family of planes related by homographies), spanned by the lines l_i. This plane is denoted the corneal-reflection plane and is close to f_c when the L_i are at infinity. When considering the reflection laws (i.e. not a projection), the corneal reflections may only be approximately planar.

Without loss of generality, suppose the light sources are located on Π_s. The quadrilateral of glints (g_1^c ... g_4^c) is consequently related to the corresponding quadrilateral (g_1^i ... g_4^i) in the image via a homography, H_c^i, from the cornea (Π_c) to the image (Π_i) [Hartley and Zisserman 2004]. Similarly, the mapping from the cornea to the screen is also given by a homography, H_c^s. The homography from the image to the screen, H_i^s = H_c^s ∘ H_i^c, via Π_c will therefore exist regardless of the location of the cornea, provided the geometric setup does not change. These arguments also apply to cross-ratio-based methods [Coutinho and Morimoto 2006; Yoo and Chung 2005].

The pupil center is located about 4.2 mm from the cornea center, but its location varies between subjects and over time for a particular subject [Guestrin and Eizenman 2006]. However, the pupil is located approximately 0.3 mm (|R_c/2 − 4.2|) from the corneal focal point, f_c, and thus also close to Π_c. In the following, suppose that Π_c and the pupil coincide. The pupil may under these assumptions be mapped through H_i^s from the image to the screen via the corneal reflections.

These basic observations are sufficient to describe the fundamental and simple algorithm for PoR estimation in an uncalibrated setting. The method is illustrated in figure 2 and is based on locating and tracking four reflections (g_1^i ... g_4^i) (e.g. glints) and the pupil in the image. The pupil center, p_c, will be used in the following description. However, the presented method may alternatively use the limbus center or the pupil/limbus ellipse contours directly in the mapping, since homographies allow for mappings of points, lines and conics.

Figure 2: (left) Reflection points (crosses) and the pupil (gray ellipse) are observed in the image; (middle) the pupil is mapped to the normalized space using the four reflection points; (right) from the normalized space the pupil is mapped to the point of regard.

It is convenient, though not necessary, to define a virtual plane, Π_n (normalized plane), spanned by four points g_1^n ... g_4^n. Π_n represents the (unknown) corneal-reflection plane given up to a homography. Let g_j^n (j = 1..4) be the corners of the unit square and define H_i^n such that g_j^n = H_i^n g_j^i. Notice that using the screen corners to span the normalized space would be equally viable. The basic idea is that the pupil is mapped to the normalized space through H_i^n to normalize away the effects of head pose prior to any calibration or gaze estimation procedure (F_n^s in figure 2). The mapping of the reflections from the image Π_i to the screen Π_s via Π_n is therefore H_i^s = H_n^s ∘ H_i^n. That is, a homography H_n^s is a sufficient model for F_n^s when the pupil and Π_c coincide.

H_i^s can be found through a user calibration consisting of a minimum of 4 calibration targets, t_1 ... t_N, on the screen. Denote by homography normalization the general principle of normalizing eye data (pupil center, pupil or limbus contours) with respect to the reflections. The method of using F_n^s = H_n^s in connection with homography normalization is referred to as (Hom).

The cross-ratio methods do not model the visual axis well [Kang et al. 2008]. Homography normalization, on the other hand, models the offset between the optical and visual axes to a much higher degree. Points in normalized space are based on the pupil center, i.e. a model of the optical axis without the interference of head movements. However, as offsets between the optical and visual axes correspond to translations in normalized space, the visual and optical axis offset is modeled implicitly through F_n^s = H_n^s.
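The (Hom) method thus amounts to two homographies: a per-frame normalization H_i^n estimated from the four glints, and a mapping H_n^s estimated once from the calibration targets. The following is a minimal sketch of this pipeline using OpenCV; the function names and data layout are our own, as the paper does not prescribe an implementation.

```python
# Minimal sketch of the (Hom) pipeline, assuming four trackable glints and a
# pupil center per frame. Names and layout are ours, not from the paper.
import numpy as np
import cv2

# Glints must be supplied in an order consistent with these corners.
UNIT_SQUARE = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)

def normalize_pupil(glints_img, pupil_img):
    """Map the pupil center to the normalized space spanned by the four glints.

    glints_img: (4, 2) image positions of the corneal reflections g1..g4.
    pupil_img:  (2,)   image position of the pupil center p_c.
    """
    H_i2n, _ = cv2.findHomography(glints_img.astype(np.float32), UNIT_SQUARE)
    p = pupil_img.reshape(1, 1, 2).astype(np.float32)
    return cv2.perspectiveTransform(p, H_i2n).reshape(2)

def calibrate(normalized_pupils, screen_targets):
    """Estimate H_n^s from >= 4 calibration targets t1..tN."""
    src = np.asarray(normalized_pupils, dtype=np.float32)
    dst = np.asarray(screen_targets, dtype=np.float32)
    H_n2s, _ = cv2.findHomography(src, dst)
    return H_n2s

def estimate_por(H_n2s, glints_img, pupil_img):
    """Point of regard: normalize the pupil, then map it to the screen."""
    pn = normalize_pupil(glints_img, pupil_img).reshape(1, 1, 2)
    return cv2.perspectiveTransform(pn, H_n2s).reshape(2)
```

With exactly four calibration targets, cv2.findHomography returns an exact solution; with more targets it computes a least-squares fit, which is one way the method can exploit additional calibration points.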
3.1 Model Error from Planarity Assumption

The previous section describes a generalized approach for head pose invariant PoR estimation under the assumption that the pupil and Π_c coincide. If the pupil had been located on Π_c, it would be a head pose invariant gaze estimation method that models the visual and optical axis offset. Euclidean information is not available in uncalibrated settings; using metric information (e.g. between the pupil and Π_c) does therefore not apply in this setting. This section provides an analysis of the model error, and section 3.2 discusses an approach to accommodate the errors. Figure 3 illustrates two different gaze directions and the associated modeling error measured from the camera.

Figure 3: Projected differences between the pupil and the corresponding point on Π_c for two gaze directions. Π_c is kept constant for clarity.

When the user looks away from the camera ('gaze direction 1') it is evident that the error in the image plane is related to the projection of the line segment (between the point on Π_c and the actual location of the pupil), e_l, onto the image plane. A gaze vector directed towards the camera ('gaze direction 2') yields a point and therefore no error. Hence equal angular offsets from the optical axis of the camera generate offset vectors Δ_c(i, j) with the same magnitude when viewed from the camera. The largest error magnitudes occur when the gaze direction is perpendicular to the optical axis of the camera. The magnitude field |Δ_c(i, j)| in camera coordinates consequently consists of elliptic iso-contours centered around the optical axis of the camera. However, it is the error, Δ_s, in screen coordinates that is of interest. The true point of regard in screen coordinates, ρ_s* = ρ̂_s + Δ_s, is a function of the estimated gaze ρ̂_s and the error Δ_s. That is, ρ_s* = H_i^s(p_c + Δ_i) = H_i^s p_c + H_i^s Δ_i, hence errors on the screen, Δ_s = H_i^s Δ_i, are merely errors in the camera propagated to the screen through the homography. An example of the error vector field, Δ_s, obtained using a simulator, and the corresponding vector magnitudes, is shown in figure 4.

Figure 4: (left) Error vector field and (right) corresponding magnitudes obtained from simulated data. Crosses indicate calibration targets and the circles the projection of the camera center.

To argue for the characteristics of Δ_s it is, without loss of generality and for the sake of simplicity, assumed that only four calibration points, (t_1 ... t_4), are used (crosses in figure 4). When estimating the homography, H_i^s, through user calibration, the errors at the calibration targets are minimized to zero, Δ_s(t_i) = 0, and there will therefore be 5 points (the calibration targets and the camera optical axis) where Δ_s is zero.

One way of thinking of a homography is that it generates a linear vector field of displacements. Δ_s = H_i^s Δ_i is therefore a composition of two vector fields (Δ_s = V_h + Δ_i): a linear vector field corresponding to the homography (V_h) and an ellipsoidal vector field Δ_i. Since Δ_s(t_i) = 0, then V_h(t_i) = −Δ_i(t_i); V_h(t_i) is consequently defined through the negative error vectors of Δ_i(t_i). It is worth noting that, as the camera location is unknown under the uncalibrated setup assumption and the location of the maximum error depends on the location of the camera, it would be impossible to determine the extremal location without additional information. Despite this, it is shown in the following sections that homography normalization makes it possible to obtain results quite similar to fully calibrated setups.
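As a small numeric illustration of the propagation Δ_s = H_i^s Δ_i (our own sketch; the homography entries below are arbitrary stand-ins, not values from the paper or the simulator), the screen error can be computed by mapping the pupil position with and without the image-space offset:

```python
# Illustrative sketch of how an image-space error Delta_i propagates to a
# screen-space error Delta_s through the estimated homography.
import numpy as np

def apply_h(H, p):
    """Apply a homography to a 2D point via homogeneous coordinates."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

H_i2s = np.array([[900.0, 20.0, 80.0],   # placeholder entries standing in for a
                  [15.0, 880.0, 60.0],   # calibrated image-to-screen homography
                  [1e-4, 2e-4, 1.0]])

pc = np.array([0.31, 0.42])          # estimated pupil center (image coords)
delta_i = np.array([0.004, -0.002])  # image-space error from the planarity assumption

# Screen-space error: difference between mapping the true and the estimated
# pupil positions. It vanishes wherever delta_i vanishes, consistent with the
# zero-error points at the calibration targets.
delta_s = apply_h(H_i2s, pc + delta_i) - apply_h(H_i2s, pc)
print(np.linalg.norm(delta_s))
```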
3.2 Modeling Error Vectors

This section discusses one approach to modeling the error caused by the non-coplanarity of Π_c and the pupil. Even though the location of the largest errors cannot be determined a priori due to the uncalibrated setup, it may be worthwhile to accommodate the errors to the extent possible, that is, to estimate a vector field similar to figure 4. When the camera is placed outside the screen area, the error due to the homography is zero at 5 points (the calibration targets and the camera projection center) and non-zero elsewhere. After estimating H_i^s it is possible to measure the error due to the homography for each additional calibration target. Since the error vector field is smooth, a simplified yet effective approach would be to model the error through polynomials, in a similar way as previously seen for single- or dual-glint systems [Morimoto and Mimica 2005]. One of the limitations of using polynomials is that any increase in the order of the polynomial requires additional calibration targets in order to estimate the parameters of the polynomial. A cubic polynomial seems to be a good approximation for Δ_i [Cerrolaza et al. 2008]; however, it would require at least 10 calibration targets.

Different from the 'weight space' approach of polynomials is the function-view approach of Gaussian processes (GP). A Gaussian process interpolation method is used to estimate Δ_i with a squared exponential covariance function [Rasmussen and Williams 2006]:

$$\mathrm{cov}(x_p, x_q) = k_1 \exp\left(-\frac{1}{2}\,\frac{|x_p - x_q|^2}{k_2^2}\right) + k_3 \sigma^2$$

where x_p and x_q are data points and the k_i are weights. GPs have several innate properties that make them highly suited for gaze estimation. Gaussian processes do not model weights directly, and thus there are no requirements on the minimum number of calibration targets needed to infer model parameters. Each additional calibration target provides additional information that will be used to increase accuracy. Each estimate also comes with an error measurement which, via the covariance function, is related to the distance from the input data to the calibration data. This information can potentially be used to regularize output data. The squared exponential covariance function has been adopted since it is highly smooth (like Δ_i) and makes it possible to account for noise directly in the covariance function through k_3 σ². In the following, (GP) denotes the method for F_n^s that uses (Hom) together with Gaussian process modeling of Δ_i.
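A textbook GP regression along these lines can be written in a few lines of NumPy. The sketch below is ours: the hyperparameter values, the placement of the noise term k_3σ² on the covariance diagonal, and fitting each component of Δ_i as an independent scalar output are assumptions, not specifics stated in the paper.

```python
# Textbook Gaussian-process regression sketch for interpolating the residual
# offsets (one output dimension shown). Hyperparameters are placeholders.
import numpy as np

def se_cov(Xp, Xq, k1=1.0, k2=0.3):
    """Squared exponential covariance k1 * exp(-0.5 * |xp - xq|^2 / k2^2)."""
    d2 = ((Xp[:, None, :] - Xq[None, :, :]) ** 2).sum(-1)
    return k1 * np.exp(-0.5 * d2 / k2 ** 2)

def gp_fit(X, y, noise=1e-4):
    """Precompute K^-1 y for the calibration data; noise plays the role of k3*sigma^2."""
    K = se_cov(X, X) + noise * np.eye(len(X))
    return np.linalg.solve(K, y)

def gp_predict(X_train, alpha, X_test):
    """Predictive mean of the offset at new normalized pupil positions."""
    return se_cov(X_test, X_train) @ alpha

# Usage: X holds normalized pupil positions at the N calibration targets and
# y the residuals left by the (Hom) mapping (stand-in data below).
X = np.random.rand(9, 2)         # e.g. a 3x3 calibration grid
y = 0.01 * np.random.randn(9)    # stand-in residuals for one output dimension
alpha = gp_fit(X, y)
print(gp_predict(X, alpha, np.array([[0.5, 0.5]])))
```

At runtime, the predicted offset would be added to the (Hom) estimate; because the covariance decays with distance from the calibration data, each prediction also comes with the uncertainty measure mentioned above.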
4 Assessment on Simulated Data

Head pose, head position, the offset between visual and optical axes, refraction, measurement noise, the relative position of the hardware, and camera parameters are the factors that most influence the accuracy of gaze estimation methods. The following sections compare the homography normalization methods ((Hom) and (GP)) to the cross-ratio methods ((Yoo) [Yoo and Chung 2005] and (Cou) [Coutinho and Morimoto 2006]). These methods have been chosen since they operate under premises similar to homography normalization (e.g. uncalibrated/semi-calibrated setup). Simulated data is used in this section to be able to assess the effects of potential noise factors separately. The simulator [Böhme et al. 2008] allows for detailed modeling of the different components of the setup and eye-specific parameters. The evaluation is divided according to the presence of head movements and the number of calibration targets (N). Notice that the methods, except (Yoo), allow for multiple on-screen calibration targets. The effects of eye-specific parameters such as refraction and the offset between the visual and optical axes, as well as the effect of the number of calibration targets and the errors associated with the model assumptions, are evaluated when the head is fixed (section 4.2). The methods are examined with respect to head movements in section 4.3. In some experiments the (GP) method has been left out, since it is a derivative of (Hom) and would not alter the inherent properties of using homography normalization; it only makes a difference to the accuracy when the number of calibration targets is larger than four (N > 4).

4.1 Setup

The camera is located slightly below and to the right of the center of the screen so as to simulate a realistic setup (e.g. users do not place the components in an exact position). All tests have been conducted with the same camera focal length. The cornea is modeled as a sphere with radius 7.98 mm. Four light sources are placed at the corners of a planar surface (screen) to be able to compare homography and cross-ratio methods. In the following, N denotes the number of calibration targets, and γ and β denote the angular offsets between the visual and optical axes in the horizontal and vertical directions, respectively.

4.2 Stationary Head

Basic Settings and Refraction. In this section the methods are evaluated as if the head is kept still while gazing at a uniformly distributed set of 64 × 64 targets. Figure 5 shows the mean accuracy (degrees) with error bars (variance) for the hypothetical eye model where there is no offset between visual and optical axes, E0 = {γ = β = 0}, and for a more realistic setting with eye model E1 = {γ = 4.5, β = 1.5}. Each sub-figure shows the cases where refraction is included and where it is not. E0 is a physically infeasible setup, since the optical and visual axes differ in real eyes, but the model avoids eye-specific biases. It is clear from figure 5 that the methods exhibit similar accuracies for E0, but the offset between visual and optical axes in E1 makes a notable difference between the methods. Refraction has only a minor effect on the methods.

Figure 5: Comparison of methods (with/without refraction) when the head is kept still, using (left) eye model E0 = (γ = β = 0) and (right) eye model E1 = (γ = 4.5, β = 1.5), with N = 4 calibration targets.

Changing N. The previous test is based on a minimum number of calibration targets. However, the methods may, besides (Yoo), improve accuracy as the number N of uniformly distributed calibration targets increases. Figure 6 shows the accuracy of the methods as a function of N for both eye models. (GP) exhibits a rapid increase in accuracy when increasing N. Both (Hom) and (Cou) may be improved by increasing N, but large N implies an accuracy decrease for (Cou). The accuracy for (Yoo) is as expected.

Figure 6: Changing the number of calibration targets, N, for (left) E0 and (right) E1.

Offset between Visual and Optical Axes. There is a noticeable accuracy difference when using E0 and E1 in the previous experiments. Figure 7 shows that the angular horizontal offset, γ (with β = 0), has a significant effect on the accuracies of the cross-ratio methods but not on homography normalization. The reason is that homography normalization models the optical/visual axis offset to a much higher degree.

Figure 7: Accuracy as a function of the angular offset.

4.3 Head Movements

Gaze trackers should ideally be head pose invariant. This section evaluates the methods in scenarios where the eye location changes in space (±300 mm in both x and y directions from the camera center) but the target location remains fixed on the screen.

Influence of N and γ. Figure 8 shows the accuracies obtained with a variable number of calibration targets and different eye parameters in the presence of head movements. The results show similarities to the head-still experiments, also revealing that the offset between the optical and visual axes makes a significant difference to the cross-ratio methods, but not to the homography-based methods. The number of calibration targets has only a minor effect on accuracy. Non-linear modeling improves accuracy, and especially the difference between 4 and 9 calibration targets is significant. When weighing the nuisance of calibration against the obtained accuracy, it is task dependent whether the rather small increase in accuracy between 9 and 16 calibration targets is worthwhile.

Depth Translation. The methods analyzed here all rely on properties of projective planes; invariance to movements in depth is therefore not an inherent property of the methods. The influence of head movements is therefore examined by evaluating head movements as translations parallel to the screen plane (or equivalently Π_c), as depicted in figure 9, and as movements in depth (figure 10). A single depth is used for calibration. The results show that none of the methods are invariant to either depth or in-plane translations, but the homography normalization-based methods perform better. For depth changes larger than 150 mm (see figure 10) the (GP) method does not perform as well as (Hom); the reason is that the learned offsets in (GP) are only valid for a single scale. The graphs in figure 10 show the accuracy as a function of depth change (from the calibration depth) when using different eye parameters (E0 and E1) and a variable number of calibration targets, N.
