Droege Pupil Center Detection In Low Resolution Images


Pupil Center Detection in Low Resolution Images

Detlev Droege, Active Vision Group, University of Koblenz-Landau, droege@uni-koblenz.de
Dietrich Paulus, Active Vision Group, University of Koblenz-Landau, paulus@uni-koblenz.de

Abstract

In some situations, high quality eye tracking systems are not affordable. This generates a demand for inexpensive systems built upon non-specialized, off-the-shelf devices. Investigations show that algorithms developed for high resolution systems do not perform satisfactorily on such low-cost, low resolution systems. We investigate algorithms specifically tailored to such low resolution input devices, based on combinations of different strategies. An approach called gradient direction consensus is introduced and compared to image based correlation with adaptive templates as well as other known methods. The results are compared using synthetic input data with known ground truth.

CR Categories: I.4.8 [Image Processing and Computer Vision]: Scene Analysis - Tracking; I.5.4 [Pattern Recognition]: Applications - Computer vision

Keywords: pupil center detection, low resolution, gaze tracking

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org.
ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

1 Introduction

The quality and accuracy of common eye and gaze tracking devices provide a very solid base for gaze based interaction systems. However, they are usually not precise enough to select tiny screen elements of common graphical user interfaces, leading to the development of specific user interfaces for gaze interaction. Such interfaces are designed to be used with a much lower need for accuracy, not only due to the technical limitations, but also to account for the users' capabilities [Majaranta and Räihä 2002]. Gaze interaction systems are often used by disabled users, who, depending on the type and severity of their handicaps, might also not be able to position their gaze as exactly as would be required by conventional user interfaces.

Depending on the service capability of the health care system in different countries, affected persons might often not be funded to obtain a gaze tracking system. Therefore, several research groups work on systems using inexpensive off-the-shelf devices to compile simple gaze tracking systems, e. g. [San Agustin et al. 2009]. Such systems will perform at a significantly lower accuracy than the established systems, but still well enough to be used with gaze interaction systems. While the goal of using cheap (US$ 20-30) web cams could not yet be met, system costs of less than US$ 300 are currently realistic. Figure 1 shows an experimental setup built using such components.

Figure 1: Experimental setup.

A number of algorithms have been published to determine the pupil center in eye tracking systems, like [Daunys and Ramanauskas 2004], [Li et al. 2005], [Ohno et al. 2002], and others. These algorithms were shown to work well on medium to high resolution images; for low resolution input, however, they leave room for improvement.

2 Related Work

Numerous approaches to the detection of the eye in video-oculography (video based eye trackers) have been described in various publications. A thorough overview is given by Hansen and Ji [2009]. Most of these algorithms were developed with specific setups and conditions in mind, as the usage scenarios for eye detection and tracking are quite different.

In the context of iris recognition, close-up high resolution images of the subject's eye are a matter of course. Appropriate oculars, specific illumination and appropriate cameras deliver almost perfect images of the iris, but accept that the extra illumination distracts the eye. Such images were the basis for Daugman's algorithm described in [Daugman 2004]. He defines the integrodifferential operator, based on a circular integral around the currently estimated pupil/iris center. As the radius is increased iteratively, large changes in the integral indicate the limbus and are used to find new center estimates.

In [Daunys and Ramanauskas 2004] two methods to determine the center of a pupil are presented. First, the transition from the pupil to the iris is approximated by a polynomial to determine the pupil rim with sub-pixel accuracy. The coordinates averaging approach then forms horizontal and vertical scan lines between corresponding rim points. The pupil center is estimated by averaging the horizontal and vertical mid points for the respective component. For images without artifacts, e. g. from glints, this gives rather accurate results.
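The coordinates averaging step can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the implementation of [Daunys and Ramanauskas 2004]: the pupil is assumed to be available as a binary mask, the outermost pupil pixels on each scan line serve as rim points at integer pixel positions (the polynomial sub-pixel rim refinement is omitted), and the names coordinate_averaging_center and disk_mask are hypothetical.

```python
def coordinate_averaging_center(mask):
    """Estimate the pupil center from a binary mask given as rows of 0/1 values.

    Assumption of this sketch: rim points are the outermost pupil pixels on
    each scan line, at integer positions (no sub-pixel rim refinement).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0

    # Horizontal scan lines: midpoint between leftmost and rightmost rim pixel.
    row_mids = []
    for row in mask:
        xs = [x for x, value in enumerate(row) if value]
        if xs:
            row_mids.append((xs[0] + xs[-1]) / 2.0)

    # Vertical scan lines: midpoint between topmost and bottommost rim pixel.
    col_mids = []
    for x in range(width):
        ys = [y for y in range(height) if mask[y][x]]
        if ys:
            col_mids.append((ys[0] + ys[-1]) / 2.0)

    if not row_mids or not col_mids:
        raise ValueError("mask contains no pupil pixels")

    # Average the midpoints separately for the x and the y component.
    return sum(row_mids) / len(row_mids), sum(col_mids) / len(col_mids)


def disk_mask(width, height, cx, cy, radius):
    """Hypothetical helper: binary disk used as a stand-in pupil."""
    return [[1 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0
             for x in range(width)]
            for y in range(height)]


center = coordinate_averaging_center(disk_mask(12, 10, 5, 4, 3))
```

For a mask that is symmetric around the true center, every scan line midpoint coincides with that center. A glint that cuts into the rim shifts the affected midpoints, which is why the method is described as accurate only for images without such artifacts.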
For the circle approximation method of [Daunys and Ramanauskas 2004], the rim points are used to estimate a circle. Obvious outliers to the circle fitting are weighted down to find a good approximation after a few iterations.

The Starburst approach presented in [Li et al. 2005] first removes any glints and makes an initial rough guess for the pupil center. From here radial rays are sent and observed for a significant jump in intensity. From these pupil rim points secondary rays are sent like a fan in the opposite direction in a range of ±50°, providing additional points on the rim. Using the RANSAC algorithm, an ellipse is fitted to the detected rim points.

[Pérez et al. 2003] describes a similar technique, starting from an initial pupil center guess, the center of gravity of pixels which are darker than a threshold. However, only primary rays are used for the pupil rim detection, which is done by employing a Laplace filter. If the detected points are not equidistant from the estimated center, a new center is chosen iteratively by using the mid points of the diagonals.

The algorithm described in [Ohno et al. 2002] is called double ellipse fitting. First, dark, round regions give an initial guess for pupils. Rim points are determined similarly to Starburst's primary rays. An ellipse is fitted to these points and its center is used as the starting point for a second run with a doubled number of rays, ignoring obvious outliers.

This is by far not a comprehensive list of published algorithms. Several other approaches exist, e. g. [Poursaberi and Araabi 2005], but they follow similar principles throughout.

3 Approaches

Using low-cost COTS (commercial off the shelf) devices limits the possibilities for the system setup. Common inexpensive cameras have a rather large field of view, resulting in very small projections of the eye on the image sensor. While this has the advantage of covering a larger range of head movements, which is desirable for some users, the disadvantage is the very small number of pixels forming the eye in the image, as shown in Figure 2.

Figure 2: Low resolution pupil image (infra red, with glint), enlarged to identify individual pixels

Given such coarse input data it is obvious that working on the level of discrete pixels is not sufficient to achieve the accuracy required to control a gaze based interaction system. Only some of the published algorithms deal with sub-pixel accuracy and only very few put a specific focus on this aspect. For head mounted systems as well as for remote systems with a narrow field of view, sub-pixel accuracy is of limited importance, which explains this commonly neglected point. For a setup with COTS components, however, it is of great significance.

For remote infrared eye trackers like our system, pupil and glint centers are the key values for the gaze estimation. Their relative position, together with the geometry of the setup, provides the input for subsequent gaze estimation. While the glint is easily identified as the brightest area in the eye image, the pupil's outer shape is not intact when the glint partly covers it. Additionally, the pupil border is not very sharp, which makes it even harder to determine. To a certain degree this can be compensated by using the iris border as an additional support, but it might be partly covered by the upper (and sometimes lower) eye lid(s) (see Figure 2).

For low resolution images the comparison presented in [Droege et al. 2008] shows that none of the tested algorithms (e. g. those in Section 2) performs significantly better than the calculation of the center of gravity of sufficiently dark pixels. This center of gravity, though, often gives a good first guess for subsequent steps as performed in the Starburst algorithm and others.

3.1 Gradient Direction Consensus

Observation shows that in an image around the iris, a large number of gradients point near the pupil/iris center (Figure 3). Most of those not pointing towards the center are caused by noise and iris structure and mostly have a small strength. Other outliers are caused by the glint and are characterized by a high gradient strength. By using a lower and a higher threshold for the gradient strength, all "reasonably" strong gradients are selected for further processing.

Figure 3: Gradients (red arrows are shortened)

Figure 4: Relevant gradients chosen (see sect. 3.1)

Using the Hesse normal form for straight lines we can describe any point $\mathbf{p}$ on a line by $\mathbf{p} \cdot \mathbf{n} = d$, where $\mathbf{n}$ is the straight line's normal and $d$ is its distance from the origin. The gradient components determined using the Sobel operator give the gradient direction as $\mathbf{g} = (g_x, g_y)^T$ and thus the normal to it as $\mathbf{n}_g = (-g_y, g_x)^T$. Given any point on the straight line, which in this case is just the coordinate $(x, y)^T$ where we determined the gradient, the distance $d_g$ to the origin can be calculated as $d_g = (x, y)^T \cdot \mathbf{n}_g = y \cdot g_x - x \cdot g_y$. Thus, every point $\mathbf{p} = (x, y)^T$ on the gradient line solves $\mathbf{p} \cdot \mathbf{n}_g - d_g = \Delta$ with $\Delta = 0$; for other points $\Delta$ gives the distance to the line (scaled by the gradient strength $|\mathbf{g}|$, since $\mathbf{n}_g$ is not normalized).

The goal is to estimate a common center point $\hat{\mathbf{m}} = (m_x, m_y)^T$ minimizing

$$\hat{\mathbf{m}} \cdot \mathbf{n}_g - d_g = \Delta \tag{1}$$

for all (relevant) gradients $\mathbf{g}$. Expanding (1) yields

$$\Delta = \hat{\mathbf{m}} \cdot \mathbf{n}_g - d_g = (m_y - y)\, g_x + (x - m_x)\, g_y$$

and the partial derivatives compute as

$$\frac{\partial \Delta}{\partial m_x} = -g_y, \qquad \frac{\partial \Delta}{\partial m_y} = g_x .$$
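The line representation and the residual $\Delta$ can be made concrete with a small Python sketch. Several points are assumptions of the sketch rather than details from the paper: gradients are passed in as precomputed tuples (x, y, gx, gy), the threshold values are arbitrary, the function names are hypothetical, and the center is found by an ordinary, non-robust least-squares solve of the 2×2 normal equations, used here only as a simple stand-in for a robust estimator.

```python
import math


def select_gradients(samples, low, high):
    """Keep samples (x, y, gx, gy) whose gradient strength lies in [low, high]."""
    return [s for s in samples if low <= math.hypot(s[2], s[3]) <= high]


def residual(mx, my, sample):
    """Delta = m . n_g - d_g for the line through (x, y) along the gradient."""
    x, y, gx, gy = sample
    return (my - y) * gx + (x - mx) * gy


def consensus_center(samples, low, high):
    """Point minimizing the sum of squared residuals over the selected lines.

    Assumption of this sketch: plain least squares. Because n_g is not
    normalized, each residual is implicitly weighted by the gradient strength.
    """
    saa = sab = sbb = ba = bb = 0.0
    for x, y, gx, gy in select_gradients(samples, low, high):
        nx, ny = -gy, gx          # normal n_g of the gradient line
        d = y * gx - x * gy       # d_g = (x, y) . n_g
        saa += nx * nx
        sab += nx * ny
        sbb += ny * ny
        ba += nx * d
        bb += ny * d
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("gradient lines are (nearly) parallel")
    # Solve [[saa, sab], [sab, sbb]] (mx, my)^T = (ba, bb)^T by Cramer's rule.
    return (ba * sbb - bb * sab) / det, (saa * bb - sab * ba) / det


def radial_samples(cx, cy, radius, count):
    """Hypothetical test data: gradients of strength 2 on a circle, all radial."""
    out = []
    for k in range(count):
        a = 2.0 * math.pi * k / count
        ux, uy = math.cos(a), math.sin(a)
        out.append((cx + radius * ux, cy + radius * uy, 2.0 * ux, 2.0 * uy))
    return out


samples = radial_samples(3.3, 2.7, 4.0, 12)
samples.append((1.0, 1.0, 50.0, 0.0))   # strong, glint-like gradient
center = consensus_center(samples, low=0.5, high=10.0)
```

In this synthetic case the upper threshold removes the glint-like sample, so all remaining lines intersect exactly at (3.3, 2.7). With real gradients the lines do not meet in one point, and a squared-error solve is sensitive to remaining outliers, which is the motivation for a robust estimator.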
Using an M-estimator (described e. g. in [Staudte and Sheather 1990]), a common intersection point is determined.

Figure 4 shows the positions in the image with gradient strengths within the abovementioned limits. Used positions are marked in dark blue, outliers in light blue. It is apparent in this sample image that the resulting center (red cross) is not 'pushed away' by the glint, as opposed to the reference point (green) calculated by the simple center of gravity approach.

3.2 Image Based Template Matching

Given the relatively small number of pixels to be analyzed, direct image comparison with an adaptable image template is possible. Such a template must reflect the situation of low resolution by taking into account the smooth transitions in intensities between the different regions. That is, the generation of the template image must support sub-pixel accuracy, anti-aliasing and blending in the drawing process.

Choosing appropriate (initial) pupil and iris radii and luminosity values for the eye regions, an initial template is generated. The template is shifted iteratively to different positions around the initial guess, and the position with the best match is taken for the next step with reduced search range and step width. This is repeated until a minimum step width, e. g. 0.1 pixel, is reached.

Of course, using appropriate parameters for the generation is essential. Therefore these parameters should be updated over time to account for changes in the appearance of the eye. Intensity values can be adapted by biasing the current values towards the actual values found after every successful match. The pupil radius can be adapted from the ellipse parameters determined for an analysis as described in Section 3.3.

The image comparison of two images $f$ and $g$ of size $W \times H$ is performed using common methods like the mean squared error or by employing a cross-correlation measure.

$$\mathrm{MSE} = \frac{1}{W \cdot H} \sum_{x,y}^{W,H} (f_{xy} - g_{xy})^2 \tag{2}$$

$$\mathrm{COR} = \frac{1}{W \cdot H} \sum_{x,y}^{W,H} (f_{xy} \cdot g_{xy}) \tag{3}$$

Producing a template image with all relevant aspects would require a multitude of additional parameters to be considered, e. g. glint position(s), glint radius, upper and lower lid positions and form. Since most of these parameters are hard to determine and of little further use, it is easier to mask out these regions from the comparison (Figure 5).

Figure 5: Example template images. Black denotes masked out pixels. The glint position is masked out.

3.3 Ellipse Fitting

For comparison, an ellipse fitting approach inspired by the algorithms presented in Section 2 has been implemented. First, pupil rim points are detected similarly to [Pérez et al. 2003]. To eliminate the influence of erroneously detected points, an M-estimator is again employed.

Using 18 sample lines the result is not satisfactory. The estimator sometimes still includes glint border points in the set of input samples to optimize, resulting in an ellipse distorted by the glint.

Sending a ray every 20° seems rather coarse; raising the number of rays to 36 (10° steps), however, does not give a noticeable improvement (see Section 4).

Figure 6: Ellipse fitting

4 Comparison

Providing reference data for eye tracking is difficult. Instructing a subject to follow a point with known coordinates on a computer screen bears two error sources when comparing pupil detection algorithms: errors introduced by the gaze estimation algorithms due to non-linearities cannot be distinguished from detection errors, and due to unintentional saccades the position of the pupil center does not necessarily correspond to the reference position on screen.

Given the low contrast in eye images gathered from a low resolution device, it is also difficult to assign a ground truth value matching the desired accuracy to real images. Hence, as a first means of evaluation, synthetic images have been generated with graphics algorithms capable of producing sub-pixel accurate, anti-aliased output. While these images cannot be seen as a valid replacement for real images, it can at least be argued that bad performance on such almost ideal images will not become better on real images.

To distinguish between the pure pupil center detection and the situation where glints put an additional strain on the algorithm, the first tests have been performed without glints. As some algorithms benefit from prior knowledge of the glint position (if any), an accurate glint position detection should be performed prior to the pupil center detection and has been neglected in this work.

A series of 700 images with synthetic eye positions at regular sub-pixel positions was generated as test input. The algorithms from Section 3 as well as the center of gravity and the algorithm described in [Pérez et al. 2003] (which performed best in [Droege et al. 2008]) have been applied to this sequence. The Euclidean distance to the reference coordinate is measured; the resulting root mean square error (RMSE) is listed in Table 1. The deviation of the measured positions (red dots) from the reference (light blue dots) is shown in Figures 7-9.

Table 1: RMSE values for different algorithms

Algorithm                       Tag    RMSE
center of gravity               CG11   0.1351
[Pérez et al. 2003]             PE10   0.3047
ellipse fit, 20° steps          EFS2   0.4016
ellipse fit, 10° steps          EFS1   0.4022
Adaptive image pattern match    PM01   0.1059
Gradient direction consensus    GD02   0.0501
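The error measure used for Table 1 can be written down in a few lines of Python. This is a minimal sketch under the assumption that estimated and reference centers are available as equally long lists of (x, y) pairs in pixel units; the function name rmse and the sample values are hypothetical and not taken from the evaluation.

```python
import math


def rmse(estimates, references):
    """Root mean square of the Euclidean distances between paired points."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("need two non-empty point lists of equal length")
    squared = [(ex - rx) ** 2 + (ey - ry) ** 2
               for (ex, ey), (rx, ry) in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


# Two hypothetical frames: one perfect estimate, one that is 5 pixels off.
value = rmse([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```

Because the distances are squared before averaging, a few large deviations dominate the result, so algorithms with a high variance can receive a poor RMSE even when most of their estimates are close to the reference.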
Figure 7: Ellipse fit sub-sampled, 10° steps, synthetic input (plot synth_01_noglint L, EFS1 against ground truth GNDT, RMSE: 0.4022)

Figure 8: Adaptive image pattern match, synthetic input (plot synth_01_noglint L, PM01 against ground truth GNDT, RMSE: 0.1059)

Figure 9: Gradient direction consensus, synthetic input (plot synth_01_noglint L, GD02 against ground truth GNDT, RMSE: 0.05005)

Some algorithms show periodic patterns repeating at pixel frequency, as in Figure 7. This is easily explained by the effects of aliasing and thresholds, but is nevertheless undesirable.

The RMSE values from Table 1 seem to contradict the plots at first glance, but the variance for some of the other algorithms is much higher, resulting in a bad RMSE result. The most appealing result is found in Figure 9 for the gradient direction consensus. This is backed by the best RMSE value; however, it must be remembered that this is measured on 'clean' synthetic input images. Adding noise to the images reveals that the performance decreases as the noise becomes stronger. Additionally, no real iris structure is present in the synthetic images, which would introduce noisy regions even in the case of a good input signal.

Figure 10: RMSE values dependent on noise strength (σ) (plot with curves GD02, EFB2 and PE10 for left and right eye; horizontal axis σ from 0 to 7, vertical axis RMSE from 0 to 0.4)

Figure 10 shows the RMSE values for some of the algorithms as a function of the strength of noise added to the synthetic images, for the left and right eye (marked 'L' and 'R'). The sigma of the Gaussian noise is varied from 0 to 7. The labels correspond to Table 1.

While the subjective impression of the performance on real input images is good, no quantifiable results can be given up to now. Since the overall performance in a complete eye tracking system is always influenced by the gaze estimation process, such results are difficult to interpret and to attribute to the specific algorithm used.

5 Conclusion

It is evident that a reasonable comparison must not be based on RMSE values of the dislocation distance alone. Additional measures like the variance must be used to characterize the performance. The approaches of adaptive pattern matching and gradient direction consensus seem to be good candidates for further investigation. Their robustness to noisy signals and distortions by glints is still to be investigated further; the subjective impression observed with real input images looks promising.

By improving the test framework and increasing the naturalness of the test image series, a more thorough investigation of new and existing algorithms will be possible and lead to improved results.

References

Daugman, J. 2004. How iris recognition works. Circuits and Systems for Video Technology, IEEE Transactions 14, 1, 21–30.

Daunys, G., and Ramanauskas, N. 2004. The accuracy of eye tracking using image processing. In NordiCHI '04, ACM, New York, NY, USA, 377–380.

Droege, D., Schmidt, C., and Paulus, D. 2008. A comparison of pupil center estimation algorithms. In COGAIN 2008, H. Istance, O. Stepankova, and R. Bates, Eds. (short paper).

Hansen, D. W., and Ji, Q. 2009. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence. (in print).

Li, D., Winfield, D., and Parkhurst, D. J. 2005. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. In CVPR '05, IEEE, Washington, 79–86.

Majaranta, P., and Räihä, K.-J. 2002. Twenty years of eye typing. In Proceedings of Eye Tracking Research and Applications, ACM, 15–22.

Ohno, T., Mukawa, N., and Yoshikawa, A. 2002. Freegaze: A gaze tracking system for everyday gaze interaction. In Proceedings of the symposium on ETRA 2002: eye tracking research and applications symposium, 125–132.

Pérez, A., Córdoba, M., García, A., Méndez, R., Muñoz, M., Pedraza, J., and Sánchez, F. 2003. A precise eye-gaze detection and tracking system. In WSCG POSTERS proceedings, February 3-7, 2003.

Poursaberi, A., and Araabi, B. 2005. A novel iris recognition system using morphological edge detector and wavelet phase features. ICGST International Journal on Graphics, Vision and Image Processing 05.

San Agustin, J., Skovsgaard, H., Hansen, J. P., and Hansen, D. W. 2009. Low-cost gaze interaction: ready to deliver the promises. In CHI EA '09: 27th intl. conf. on Human factors in computing systems, ACM, New York, USA, 4453–4458.

Staudte, R. G., and Sheather, S. J. 1990. Robust Estimation and Testing. Wiley, New York.