Evaluation of a Remote Webcam-Based Eye Tracker

Henrik Skovsgaard, Javier San Agustin, Sune Alstrup Johansen, John Paulin Hansen
IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S

Martin Tall
Duke University, 2424 Erwin Rd. (Hock Plaza), Durham, NC 27705

ABSTRACT
In this paper we assess the performance of an open-source gaze tracker in a remote (i.e. table-mounted) setup, and compare it with two commercial eye trackers. An experiment with 5 subjects showed the open-source eye tracker to have a significantly higher level of accuracy than one of the commercial systems, the Mirametrix S1, but also a higher error rate than the other commercial system, a Tobii T60. We conclude that the web-camera solution may be viable for people who need a substitute for mouse input but cannot afford a commercial system.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User interfaces—Evaluation/methodology

General Terms
Human factors, Experimentation, Performance, Measurement

Keywords
Gaze interaction, low-cost gaze tracking, performance evaluation, universal access

1. INTRODUCTION
Gaze tracking systems enable people with severe motor disabilities to communicate using only their eye movements. However, some of these users cannot afford a commercial system, which costs between $5,000 and $30,000. While the quality of the systems has improved dramatically over the years, the price has remained more or less constant. Systems that employ low-cost and off-the-shelf hardware components are becoming increasingly popular as camera technology improves. The use of off-the-shelf hardware components in gaze tracking represents a growing research field.

In 2004, Babcock and Pelz [1] presented a head-mounted eye tracker that uses two small cameras attached to a pair of safety glasses. Li et al. [4] extended their work and built a similar system that worked in real time, called OpenEyes. Being head-mounted, both systems are affected by head movements and are thus not suitable for use in combination with a desktop computer. Although the components used in the systems described above are inexpensive, assembling the hardware requires advanced knowledge of electronics. Zielinski's Opengazer system [10], based on a remote webcam, takes a simpler hardware approach. The gaze estimation method is not tolerant to head movements, however, and therefore the user needs to keep the head still after calibration.

Sewell and Komogortsev [7] developed a neural-network-based eye tracker able to run on a personal computer's built-in webcam under normal lighting conditions (i.e., no infrared light). The aim of their study was to employ eye tracking without any modifications to the hardware. The five participants in the study complained that even during fixations they felt a jumpy sensation of the marker, and that the marker was unstable during use.

The ITU Gaze Tracker¹ is open-source gaze tracking software that can be used with low-cost and off-the-shelf hardware, such as webcams and video cameras. The software tracks the pupil and one or two corneal reflections produced by infrared light sources. The first version of the system was introduced and evaluated by San Agustin et al. in [6]. The results obtained indicated that a low-cost system built with a webcam could have the same performance as expensive commercial systems. However, the system required placing the webcam very close to the user's eye, which was not comfortable. Furthermore, the camera blocked part of the user's view. Being a head-mounted system, it also required the user to sit completely still, as head movements affected the cursor position.

The second version of the system enables remote eye tracking by using a camera with a narrow field of view. The same webcam used in [6] can be employed by replacing the standard wide-angle lens with an inexpensive 16 mm zoom lens. Figure 1 shows the hardware configuration for such a remote system with a webcam and two light sources.

¹http://www.gazegroup.org

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NGCA '11, May 26-27 2011, Karlskrona, Sweden.
Copyright 2011 ACM 978-1-4503-0680-5/11/05 ...$10.00.
Figure 1: Hardware configuration for the webcam-based gaze tracker.

The aim of this study is to investigate whether the performance of the remote, webcam-based ITU Gaze Tracker (costing around $100) can match the performance of two commercial gaze-tracking systems, a Tobii T60 ($25,000) and a Mirametrix S1 ($6,000), in an interaction task.

2. PERFORMANCE METRICS

2.1 Accuracy and Precision
The performance of a sensor is typically measured in terms of accuracy and precision, where accuracy refers to the degree to which the sensor readings represent the true value of what is measured, while precision (also known as spatial resolution) refers to the extent to which successive readings of the same physical phenomenon agree in value [8].

The working copy of the COGAIN report Eye tracker accuracy terms and definitions [5] has a set of definitions and terminologies for measuring the accuracy and precision of an eye tracking system. Here, accuracy A_deg is defined as the average angular distance θ_i (measured in degrees of visual angle) between n fixation locations and the corresponding fixation targets (Equation 1):

    A_deg = (1/n) * Σ_{i=1..n} θ_i                          (1)

Spatial precision is calculated as the root mean square, RMS, of the angular distance θ_i (measured in degrees of visual angle) between successive samples (x_i, y_i) and (x_{i+1}, y_{i+1}) (Equation 2):

    RMS_deg = sqrt( (1/n) * Σ_{i=1..n} θ_i² )               (2)

The working copy of the COGAIN report does not state how the angular distances θ should be calculated. Distances on computers are typically measured in pixels, so for this experiment we used a function to map distances in pixels, Δpx, to degrees of visual angle, Δ°. Besides the distance in pixels, the physical size of a pixel S and the distance from user to screen D need to be known (Equation 3):

    Δ° = (360/π) * arctan( (Δpx * S) / (2*D) )              (3)

2.2 Target Acquisition
In order to evaluate the performance of the different input devices, we followed the methodology described by the ISO 9241-9 standard for non-keyboard input devices [3]. The performance is quantified by the throughput and error rate of each device. Calculating the throughput is based on the effective target width W_e and the effective distance D_e, which are used to calculate the effective index of difficulty ID_e following Equation 4. Throughput is measured in bps and is calculated as the ratio between the effective index of difficulty ID_e and the movement time MT (Equation 5) [9]:

    ID_e = log2( D_e / W_e + 1 ),   W_e = 4.133 * SD_x      (4)

    Throughput = ID_e / MT                                  (5)

3. PERFORMANCE EVALUATION

3.1 Participants
A total of five participants, three male and two female, with ages ranging from 29 to 39 years (M = 34 years, SD = 4.3), volunteered to participate in the study. Three of the participants had no previous experience with gaze interaction. One of them used contact lenses.

3.2 Apparatus
The computer used was a desktop with a 2.6 GHz Intel Dual Core processor and 3 GB RAM, running Windows XP SP3. We used the 17" monitor with a resolution of 1280×1024 that comes with the Tobii T60 system. Three gaze trackers and a Logitech optical mouse (for baseline comparison) were tested as input devices. Two of the three gaze trackers were the commercial systems Tobii T60 and Mirametrix S1. The third system was the ITU Gaze Tracker using a Sandberg Nightcam 2 webcam running at 30 fps with a 16 mm lens, and two Sony HVL-IRM infrared light sources; the total cost was around $100. All three gaze trackers used a 9-point calibration procedure. Figure 2 shows the experimental setup.

3.3 Design and Procedure
After calibrating the system, participants completed an accuracy test followed by a 2D target-selection task. Participants sat approximately 60 cm away from the monitor and were asked to sit as still as possible. The experiment was conducted employing a within-subjects factorial design. The target-selection task had the following independent variables and levels:

• Device (4): Mouse, Tobii T60, Mirametrix, Webcam
• Amplitude (2): 450, 900 pixels
• Target Width (2): 75, 100 pixels

The dependent variables in the study were accuracy (degrees), precision (degrees), throughput (bps) and error rate (%).
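To make the metrics concrete, the computations defined by Equations 1-5 above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's analysis code: the function names are ours, and the use of the sample standard deviation for SD_x is our assumption (the paper does not specify which estimator was used).

```python
import math
import statistics

def pixels_to_degrees(d_px, pixel_size, viewing_dist):
    """Equation 3: map a distance in pixels to degrees of visual angle.

    pixel_size (physical size of one pixel, S) and viewing_dist (user-to-screen
    distance, D) must be expressed in the same unit, e.g. centimeters."""
    return (360.0 / math.pi) * math.atan((d_px * pixel_size) / (2.0 * viewing_dist))

def accuracy_deg(offsets_deg):
    """Equation 1: mean angular distance between fixations and their targets."""
    return sum(offsets_deg) / len(offsets_deg)

def precision_rms_deg(sample_steps_deg):
    """Equation 2: RMS of the angular distances between successive samples."""
    return math.sqrt(sum(t * t for t in sample_steps_deg) / len(sample_steps_deg))

def effective_throughput(endpoint_offsets_px, d_e_px, movement_times_s):
    """Equations 4-5: ISO 9241-9 style effective throughput in bits per second."""
    w_e = 4.133 * statistics.stdev(endpoint_offsets_px)  # effective width, Eq. 4
    id_e = math.log2(d_e_px / w_e + 1.0)                 # effective index of difficulty
    return id_e / statistics.mean(movement_times_s)      # throughput = ID_e / MT, Eq. 5
```

In such a pipeline, pixel offsets recorded during the test would first be mapped through `pixels_to_degrees` before being fed to `accuracy_deg` and `precision_rms_deg`.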
Each participant completed 4 blocks of 1 trial (i.e., 4 trials) for the accuracy and precision test, and 16 blocks of 15 trials (i.e., 240 trials) for the target-selection task, where device, amplitude, and target width were fixed within blocks. The orders of input device and task were counterbalanced across users to neutralize learning effects. Participants were encouraged to take a comfortable position in front of the computer and remain as still as possible during the test. The total test session lasted approximately 15 minutes.

Immediately after a successful calibration, participants were instructed to gaze at a randomly appearing target in a 4×4 matrix (evenly distributed, with 100 pixels to the borders of the monitor). A new target would appear when a total of 50 samples had been recorded at 30 Hz. Premature samples were avoided with a smooth animated transition between targets plus a reaction delay of 600 ms. Furthermore, samples further than M ± 3 × SD away were considered outliers. To prevent distractions from cursor movements, we hid the cursor throughout the blocks except, of course, for the mouse condition.

Once the accuracy test was completed, the target-selection task started. Participants were presented with 15 circular targets arranged in a circle in the center of the screen. Targets were highlighted one by one, and participants were instructed to select the highlighted target as quickly and as accurately as possible. Selections were performed with the spacebar for the gaze trackers and a left-button click for the mouse condition. Activations outside the target area were regarded as misses and thus counted toward the error rate. Every selection ended the current trial and started the next one. Based on the amplitudes and target widths, the nominal indexes of difficulty were between 2.5 and 3.7 bits.

Figure 2: Experimental setup. The participant is conducting the test using the Mirametrix system.

4. RESULTS

4.1 Accuracy and Precision
Analysis of the accuracy and precision was performed using a one-way ANOVA, with device as the independent variable and accuracy and precision as the dependent variables. 228 outliers of the 16,000 samples were removed from the analysis. An LSD post-hoc test was applied after the analysis. Figure 3 shows a plot of the average accuracy and precision per device.

Figure 3: Accuracy and precision by device. Error bars show ± SD.

Mean accuracy for mouse, Tobii, Mirametrix and webcam was 0.14°, 0.67°, 1.34° and 0.88°, respectively (left-side bars in Figure 3). The main effect of device on accuracy was statistically significant, F(3, 12) = 16.03, p < 0.001. The post-hoc test showed a significant difference between the mouse and all of the gaze trackers. Tobii performed significantly better than Mirametrix, t(4) = 3.65, p < 0.05. The webcam also performed significantly better than Mirametrix, t(4) = 4.42, p < 0.05. There was no significant difference between the webcam and Tobii, t(4) = 1.57, p > 0.05.

Mean precision for mouse, Tobii, Mirametrix and webcam was 0.05°, 0.08°, 0.43° and 0.31°, respectively (right-side bars in Figure 3). Mauchly's test indicated that the assumption of sphericity had been violated, χ²(5) = 16.60, p < 0.01; therefore, degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.47). The results show that there was no significant effect of device on precision, F(1.42, 5.67) = 4.38, p = 0.08.

4.2 Throughput and Error Rate
Analysis of the target-selection task was performed using a 4×2×2 ANOVA, with device, amplitude and target width as the independent variables and throughput and error rate as the dependent variables. An LSD post-hoc test was applied after the analysis. All data were included.

Mean throughput for mouse, Tobii, Mirametrix and webcam was 4.00, 2.63, 2.00 and 2.31 bps, respectively (left-side bars in Figure 4). The main effect of device on throughput was statistically significant, F(3, 12) = 9.61, p < 0.01. The post-hoc test showed a significant difference between the mouse and all other devices. There was a main effect of amplitude, F(3, 12) = 10.73, p < 0.05, with short amplitudes (M = 2.83 bps) having a significantly higher throughput than long amplitudes (M = 2.62 bps), t(4) = 3.30, p < 0.05. No significant effect of target width was found, F(3, 12) = 2.00, p = 0.23.

Mean error rate for mouse, Tobii, Mirametrix and webcam was 5.34%, 19.21%, 39.29% and 27.50%, respectively (right-side bars in Figure 4). The main effect of device on error rate was statistically significant, F(3, 12) = 9.71, p < 0.01. The post-hoc test showed a significant difference between the mouse and all other devices. Tobii had a significantly lower error rate than the webcam, t(4) = 4.96, p < 0.05. We found no effect of amplitude, F(3, 12) = 0.37, p = 0.58, nor of target width, F(3, 12) = 0.37, p = 0.58.
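The M ± 3 × SD outlier criterion used before the analysis above can be sketched as follows. This is an illustrative reconstruction in Python, not the authors' code; the paper does not state whether the population or the sample standard deviation was used, so the choice of `pstdev` below is our assumption.

```python
import statistics

def remove_outliers(samples):
    """Keep only samples within mean +/- 3*SD of the distribution.

    Applied per target to the recorded gaze samples; 228 of the 16,000
    samples in the study were discarded by this criterion."""
    m = statistics.mean(samples)
    sd = statistics.pstdev(samples)  # population SD; the paper does not specify which
    return [s for s in samples if abs(s - m) <= 3.0 * sd]
```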
Figure 4: Overall throughput and error rate by device. Error bars show ± SD.

5. DISCUSSION
Our results suggest that the accuracy of the webcam-based gaze tracker (0.88°) is significantly better than the accuracy of the Mirametrix system (1.34°), while showing no significant difference from the Tobii T60 (0.67°). This indicates that the ITU Gaze Tracker can be used in software applications meant to be controlled by gaze input.

Although we did not find any significant effect of the individual devices in the precision study, the data indicate that the mouse and the Tobii system had a higher precision than the Mirametrix S1 and the webcam-based system. It must be noted that precision is calculated after the low-pass filtering that the eye trackers perform on the data samples during fixations. This is done to smooth the signal and prevent a jittery cursor from annoying the user. The ITU Gaze Tracker gives users control over the level of smoothing during fixations, a feature that many commercial systems do not provide.

The results obtained in the target-selection task indicate that the webcam-based eye tracker has a performance similar to the other two commercial systems in terms of throughput. The error rate of the webcam tracker was, however, significantly higher than the error rate of the Tobii T60. Throughput values were slightly lower than in previous studies [6, 9]. This can be due to the lower control over the hardware setup in our experiment, as well as the lack of experience of the novice users, who tended to be rather slow.

6. CONCLUSION
Our study on performance evaluation shows that a remote, webcam-based eye tracker can have a performance comparable to expensive systems. However, there are other factors crucial to the practical usefulness of an eye tracking device that have not been evaluated in this study, such as the quality of the documentation, the API, tolerance to head movements, ease of use and stability over time.

In our future work, we aim to further investigate these issues and implement new algorithms to improve performance. Specifically, we would like to explore how continuous recalibrations and repositioning of the participants can improve performance over time. We would also like to test various hardware setups for the ITU Gaze Tracker (e.g. better cameras), and different algorithms for calculating the point of regard. A usability and user-experience study should also be conducted to include subjective measures of the different systems.

Finally, it is our hope that researchers, students and hobbyists will collaborate in the development of the software, and contribute to making the open-source ITU Gaze Tracker a more reliable system.

7. ACKNOWLEDGEMENTS
We would like to thank EYEFACT for supporting the experiment, and the open-source community for their help with improving the ITU Gaze Tracker.

8. REFERENCES
[1] J. S. Babcock and J. B. Pelz. Building a lightweight eyetracking headgear. In Proceedings of the 2004 Symposium on Eye Tracking Research & Applications, pages 109-114, San Antonio, Texas, 2004. ACM.
[2] J. P. Hansen, D. Hansen, and A. Johansen. Bringing gaze-based interaction back to basics. In Universal Access in HCI (UAHCI): Towards an Information Society for All, volume 3, pages 325-329, New Orleans, USA, 2001. Lawrence Erlbaum.
[3] ISO. Ergonomic requirements for office work with visual display terminals (VDTs) - Part 9: Requirements for non-keyboard input devices. International Organization for Standardization, 2000.
[4] D. Li, J. Babcock, and D. J. Parkhurst. openEyes. In Proceedings of the 2006 Symposium on Eye Tracking Research & Applications, pages 95-100, San Diego, California, 2006. ACM.
[5] F. Mulvey. Eye tracker accuracy terms and definitions - working copy. Technical report, COGAIN, 2010.
[6] J. San Agustin, H. Skovsgaard, J. P. Hansen, and D. W. Hansen. Low-cost gaze interaction: ready to deliver the promises. In Proceedings of CHI '09, pages 4453-4458, Boston, MA, USA, 2009. ACM.
[7] W. Sewell and O. Komogortsev. Real-time eye gaze tracking with an unmodified commodity webcam employing a neural network. In Proceedings of the 28th International Conference Extended Abstracts on Human Factors in Computing Systems, pages 3739-3744, New York, USA, 2010. ACM.
[8] A. D. Wilson. Sensor- and recognition-based input for interaction. In The Human-Computer Interaction Handbook, pages 177-199. Lawrence Erlbaum Associates, 2007.
[9] X. Zhang and I. S. MacKenzie. Evaluating eye tracking with ISO 9241 - Part 9. In Proceedings of the 12th International Conference on HCI: Intelligent Multimodal Interaction Environments, pages 779-788, Beijing, China, 2007. Springer.
[10] P. Zielinski. Opengazer: open-source gaze tracker for ordinary webcams. http://www.inference.phy.cam.ac.uk/opengazer/, 2010.