Match-Moving for Area-Based Analysis of Eye Movements in Natural Tasks
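Wayne J. Ryan (School of Computing), Andrew T. Duchowski∗ (School of Computing), Ellen A. Vincent (Department of Horticulture), Dina Battisto (Department of Architecture)
Clemson University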
Abstract

Analysis of recordings made by a wearable eye tracker is complicated by video stream synchronization, pupil coordinate mapping, eye movement analysis, and tracking of dynamic Areas Of Interest (AOIs) within the scene. In this paper a semi-automatic system is developed to help automate these processes. Synchronization is accomplished via side-by-side video playback control. A deformable eye template and calibration dot marker allow reliable initialization via simple drag and drop, as well as a user-friendly way to correct the algorithm when it fails. Specifically, drift may be corrected by nudging the detected pupil center to the appropriate coordinates. In a case study, the impact of surrogate nature views on physiological health and perceived well-being is examined via analysis of gaze over images of nature. A match-moving methodology was developed to track AOIs for this particular application, but it is applicable toward similar future studies.

Figure 1: Our do-it-yourself wearable eye tracker: (a) headgear and (b) complete device, built from off-the-shelf components, and (c) the graphical user interface featuring VCR controls for frame advancement and match-moving search boxes for dynamic object tracking.

CR Categories: I.3.6 [Computer Graphics]: Methodology and Techniques—Ergonomics; J.4 [Computer Applications]: Social and Behavioral Sciences—Psychology.

Keywords: eye tracking, match moving

∗e-mail: {wryan | andrewd}

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail
ETRA 2010, Austin, TX, March 22–24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

1 Introduction

Buswell's [1935] seminal exploration of eye gaze over complex scenes (e.g., photographs of paintings, patterns, architecture, and interior design) helped influence development of techniques for recording and analyzing human eye movements during performance of natural tasks. Wearable eye trackers allow collection of eye movements in natural situations, usually involving generally unconstrained eye, head, and hand movements. The most common eye tracking metrics sought include the number of fixations, fixation durations, and the number and duration of fixations per Area Of Interest, or AOI, among several others [Megaw and Richardson 1979; Jacob and Karn 2003]. Longer fixations generally indicate greater cognitive processing of the fixated area, and the fixation count and percentage of fixation time devoted to a particular area may indicate its saliency [Webb and Renshaw 2008].

Complications in analysis arise in synchronization of the video streams, mapping of eye position in the eye image frame to the point of gaze in the scene image frame, distinction of fixations from saccades within the raw gaze point data stream, and determination of the frame-to-frame location of dynamic AOIs within the scene video stream. Most previous work relied on manual video frame alignment as well as manual (frame-by-frame) classification of eye movements. In this paper, a semi-automatic system is developed to help automate these processes, inspired by established computer graphics methods primarily employed in video compositing. The system consists of video synchronization, calibration dot and limbus tracking (as a means of estimating the gaze point mapping parameters and the pupil center, respectively), fixation detection via a signal analysis approach independent of video frame rate, and AOI tracking for eventual statistical analysis of fixations within AOIs. Tracking of the calibration dot and of AOIs is achieved by implementing a simple 2D variant of match-moving [Paolini 2006], a technique used for tracking markers in film, primarily to facilitate compositing of special effects (e.g., texture mapping computer-generated elements atop principal photography). The result is a semi-automatic approach akin to keyframing: markers are set over scene elements in specific video frames, and their location coordinates are inbetweened (usually) by linear interpolation.

A case study is presented where eye movements are analyzed over images viewed in a hospital setting. The analysis is part of an experiment conducted to better understand the potential health benefits of images of nature toward patient recovery. Although descriptive statistics of gaze locations over AOIs are underwhelming, the given methodology is applicable toward similar future studies.
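As a rough illustration of the keyframing-and-inbetweening idea above, the Python sketch below linearly interpolates a marker's position between user-set keyframes. It is a minimal sketch, not the authors' tool. The `inbetween` function and the dictionary-of-keyframes representation are hypothetical, and frames outside the keyed range are simply clamped.

```python
def inbetween(keyframes, frame):
    """Linearly interpolate a marker's (x, y) position at `frame`.

    `keyframes` maps frame numbers to (x, y) positions placed by the user.
    Frames before the first or after the last keyframe are clamped.
    """
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    # Find the pair of keyframes that brackets the requested frame.
    for f0, f1 in zip(frames, frames[1:]):
        if f0 <= frame <= f1:
            (x0, y0), (x1, y1) = keyframes[f0], keyframes[f1]
            s = (frame - f0) / (f1 - f0)  # 0 at f0, 1 at f1
            return (x0 + s * (x1 - x0), y0 + s * (y1 - y0))


keys = {0: (100.0, 50.0), 10: (120.0, 70.0), 30: (120.0, 30.0)}
print(inbetween(keys, 5))   # (110.0, 60.0)
print(inbetween(keys, 20))  # (120.0, 50.0)
```

In a match-moving workflow, the user would only add a keyframe where the automatic tracker drifts. The interpolated positions fill the frames in between.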
2 Background

Classic work analyzing eye movements during performance of a well-learned task in a natural setting (making tea) aimed to determine the pattern of fixations and to classify the types of monitoring action that the eyes perform [Land et al. 1999]. A head-mounted eye-movement video camera was used that provided a continuous view of the scene ahead, with a dot indicating foveal direction with an accuracy of about 1°. Results indicated that even automated routine activities require a surprising level of continuous monitoring. Land et al. concluded that although the actions of tea-making are 'automated' and proceed with little conscious involvement, the eyes closely monitor every step of the process. This type of unconscious attention must be a common phenomenon in everyday life.

Relations of eye and hand movements in extended food preparation tasks have also been examined [Land and Hayhoe 2001]. The task of tea-making was compared to the task of making peanut butter and jelly sandwiches. In both cases the location of foveal gaze was monitored continuously using a head-mounted eye tracker with an accuracy of about 1°, with the head free to move. The eyes usually reached the next object in the task sequence before any sign of manipulative action, indicating that eye movements are planned into the motor pattern and lead each action. Eye movements during this kind of task are nearly all to task-relevant objects. Their control is therefore seen as primarily 'top-down' and influenced very little by the 'intrinsic salience' of objects.

Examination of short-term memory in the course of a natural hand-eye task [Ballard et al. 1995] showed employment of deictic primitives through serialization of the task with eye movements (e.g., using the eyes to "point to" scene objects in lieu of memorizing all of the objects' positions and other properties). A head-mounted eye tracker was used to measure eye movements over a three-dimensional physical workplace block display. Eye movements were recorded during a block pick-and-place task, and subjects frequently directed gaze to the model pattern before arranging blocks in the workspace area. This suggests that information is acquired incrementally during the task rather than in toto at the beginning of the task. That is, subjects appeared to use short-term memory frugally, acquiring information just prior to its use, and did not appear to memorize the entire model block configuration before making a copy of the block arrangement.

In a similar block-moving experiment, horizontal movements of gaze, head, and hand were shown to follow a coordinated pattern [Smeets et al. 1996]. A shift of gaze was generally followed by a movement of the head, which preceded the movement of the hand. This relationship is to a large extent task-dependent. In goal-directed tasks in which future points of interest are highly predictable, gaze and head movements may decouple, and the actual position of the hand is a likely candidate for the next gaze shift.

Up to this point, the analysis employed in the above examples was often based on manual inspection of video frames, despite its significant contributions to vision research. Relatively recently, intentionally based eye movements, termed "look-ahead" eye movements, were reported [Pelz et al. 2000]. A commercially available wearable eye tracker from ASL was worn on the head with a computer carried in a backpack. Subsequently, a custom-built wearable eye tracker was assembled with off-the-shelf components, initiating an open-source movement to develop practical eye tracking software and hardware [Babcock and Pelz 2004]. Tips were provided on constructing the tracker, opening the door to open-source software development. This corneal reflection eye tracker was mainly constructed from inexpensive components (a Radio Shack parts list was made available at one time). It was one of the first Do-It-Yourself eye trackers, but suffered from two significant problems. First, it required the inclusion of one expensive video component, namely a video multiplexer, used to synchronize the video feeds of the scene and eye cameras. Second, the system relied on somewhat dated (by today's standards) video recording equipment (a Sony DCR-TRV19 DVR). Nevertheless, the system served its purpose in fostering a nascent open-source approach to (wearable) eye tracking that is still influential today.

Recent work from the same research lab advanced the analytical capability of the wearable eye tracker in two important ways [Munn et al. 2008]. First, fixation detection was used to analyze raw eye movement data. Second, a method was presented for tracking objects in the scene. Both developments are somewhat similar to what is presented in this paper, but with significant distinctions. First, the prior method of fixation detection is tied to the eye video frame rate. In this paper we show that eye movement analysis is independent of frame rate, insofar as it operates on the eye movement data stream (x, y, t) where the timestamp of each gaze coordinate is encoded in the data tuple. This form of analysis is not new. In fact, the analysis code was originally developed commercially (by LC Technologies), made publicly available, and successfully used in at least one instance [Freed 2003]. Second, the prior technique for scene object tracking uses structure from motion to compute 3D information [Munn and Pelz 2008]. We show that, for this application, such complex computation is unnecessary and a simple 2D translational tracker is sufficient. This form of tracking, known as match-moving, is also well established in the practice of video compositing.

3 Technical Development

Hardware design follows the description of Li [2006], with minimal modifications. The apparatus is constructed entirely from inexpensive commercial off-the-shelf (COTS) components (see Figures 1 and 2). The entire parts list for the device includes the following:
- one pair of safety glasses (AOSafety X-Factor XF503);
- a more comfortable nose piece from a second pair of plastic sunglasses (AOSafety I-Riot 90714);
- black polyester braided elastic for wrapping the wires;
- two screws to connect the scene camera bracket and nose piece;
- a small aluminum or brass rod for mounting the eye camera;
- two digital video minicams.

Figure 2: Eye tracker assembly [Ryan et al. 2008] © ACM 2008.

The two digital video mini-camcorders used are the Camwear Model 200 from DejaView [Reich et al. 2004]. Each DejaView wearable digital mini-camcorder uses the NW901 MPEG-4 CODEC from Divio, Inc., enabling MPEG-4 video recording at 30 fps. Each DejaView camera's field of view subtends 60°.

The DejaView camera is connected via flexible cable to the recorder box. The recorder box comes with a belt clip for hands-free use but lacks an LCD display. Video is recorded on a 512 MB SD mini disk. After recording, the video may be transferred to a computer for offline processing. The DejaView mini-camcorders do not support transfer of video while recording, precluding online processing. The lack of an LCD display also prevents verification of correct camera positioning until after the recording is complete.
No IR illumination is used, simplifying the hardware and reducing cost. The eye tracker functions in environments with significant ambient IR illumination (e.g., outdoors on a sunny day; see Ryan et al. [2008]). However, without a stable corneal reflection or visible-spectrum filtering, video processing is more challenging. Specular reflections often occlude the limbus, and contrast at the pupil boundary is inconsistent.

3.1 Stimulus for Video Processing

For video synchronization and calibration, a laptop computer is placed in front of the participant. To synchronize the two videos, a simple program that flashes the display several times is executed. Next, a roving dot is displayed for calibration purposes, and the participant is asked to visually track the dot as it moves. The laptop display is then flashed again to signify the end of calibration. For good calibration the laptop display should appear entirely within the scene image frame and should span most of the frame. After calibration the laptop is moved away and the participant is free to view the scene normally. After a period of time (in this instance about two minutes) the recording is stopped and video collection is complete. All subsequent processing is then carried out offline. Note that during recording it is impossible to judge camera alignment. Poor camera alignment is the single greatest impediment to successful data processing.

3.2 Synchronization

Video processing begins with synchronization. Synchronization is necessary because the two cameras might not begin recording at precisely the same time. This situation would be alleviated if the cameras could be synchronized via hardware or software control (e.g., via IEEE 1394 bus control), but in the present case no such mechanism was available. As suggested previously [Li and Parkhurst 2006], a flash of light visible in both videos is used as a marker. The marker establishes the frame offset needed to align the two videos. To find the marker in each video stream, the two videos are displayed side by side, each with its own playback control. The playback speed is adjustable in forward and reverse directions, and single-frame advance is also possible. To synchronize the videos, the playback controls are used to manually advance or rewind each video to the last frame where the light flash is visible (see Figure 3).

Figure 3: Screen flash for synchronization visible as eye reflection.

3.3 Calibration & Gaze Point Mapping

Pupil center coordinates are produced by a search algorithm executed over eye video frames. The goal is to map the pupil center to gaze coordinates in the corresponding scene video frame. Calibration requires sequential viewing of a set of spatially distributed calibration points with known scene coordinates. Once calibration is complete the eye is tracked and gaze coordinates are computed for the remainder of the video. A traditional video-oculography approach [Pelz et al. 2000; Li et al. 2006] calculates the point of gaze by mapping the pupil center (x, y) to scene coordinates (s_x, s_y) via a second-order polynomial [Morimoto and Mimica 2005]:

$$
\begin{aligned}
s_x &= a_0 + a_1 x + a_2 y + a_3 xy + a_4 x^2 + a_5 y^2 \\
s_y &= b_0 + b_1 x + b_2 y + b_3 xy + b_4 x^2 + b_5 y^2
\end{aligned}
\tag{1}
$$

The unknown parameters a_k and b_k are computed via least squares fitting (e.g., see Lancaster and Šalkauskas [1986]). Each polynomial has six unknowns, so at least six calibration points are required. Additional points make the fit overdetermined.

3.3.1 Initialization of Pupil/Limbus and Dot Tracking

The pupil center in the eye video stream and the calibration dot in the scene video stream are tracked by different local search algorithms. Both are initialized manually: a template is positioned over recognizable eye features, and a crosshair is placed over the calibration dot. Grip boxes allow for adjustment of the eye template (see Figure 4). During initialization, only one playback control is visible, controlling advancement of both video streams. It may be necessary to advance to the first frame with a clearly visible calibration dot. Subsequent searches exploit temporal coherence by using the previous search result as the starting location.

Figure 4: Initialization of pupil/limbus and dot tracking.

3.3.2 Dot Tracking

A simple greedy algorithm is used to track the calibration dot. The underlying assumption is that the dot is a set of bright pixels surrounded by darker pixels (see Figure 4). The sum of differences is largest at a bright pixel surrounded by dark pixels. The dot moves from one location to the next in discrete steps determined by the refresh rate of the display. To the human eye this appears as smooth motion, but in a single frame of video it appears as a short trail of multiple dots. To mitigate this effect, the image is blurred with a Gaussian smoothing function, increasing the algorithm's tolerance to variations in dot size. In the present application the dot radius was roughly 3 to 5 pixels in the scene image frame.

The dot tracking algorithm begins with an assumed dot location obtained from the previous frame of video, or from initialization. A sum of differences is evaluated over an 8×8 reference window:

$$
\sum_i \sum_j \bigl[ I(x, y) - I(x - i, y - j) \bigr], \qquad -8 < i, j < 8.
\tag{2}
$$

This evaluation is repeated over a 5×5 search field centered at the assumed location (x, y). If the assumed location yields a maximum within the 25-pixel field, then the algorithm stops. Otherwise the location with the highest sum of differences becomes the new assumed location and the computation is repeated.

One drawback of this approach is that the dot is not well tracked near the edge of the laptop display. Reducing the search field and reference window allows better discrimination between the dot and display edges, but reduces the tolerance to rapid dot movement.

3.4 Pupil/Limbus Tracking

A two-step process is used to locate the limbus (iris-sclera boundary) and hence the pupil center in an eye image. First, feature points are detected.
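The Python sketch below makes the least-squares step for Equation (1) concrete. It fits the six coefficients a_k and b_k from calibration pairs via the normal equations in pure Python. The helper names (`design_row`, `solve`, `fit_mapping`, `map_gaze`) are hypothetical, and the synthetic calibration data is invented for the example. With raw pixel coordinates, centering and scaling the pupil coordinates first would likely improve numerical conditioning.

```python
def design_row(x, y):
    """Second-order polynomial terms of Equation (1): 1, x, y, xy, x^2, y^2."""
    return [1.0, x, y, x * y, x * x, y * y]


def solve(m, v):
    """Solve the square system m z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (a[r][n] - sum(a[r][c] * z[c] for c in range(r + 1, n))) / a[r][r]
    return z


def fit_mapping(pupil, scene):
    """Least-squares fit of a_k (for s_x) and b_k (for s_y).

    pupil: list of (x, y) pupil centers; scene: matching (s_x, s_y) dot positions.
    Solves the normal equations (A^T A) c = A^T s once per scene axis.
    """
    rows = [design_row(x, y) for x, y in pupil]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    coeffs = []
    for axis in (0, 1):
        atb = [sum(r[i] * s[axis] for r, s in zip(rows, scene)) for i in range(6)]
        coeffs.append(solve(ata, atb))
    return coeffs  # [a_0..a_5], [b_0..b_5]


def map_gaze(coeffs, x, y):
    """Apply Equation (1) to a pupil center to obtain scene coordinates."""
    row = design_row(x, y)
    return tuple(sum(c * t for c, t in zip(k, row)) for k in coeffs)


# Synthetic check: recover known coefficients from a 5x5 calibration grid.
true_a = [160.0, 120.0, 5.0, 3.0, -4.0, 2.0]
true_b = [120.0, -6.0, 90.0, 1.0, 2.0, -3.0]
grid = [(-1 + 0.5 * i, -1 + 0.5 * j) for i in range(5) for j in range(5)]
scene = [map_gaze([true_a, true_b], x, y) for x, y in grid]
a, b = fit_mapping(grid, scene)
```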
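The greedy dot search of Section 3.3.2 can be sketched as a hill climb on the Equation (2) score. This illustrative Python version rests on several assumptions. The image is a list of rows, and out-of-range pixels are clamped to the border. The Gaussian pre-blur is omitted. The `half` parameter sets the offsets −half < i, j < half; its default of 8 follows Equation (2) as printed, which spans 15×15 pixels rather than 8×8. The function names are hypothetical.

```python
def sum_of_differences(img, x, y, half):
    """Equation (2): sum over -half < i, j < half of I(x, y) - I(x - i, y - j).

    Pixels outside the image are clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])

    def px(u, v):
        return img[min(max(v, 0), h - 1)][min(max(u, 0), w - 1)]

    centre = px(x, y)
    return sum(centre - px(x - i, y - j)
               for i in range(-half + 1, half)
               for j in range(-half + 1, half))


def track_dot(img, start, half=8, field=2):
    """Greedy search over a (2*field+1)^2 field (5x5 by default).

    Starting from the previous frame's dot location, move to the best-scoring
    position in the field; stop when the current position is the maximum.
    """
    x, y = start
    h, w = len(img), len(img[0])
    while True:
        best_score = sum_of_differences(img, x, y, half)
        best_xy = (x, y)
        for dy in range(-field, field + 1):
            for dx in range(-field, field + 1):
                u, v = x + dx, y + dy
                if 0 <= u < w and 0 <= v < h:
                    s = sum_of_differences(img, u, v, half)
                    if s > best_score:  # only strictly better positions move us
                        best_score, best_xy = s, (u, v)
        if best_xy == (x, y):
            return x, y
        x, y = best_xy


# A small blurred "dot" peaking at (10, 10) on a dark 20x20 background.
img = [[0] * 20 for _ in range(20)]
for dy in (-1, 0, 1):
    for dx in (-1, 0, 1):
        img[10 + dy][10 + dx] = (100, 60, 40)[abs(dx) + abs(dy)]
print(track_dot(img, (8, 9), half=3))  # (10, 10)
```

Each move strictly increases the score, so the loop terminates. Like any greedy local search, it stalls if the start lies too far from the dot for the search field to reach it.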
Second, an ellipse is fit to the feature points. The ellipse center is a good estimate of the pupil center.

3.4.1 Feature Detection

The purpose of feature detection is to identify point locations on the limbus. We use a technique similar to Starburst [Li et al. 2005]. A candidate feature point is found by casting a ray R away from an origin point O and terminating the ray as it exits a dark region. We determine whether the ray is exiting a dark region by checking the gradient magnitude collinear with the ray. The location with the maximum collinear gradient component, max ∇, is recorded as a feature point. Starburst used a fixed threshold value rather than the maximum and did not constrain the length of the rays.

Consistent and accurate feature point identification and selection is critical for stable and accurate eye tracking. Erroneous feature points are often located at the edges of the pupil or eyelid, or at a specular reflection. To mitigate these effects, the feature point search area is constrained by further exploiting temporal coherence. The limbic boundary is not expected to move much from one frame to the next. Feature points are therefore assumed to lie near the ellipse E identified in the previous frame. If P is the intersection of ray R and ellipse E, the search is constrained according to

$$
\max \nabla\bigl(O + \alpha (P - O)\bigr), \qquad 0.8 < \alpha < 1.2,
\tag{3}
$$

as depicted in Figure 5. For the first frame in the video we use the eye model manually aligned at initialization to determine P.

Figure 5: Constrained search for limbic feature points: (a) constrained ray origin and termination point; (b) resultant rays, fitted ellipse, and center. For clarity of presentation only 36 rays are displayed in (b); in practice 360 feature points are identified.

3.4.2 Ellipse Fitting and Evaluation

Ellipses are fit to the set of feature points using linear least squares minimization (e.g., [Lancaster and Šalkauskas 1986]). This method will generate ellipses even during blinks, when no valid ellipse is attainable. To detect these invalid ellipses we implemented an ellipse evaluation method.

Each pixel that the ellipse passes through is labeled as acceptable or not depending upon the magnitude and direction of the gradient at that pixel. The percentage of acceptable pixels is computed and included in the output as a confidence measure.

3.4.3 Recovery From Failure

The ellipse fitting algorithm occasionally fails to identify a valid ellipse due to blinks or other occlusions. Reliance on temporal coherence can prevent the algorithm from recovering from such situations. To mitigate this problem we incorporated both manual and automatic recovery strategies. Automatic recovery relies on ellipse evaluation: if an ellipse evaluates poorly, it is not used to constrain the search for feature points in the subsequent frame. Instead, we revert to using the radius of the eye model as determined at initialization, in conjunction with the center of the last good ellipse. Sometimes, however, this automatic recovery is insufficient to provide a good fit. Manual recovery is provided by displaying each fitted ellipse on the screen. If the user observes drift in the computed ellipse, the center may be nudged to the correct location using a simple drag-and-drop action.

These strategies are analogous to traditional keyframing operations, e.g., when match-moving. If a feature tracker fails to track a given pixel pattern, manual intervention is required at specific frames. The result is a semi-automatic combination of manual trackbox positioning and automatic trackbox translation. Although not as fast as a fully automatic approach, this is still considerably better than the fully manual, frame-by-frame alternative. A screenshot of the user interface is shown in Figure 6.

Figure 6: Display of fitted ellipse and computed gaze point.

3.4.4 Tracking Accuracy

The DejaView camera has approximately a 60° field of view, with video resolution of 320×240. A pixel therefore spans about 60°/320 = 0.1875°. Multiplying by 0.1875 converts the Euclidean distance in pixels between gaze point and calibration coordinates to degrees of visual angle. Using this metric, the eye tracker's horizontal accuracy is better than 2°, on average [Ryan et al. 2008]. Vertical and horizontal accuracy are roughly equivalent.

3.5 Fixation Detection

After eye coordinates are mapped to scene coordinates via Equation (1), the collected gaze points and timestamps x = (x, y, t) are analyzed to detect fixations in the data stream. Prior to this type of analysis, raw eye movement data is not very useful. It represents a conjugate eye movement signal composed of two components. The first is a rapidly changing component generated by fast saccadic eye movements. The second is a comparatively stationary component representative of fixations, the eye movements generally associated with cognitive processing.

There are two leading methods for detecting fixations in the raw eye movement data stream: the position-variance and velocity-based approaches. The former defines fixations spatially, with centroid and variance indicating spatial distribution [Anliker 1976]. If the variance of a given point is above some threshold, then that point is considered outside of any fixation cluster and is considered to be part of a saccade. The latter approach, which could be considered a dual of the former, examines the velocity of a gaze point, e.g., via differential filtering,

$$
\dot{\mathbf{x}}_i = \frac{1}{\Delta t} \sum_{j=0}^{k} \mathbf{x}_{i+j}\, g_j, \qquad i \in [0, n - k),
$$

where k is the filter length and Δt = k − i. A 2-tap filter with coefficients g_j = {1, −1}, while noisy, can produce acceptable results. The point x_i is considered to be a saccade if its velocity is above threshold [Duchowski et al. 2002]. These methods can be combined in two ways. One is to check the two threshold detector outputs (e.g., for agreement). The other is to derive state-probability estimates, e.g., via Hidden Markov Models [Salvucci and Goldberg 2000].

In the present implementation, fixations are identified by a variant of the position-variance approach, with a spatial deviation threshold of 19 pixels and the number of samples set to 10 (the fixation analysis code is freely available on the web¹). Note that this approach is independent of frame rate, so long as each gaze point is listed with its timestamp. A previous approach, by contrast, tied fixation detection to the video frame rate [Munn et al. 2008].

¹ The position-variance fixation analysis code was originally made available by LC Technologies. The original fixfunc.c can still be found on Andrew R. Freed's eye tracking web page: <>. The C++ interface and implementation ported from C by Mike Ashmore are available at: <>.
The sequence of detected fixations can be processed to gain insight into the attentional deployment strategy employed by the wearer of the eye tracking apparatus. A common approach is to count the number of fixations observed over given Areas Of Interest, or AOIs, in the scene. To do so in dynamic media, i.e., over video, it is necessary to track the AOIs as their apparent position in the video translates due to camera movement.

3.6 Feature Tracking

By tracking the movement of individual features, it is possible to approximate the movement of identified AOIs. We allow the user to place trackboxes at any desired feature in the scene. The trackbox then follows the feature as it translates from frame to frame. This is similar in principle to the match-moving tracker window in common compositing software packages (e.g., Apple's Shake [Paolini 2006]). Locations of trackboxes are written to the output data file along with corresponding gaze coordinates. We then post-process the data to compute fixation and AOI information from gaze point and trackbox data.

The user places a trackbox by clicking on the trackbox symbol and dragging and dropping it onto the desired feature. A user may place as many trackboxes as desired. For our study, trackboxes were placed at the corners of each monitor.

Feature tracking is similar to that used for tracking the calibration dot, with some minor adaptations. Computation is reduced by precomputing a summed area table S [Crow 1984]. Each pixel in S stores the sum of the corresponding pixel and all pixels above and to the left of it in the original image:

$$
S(x, y) = \sum_i \sum_j I(i, j), \qquad 0 \le i \le x,\; 0 \le j \le y.
\tag{4}
$$

The summation table is computed efficiently by a dynamic programming approach (see Algorithm 1), where w and h are the image width and height and S(x, −1) is taken as 0:

    for (y = 0 to h − 1)
        sum = 0
        for (x = 0 to w − 1)
            sum = sum + I(x, y)
            S(x, y) = sum + S(x, y − 1)

Algorithm 1: Single-pass computation of summation table.

The summation table is then used to efficiently compute the average pixel value within the reference window of the trackbox (see Figure 7). As in dot tracking, a 5×5 search field is used within an 8×8 reference window. Equation (2) is now replaced with I(x, y) − µ, where

$$
\mu = \frac{\bigl(S(A) + S(B)\bigr) - \bigl(S(C) + S(D)\bigr)}{p \times q},
$$

and p × q is the size of the reference window.

Figure 7: AOI trackbox with corners labeled (A, B, C, D).

Trackable features include both bright spots and dark spots in the scene image. For a bright spot, I(x, y) − µ is maximum at the target location, while dark spots produce minima at target locations. Initial placement of the trackbox determines whether the feature to be tracked is a bright or dark spot, based on the sign of the initial evaluation of I(x, y) − µ.

Some features cannot be correctly tracked because they exit the camera field. For this study three trackboxes were sufficient to properly track all areas of interest within the scene viewed by participants. Extra trackboxes were placed, and the three that appeared to provide the best track were selected manually. Our implementation output a text file and a video. The text file contained one line per frame of video. Each line included a frame number, the (x, y) coordinates of each trackbox, the (x, y) coordinates of the corresponding gaze point, and a confidence number. See Figure 6 for a sample frame of the output video, and note the frame number in the upper left corner.

The video was visually inspected to determine the frame numbers for the beginning and end of stimulus presentation, and the most usable trackboxes. Text files were then manually edited to remove extraneous information.

3.7 AOI Labeling

The most recent approach to AOI tracking used structure from motion to compute 3D information from eye gaze data [Munn and Pelz 2008]. We found such complex computation unnecessary because we did not need 3D information; we only wanted analysis of fixations in AOIs. Structure from motion is able to extract 3D information, including head movement, but it assumes a static scene. Our method makes no such assumption: AOIs may move independently from the observer and from each other. Structure from motion can, however, handle some degree of occlusion that our approach does not. Trackboxes are unable to locate any feature that becomes obstructed from view.

Figure 8: Trackboxes t1, t2, t3, AOIs A, B, ..., I, and fixation x.

AOI labeling begins with the text files containing gaze data and trackbox locations as described above. The text files were then automatically parsed and fed into our fixation detection algorithm. Using the location of the trackboxes at the end of each fixation, we were able to assign AOI labels to each fixation. For each video a short program was written to apply translation, rotation, and scaling before labeling the fixations, with selected trackboxes defining the local frame of reference. The programs varied slightly depending upon which trackboxes were chosen.
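The following is a minimal sketch of the summed-area-table step, assuming images stored as lists of rows. The table is built in one pass as in Algorithm 1, with a zero border to avoid the S(x, −1) special case. The reference-window mean then takes four lookups, which is the role µ plays above. The text does not say which of its corner labels A to D map to which lookup, so the corner arithmetic here is the standard formulation rather than a transcription. The function names are hypothetical.

```python
def summed_area_table(img):
    """Single pass in the style of Algorithm 1.

    S[y + 1][x + 1] holds the sum of img[j][i] for all i <= x, j <= y; the
    extra zero row and column remove border special cases.
    """
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = row_sum + s[y][x + 1]
    return s


def window_mean(s, x0, y0, x1, y1):
    """Mean over the inclusive box [x0, x1] x [y0, y1] using four lookups."""
    total = s[y1 + 1][x1 + 1] + s[y0][x0] - s[y0][x1 + 1] - s[y1 + 1][x0]
    return total / ((x1 - x0 + 1) * (y1 - y0 + 1))


def trackbox_score(img, s, x, y, half=4):
    """I(x, y) - mu over an 8x8 window (x-4..x+3, y-4..y+3), clipped to the image.

    Positive values indicate a bright feature; negative values a dark one.
    """
    h, w = len(img), len(img[0])
    x0, x1 = max(x - half, 0), min(x + half - 1, w - 1)
    y0, y1 = max(y - half, 0), min(y + half - 1, h - 1)
    return img[y][x] - window_mean(s, x0, y0, x1, y1)


spot = [[0] * 10 for _ in range(10)]
spot[5][5] = 64
print(trackbox_score(spot, summed_area_table(spot), 5, 5))  # 63.0
```

Once the table exists, each window mean costs four lookups regardless of the window size. Evaluating all 25 positions of the search field is therefore cheap.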
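The translate-rotate-scale labeling described in Section 3.7 can be sketched for a single fixation as below. The layout is an assumption for illustration: t1 sits at the panel's top-left corner, t2 along its top edge, and t3 two-thirds of the way across and down. AOIs A to I are labeled in row-major order, and 'O' marks fixations off the panel, following the convention the paper describes for its 3×3 display. The sketch uses `atan2` rather than a plain arctangent of t2y/t2x so that the angle stays correct in every quadrant. The function name is hypothetical.

```python
import math


def label_fixation(fix, t1, t2, t3, labels="ABCDEFGHI"):
    """Assign a 3x3 grid label to a fixation using three trackboxes.

    Hypothetical layout: t1 = panel top-left corner, t2 on the top edge,
    t3 two-thirds across and down the panel. Coordinates are image pixels
    (y grows downward). Returns 'O' if the fixation is outside the panel.
    """
    # Translate so that trackbox t1 is the origin.
    fx, fy = fix[0] - t1[0], fix[1] - t1[1]
    ax, ay = t2[0] - t1[0], t2[1] - t1[1]
    bx, by = t3[0] - t1[0], t3[1] - t1[1]
    # Rotate by -theta so the t1 -> t2 edge lies along the horizontal axis.
    theta = math.atan2(ay, ax)
    c, s = math.cos(-theta), math.sin(-theta)
    fx, fy = c * fx - s * fy, s * fx + c * fy
    bx, by = c * bx - s * by, s * bx + c * by
    # Scale: t3 sits at 2/3 of the panel, so the panel spans 1.5 * t3 per axis.
    u, v = fx / (1.5 * bx), fy / (1.5 * by)
    if not (0.0 <= u < 1.0 and 0.0 <= v < 1.0):
        return "O"
    col, row = int(3 * u), int(3 * v)  # which third along each axis
    return labels[3 * row + col]


# Axis-aligned 90x90 panel, then the same layout rotated by 90 degrees.
print(label_fixation((80, 50), (0, 0), (90, 0), (60, 60)))              # F
print(label_fixation((150, 180), (200, 100), (200, 190), (140, 160)))   # F
```

The second call reuses the first layout rotated by 90 degrees and translated. It returns the same label, reflecting the scale and 2D-rotation invariance noted in Section 3.7.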
upon which trackboxes were chosen. For example, consider a fixation detected at location $\mathbf{x}$, with trackboxes $t_1$, $t_2$, $t_3$ and AOIs A, B, …, I, as illustrated in Figure 8. Treating $t_1$ as the origin of the reference frame, trackboxes $t_2$ and $t_3$, as well as the fixation $\mathbf{x}$, are translated into this frame by subtracting the coordinates of trackbox $t_1$. Following translation, the coordinates of trackbox $t_2$ define the rotation angle, $\theta = \tan^{-1}(t_{2y} / t_{2x})$. A standard rotation matrix is used to rotate the fixation point $\mathbf{x}$ by $-\theta$, bringing it into alignment with the horizontal x-axis. Finally, if trackbox $t_3$ is located two-thirds of the way across and down the panel display, then the fixation coordinates are scaled by 2/3. The now axis-aligned and scaled fixation point $\mathbf{x}$ is checked for which third of the axis-aligned box it is positioned in, and the appropriate label is assigned. Note that this method of AOI tracking is invariant to scale and to 2D rotation. It is not, however, invariant to shear resulting from feature rotation in 3D (e.g., perspective rotation).

Following fixation localization, another text file is output with one line per fixation. Each line contains the subject number, stimulus identifier, AOI label, and fixation duration. This information is then reformatted for subsequent statistical analysis by the statistical package used (R in this case).

4 Applied Example

In an experiment conducted to better understand the potential health benefits of images of nature in a hospital setting, participants' gaze was recorded along with physiological and self-reported psychological data.

Eye Movement Analysis. For analysis of fixations within AOIs, trackboxes were placed at the corners of the 3×3 panel display in the scene video. All 9 AOIs were assumed to be equally sized, connected rectangles (see Figure 9). Trackboxes were used to determine AOI position, orientation, and scale. Out-of-plane rotation was not considered. Trackboxes on the outside corners of the 3×3 grid were preferred. Otherwise, linear interpolation was used to determine the exterior boundaries of the grid.

Figure 9: Labeling AOIs. Trackboxes, usually at image corners, are used to maintain the position and orientation of the 9-window display panel. The AOIs are labeled in sequential alphanumeric order from top-left to bottom-right, and the letter 'O' records a fixation that falls outside the display panels. In this screenshot, the viewer is looking at the purple flower field.

Stimulus. Using the prospect-refuge theory of landscape preference [Appleton 1996], four different categories of images (see Figure 10) were viewed by participants before and after undergoing a pain stressor (hand in ice water for up to 120 seconds). A fifth group of participants (control) viewed the same display wall (see below) with the monitors turned off.

Apparatus, Environment, & Data Collected. Participants viewed each image on a display wall consisting of nine video monitors arranged in a 3×3 grid. Each monitor's display area measured 36 in. wide × 21 in. high. Each monitor was framed by a 1/2 in. black frame, giving an overall measurement of 9 ft wide × 5 ft 3 in. high. The mock patient room measured approximately 15.6 ft × 18.6 ft. Participants viewed the display wall from a hospital bed facing the monitors. The bed was located approximately 5 ft 3 in. from the display wall, with its footboard measuring 3.6 ft high off the floor (the monitors were mounted 3 ft from the floor). As each participant lay on the bed, their head was approximately 9.6 ft from the center of the monitors. Given these dimensions and distances, and using $\theta = 2\tan^{-1}\!\left(r/(2D)\right)$ to represent visual angle, with r = 9 ft and D = 9.6 ft, the monitors subtended a horizontal visual angle of θ = 50.2°.

Pain perception, mood, blood pressure, and heart rate were continually assessed during the experiment. Results from these measurements are omitted here. They are mentioned only to give the reader a sense of the complete procedure employed in the experiment.

Procedure. Each participant was greeted and asked to provide documentation of informed consent. After situating themselves on the bed facing the display wall, each participant involved in the eye tracking portion of the study donned the wearable eye tracker. A laptop was then placed in front of them on a small rolling table, and the participant was asked to view the calibration dot sequence. Following calibration, each participant viewed the image stimulus (or blank monitors) for two minutes, as timed by a stopwatch.

Subjects. 109 healthy college students took part in the study, with a small subsample (21) participating in the eye tracking portion.

Experimental Design. The study used a mixed randomized design. Analysis of recorded gaze points by participants wearing the eye tracker was performed based on a repeated-measures design, where the set of fixations generated by each individual was treated as the within-subjects fixed factor.

Discarded Data. Four recordings were collected over each of four stimulus images, with four additional recordings displaying no image as control. There was one failed attempt to record data over the purple flower field stimulus, and a replacement recording was made. There were 21 sessions in all. Ten recordings were discarded during post-processing because video quality prohibited effective eye tracking. In each of these videos, some combination of factors rendered the recording unusable. These factors included heavy mascara, eyelid occlusion, frequent blinking, low contrast between iris and sclera, poor positioning of eye cameras, and calibration dots not in the field of view. We successfully processed 2 control, 4 yellow field, 1 tree, 2 fire, and 2 purple flower field videos.

Poor camera positioning could have been discovered and corrected if the cameras had provided real-time video feedback. Our hardware did not support online processing. Online processing could also have provided additional feedback, allowing detection and mitigation of most other video quality issues.

5 Results

Using AOIs and image type as fixed factors (with participant as the random factor [Baron and Li 2007]), repeated-measures two-way ANOVA indicates a marginally significant main effect of AOI on fixation duration (F(9,1069) = 2.08, p < 0.05, assuming sphericity as computed by R; see Figure 11). Averaging over image types, pair-wise t-tests with pooled SD indicate
no significant differences in fixation durations between any pair of AOIs.

Repeated-measures ANOVA also indicates a significant main effect of image type on fixation duration (F(34,1044) = 1.78, p < 0.01), with the AOI × image interaction not significant (see Figure 12). Averaging over AOIs, pair-wise t-tests with pooled SD indicate significantly different fixation durations between the control image (blank screen) and the tree image (p < 0.01, with Bonferroni correction). No other significant differences were detected.

Figure 10: Stimulus images: (a) yellow field: prospect (Getty Images), (b) tree: refuge (Getty Images), (c) fire: hazard (Getty Images), (d) purple flower field: mixed prospect and refuge (courtesy Ellen A. Vincent).

Figure 11: Comparison of mean fixation duration per AOI averaged over image types, with standard error bars. [Plot "Fixation Durations vs. AOI"; x-axis: AOI (A–I, O); y-axis: mean fixation duration in ms, with SE.]

Figure 12: Comparison of mean fixation duration per AOI and per image type, with standard error bars. [Plot "Fixation Durations vs. AOI"; x-axis: AOI (A–I, O); y-axis: mean fixation duration in ms, with SE; series: control, yellow field, tree, fire, lavender field.]

6 Discussion

Averaging over image types, the marginally significant difference in fixation durations over AOIs suggests that the longest durations tend to fall on the central AOIs (E and H). This simply suggests that viewers tend to fixate the image center. This is not unusual, particularly in the absence of a specific viewing task [Wooding 2002]. Post-hoc pair-wise comparisons failed to reveal significant differences, which is likely due to the relatively high variability of the data.

Averaging over AOIs shows that the tree image drew significantly shorter fixations than the control (blank) screen. Due to averaging, however, it is difficult to infer further details regarding fixation duration distributions over particular image regions. Cursory examination of Figure 12 suggests shorter fixations over the center panels (E and H) for the tree image, compared to the longer dwell times made when the screen was blank. Considering the averaging inherent in ANOVA, this could just mean that fixations are more evenly distributed over the tree image than over the blank display, where it is fairly clear that viewers mainly looked at the center panels. This may suggest a greater amount of visual interest offered by the tree image, and a propensity of viewers to look around more when presented with a stimulus than when there is nothing of interest at all.

A similar observation could be made regarding fixation durations found over region C (upper right) for the purple flower field image. Viewers of this image perceived lower sensory pain than those who viewed other landscape images or no images, with statistical significance at α = 0.1 [Vincent et al. 2009]. However, the difference in fixation durations over region C is not significant according to the pair-wise post-hoc analysis.

7 Conclusion

A match-moving approach was presented to help automate analysis of eye movements collected by a wearable eye tracker. Technical contributions addressed video stream synchronization, pupil detection, eye movement analysis, and tracking of dynamic scene Areas Of Interest (AOIs). The techniques were demonstrated in the evaluation of eye movements on images of nature viewed by subjects participating in an experiment on the perception of well-being. Although descriptive statistics of gaze locations over AOIs failed to show significance of any particular AOI except the center, the methodology is applicable toward similar future studies.

References

ANLIKER, J. 1976. Eye Movements: On-Line Measurement, Analysis, and Control. In Eye Movements and Psychological Processes, R. A. Monty and J. W. Senders, Eds. Lawrence Erlbaum Associates, Hillsdale, NJ, 185–202.

APPLETON, J. 1996. The Experience of Landscape. John Wiley & Sons, Ltd., Chichester, UK.
BABCOCK, J. S. AND PELZ, J. B. 2004. Building a Lightweight Eyetracking Headgear. In ETRA '04: Proceedings of the 2004 Symposium on Eye Tracking Research & Applications. ACM, San Antonio, TX, 109–114.

BALLARD, D. H., HAYHOE, M. M., AND PELZ, J. B. 1995. Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience 7, 1, 66–80.

BARON, J. AND LI, Y. 2007. Notes on the use of R for psychology experiments and questionnaires. Online Notes. URL: <∼baron/rpsych/rpsych.html> (last accessed December 2007).

BUSWELL, G. T. 1935. How People Look At Pictures. University of Chicago Press, Chicago, IL.

CROW, F. C. 1984. Summed-area tables for texture mapping. In SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques. ACM, New York, NY, 207–212.

DUCHOWSKI, A., MEDLIN, E., COURNIA, N., GRAMOPADHYE, A., NAIR, S., VORAH, J., AND MELLOY, B. 2002. 3D Eye Movement Analysis. Behavior Research Methods, Instruments, Computers (BRMIC) 34, 4 (November), 573–591.

FREED, A. R. 2003. The Effects of Interface Design on Telephone Dialing Performance. M.S. thesis, Pennsylvania State University, University Park, PA.

JACOB, R. J. K. AND KARN, K. S. 2003. Eye Tracking in Human-Computer Interaction and Usability Research: Ready to Deliver the Promises. In The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research, J. Hyönä, R. Radach, and H. Deubel, Eds. Elsevier Science, Amsterdam, The Netherlands, 573–605.

LANCASTER, P. AND ŠALKAUSKAS, K. 1986. Curve and Surface Fitting: An Introduction. Academic Press, San Diego, CA.

LAND, M., MENNIE, N., AND RUSTED, J. 1999. The Roles of Vision and Eye Movements in the Control of Activities of Daily Living. Perception 28, 11, 1307–1432.

LAND, M. F. AND HAYHOE, M. 2001. In What Ways Do Eye Movements Contribute to Everyday Activities. Vision Research 41, 25-26, 3559–3565. (Special Issue on Eye Movements and Vision in the Natural World, with most contributions to the volume originally presented at the 'Eye Movements and Vision in the Natural World' symposium held at the Royal Netherlands Academy of Sciences, Amsterdam, September 2000).

LI, D. 2006. Low-Cost Eye-Tracking for Human Computer Interaction. M.S. thesis, Iowa State University, Ames, IA. Techreport TAMU-88-010.

LI, D., BABCOCK, J., AND PARKHURST, D. J. 2006. openEyes: A Low-Cost Head-Mounted Eye-Tracking Solution. In ETRA '06: Proceedings of the 2006 Symposium on Eye Tracking Research & Applications. ACM, San Diego, CA.

LI, D. AND PARKHURST, D. 2006. Open-Source Software for Real-Time Visible-Spectrum Eye Tracking. In Conference on Communication by Gaze Interaction. COGAIN, Turin, Italy.

LI, D., WINFIELD, D., AND PARKHURST, D. J. 2005. Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. In Vision for Human-Computer Interaction Workshop (in conjunction with CVPR).

MEGAW, E. D. AND RICHARDSON, J. 1979. Eye Movements and Industrial Inspection. Applied Ergonomics 10, 145–154.

MORIMOTO, C. H. AND MIMICA, M. R. M. 2005. Eye Gaze Tracking Techniques for Interactive Applications. Computer Vision and Image Understanding 98, 4–24.

MUNN, S. M. AND PELZ, J. B. 2008. 3D point-of-regard, position and head orientation from a portable monocular video-based eye tracker. In ETRA '08: Proceedings of the 2008 Symposium on Eye Tracking Research & Applications. ACM, Savannah, GA, 181–188.

MUNN, S. M., STEFANO, L., AND PELZ, J. B. 2008. Fixation-identification in dynamic scenes: Comparing an automated algorithm to manual coding. In APGV '08: Proceedings of the 5th Symposium on Applied Perception in Graphics and Visualization. ACM, New York, NY, 33–42.

PAOLINI, M. 2006. Apple Pro Training Series: Shake 4. Peachpit Press, Berkeley, CA.

PELZ, J. B., CANOSA, R., AND BABCOCK, J. 2000. Extended Tasks Elicit Complex Eye Movement Patterns. In ETRA '00: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications. ACM, Palm Beach Gardens, FL, 37–43.

REICH, S., GOLDBERG, L., AND HUDEK, S. 2004. Deja View Camwear Model 100. In CARPE'04: Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences. ACM Press, New York, NY, 110–111.

RYAN, W. J., DUCHOWSKI, A. T., AND BIRCHFIELD, S. T. 2008. Limbus/pupil switching for wearable eye tracking under variable lighting conditions. In ETRA '08: Proceedings of the 2008 Symposium on Eye Tracking Research & Applications. ACM, New York, NY, 61–64.

SALVUCCI, D. D. AND GOLDBERG, J. H. 2000. Identifying Fixations and Saccades in Eye-Tracking Protocols. In ETRA '00: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications. ACM, Palm Beach Gardens, FL, 71–78.

SMEETS, J. B. J., HAYHOE, H. M., AND BALLARD, D. H. 1996. Goal-Directed Arm Movements Change Eye-Head Coordination. Experimental Brain Research 109, 434–440.

VINCENT, E., BATTISTO, D., GRIMES, L., AND MCCUBBIN, J. 2009. Effects of nature images on pain in a simulated hospital patient room. Health Environments Research and Design. In press.

WEBB, N. AND RENSHAW, T. 2008. Eyetracking in HCI. In Research Methods for Human-Computer Interaction, P. Cairns and A. L. Cox, Eds. Cambridge University Press, Cambridge, UK, 35–69.

WOODING, D. 2002. Fixation Maps: Quantifying Eye-Movement Traces. In Proceedings of ETRA '02. ACM, New Orleans, LA.
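The summed-area table of equation (4) and Algorithm 1 can be illustrated with a short Python sketch of the feature-tracking step. This is a minimal sketch, not the authors' implementation. The image layout (`image[y][x]` as nested lists of grey levels) and the helper names `summed_area_table` and `box_mean` are assumptions. The sentence describing how the table yields the average pixel value is truncated in this copy, so `box_mean` assumes the average is taken over a rectangular window such as a trackbox, using the standard four-lookup rule for summed-area tables.

```python
from typing import List

Image = List[List[float]]  # assumed layout: image[y][x], row-major grey levels


def summed_area_table(image: Image) -> Image:
    """Single-pass summed-area table in the spirit of Algorithm 1.

    S[y][x] holds the sum of image[j][i] for all 0 <= i <= x and 0 <= j <= y.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    S = [[0.0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0.0  # running sum of row y up to and including column x
        for x in range(w):
            row_sum += image[y][x]
            above = S[y - 1][x] if y > 0 else 0.0  # S(x, -1) is taken as 0
            S[y][x] = row_sum + above
    return S


def box_mean(S: Image, x0: int, y0: int, x1: int, y1: int) -> float:
    """Mean pixel value over the inclusive box [x0, x1] x [y0, y1].

    Hypothetical helper: four table lookups, independent of the box size.
    """
    def at(x: int, y: int) -> float:
        # Entries left of column 0 or above row 0 contribute nothing.
        return S[y][x] if x >= 0 and y >= 0 else 0.0

    total = at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1)
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    return total / area


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    table = summed_area_table(img)
    print(table)
    print(box_mean(table, 1, 1, 2, 2))
```

For the 3×3 example image, the table is [[1, 3, 6], [5, 12, 21], [12, 27, 45]], and the mean over the lower-right 2×2 box is (5 + 6 + 8 + 9) / 4 = 7.0. Because each window mean costs four lookups regardless of window size, precomputing S reduces the work when many candidate trackbox positions are evaluated per frame.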
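The translation, rotation, and scaling steps described under 3.7 AOI Labeling can be sketched as one labeling function. This is a hypothetical sketch, not the per-video programs the authors wrote. Several details are assumptions:

- The name `label_fixation`.
- The placement of the trackboxes: t1 marks the panel's top-left corner and t2 lies on its top edge.
- The interpretation of the "scaled by 2/3" step. The sketch divides by the rotated position of t3 and multiplies by the fraction `t3_frac` at which t3 sits across and down the panel, so that the panel spans [0, 1) on both axes.

The sketch uses `math.atan2` instead of a plain arctangent of t2y/t2x so that every rotation quadrant is handled.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def label_fixation(fix: Point, t1: Point, t2: Point, t3: Point,
                   t3_frac: float = 2.0 / 3.0) -> str:
    """Hypothetical sketch: map a fixation to AOI 'A'..'I' (3x3 grid, labeled
    row by row from the top-left) or to 'O' if it falls outside the panel.

    Assumes t1 is the panel's top-left corner, t2 lies on the panel's top
    edge, and t3 sits t3_frac of the way across and down the panel.
    """
    # 1. Translate so that trackbox t1 becomes the origin.
    fx, fy = fix[0] - t1[0], fix[1] - t1[1]
    t2x, t2y = t2[0] - t1[0], t2[1] - t1[1]
    t3x, t3y = t3[0] - t1[0], t3[1] - t1[1]

    # 2. Rotate by -theta so that the t1->t2 edge lies on the x-axis.
    theta = math.atan2(t2y, t2x)
    c, s = math.cos(-theta), math.sin(-theta)

    def rotate(x: float, y: float) -> Point:
        return x * c - y * s, x * s + y * c

    fx, fy = rotate(fx, fy)
    t3x, t3y = rotate(t3x, t3y)

    # 3. Scale so that the panel spans [0, 1) on both axes.
    u = fx / t3x * t3_frac
    v = fy / t3y * t3_frac

    # 4. Pick the third of the axis-aligned box along each axis.
    if not (0.0 <= u < 1.0 and 0.0 <= v < 1.0):
        return "O"
    col, row = int(u * 3), int(v * 3)
    return "ABCDEFGHI"[row * 3 + col]


if __name__ == "__main__":
    # Axis-aligned 300 x 300 panel whose top-left corner is at (100, 50).
    print(label_fixation((250, 200), (100, 50), (400, 50), (300, 250)))
```

A fixation on the panel center maps to 'E' whether the panel is shifted, rotated in the image plane, or scaled in the scene video. As the text notes, the mapping would still be distorted by perspective shear, which this 2D model does not represent.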
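The visual angle quoted in the Apparatus paragraph can be checked numerically. The sketch below evaluates θ = 2 tan⁻¹(r/(2D)) with the stated r = 9 ft (display wall width) and D = 9.6 ft (head-to-display distance). The function name `visual_angle_deg` is hypothetical. Size and distance only need to share a unit.

```python
import math


def visual_angle_deg(size: float, distance: float) -> float:
    """Visual angle theta = 2 * atan(size / (2 * distance)), in degrees.

    size and distance must be expressed in the same unit (feet here).
    """
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


if __name__ == "__main__":
    # Display wall width r = 9 ft viewed from D = 9.6 ft (values from the text).
    theta = visual_angle_deg(9.0, 9.6)
    print(f"{theta:.1f}")  # 50.2
```

Here r/(2D) = 9/19.2 ≈ 0.469, and tan⁻¹(0.469) ≈ 25.1°, so θ ≈ 50.2°, matching the value given in the text.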