Skovsgaard Small Target Selection With Gaze Alone


Accessing the smallest targets in mainstream interfaces using gaze
alone is difficult, but interface tools that effectively increase the size of selectable objects can help. In this paper, we propose a conceptual framework to organize existing tools and guide the development of new tools. We designed a discrete zoom tool and conducted a proof-of-concept experiment to test the potential of the framework and the tool. Our tool was as fast as and more accurate than the currently available two-step magnification tool. Our framework shows potential to guide the design, development, and testing of zoom tools to facilitate the accessibility of mainstream
interfaces for gaze users.

Small-Target Selection with Gaze Alone

Henrik Skovsgaard, IT University of Copenhagen
Julio C. Mateo, Wright State University
John M. Flach, Wright State University
John Paulin Hansen, IT University of Copenhagen

Copyright © 2010 by the Association for Computing Machinery, Inc.

Abstract

Accessing the smallest targets in mainstream interfaces using gaze alone is difficult, but interface tools that effectively increase the size of selectable objects can help. In this paper, we propose a conceptual framework to organize existing tools and guide the development of new tools. We designed a discrete zoom tool and conducted a proof-of-concept experiment to test the potential of the framework and the tool. Our tool was as fast as and more accurate than the currently available two-step magnification tool. Our framework shows potential to guide the design, development, and testing of zoom tools to facilitate the accessibility of mainstream interfaces for gaze users.

CR Categories: H.5.1 [INFORMATION INTERFACES AND PRESENTATION]: Multimedia Information Systems—Evaluation/methodology

Keywords: gaze interaction, universal access, zoom interfaces

1 Introduction

Mainstream graphical user interfaces (GUIs) are generally designed with the mouse user in mind. As a consequence, users who rely on alternative input devices may encounter difficulties when accessing these GUIs. In this paper, we will focus on issues encountered by users of gaze tracking systems when selecting the smallest targets in mainstream GUIs. The limited accuracy of gaze pointing (when compared to mouse pointing) can make small-target selection very difficult for gaze-input users. Before discussing ways to address the limited accuracy of gaze input, we will briefly review how the gaze signal is processed and which factors affect gaze-pointing accuracy.

Point-and-select operations, such as pointing at an icon and clicking on it to open an application, are typical of mainstream GUIs. Mouse users physically move the mouse to point and press the mouse button to issue an activation (i.e., select). Pointing is straightforward for gaze-input users as well, but our eyes lack a selection mechanism. To identify when a user wants to issue an activation, gaze tracking systems divide eye movements into saccades and fixations. Saccades are fast movements that cover relatively large spatial regions when users move their gaze from one location of interest to the next. Fixations are relatively slow movements performed in a limited spatial region when a user is inspecting an object of interest.

Even during fixations, the eyes are continuously moving. This inherent eye jitter, combined with gaze tracker inaccuracies (e.g., offsets), negatively impacts gaze-pointing accuracy. To reliably identify fixations and saccades, gaze-tracking systems use algorithms based on velocity, dispersion, or a combination of both [Duchowski 2007]. For example, a velocity threshold can be set such that gaze velocities faster than this threshold are considered part of a saccade whereas slower velocities are considered part of a fixation.

Gaze-tracking systems use detected fixations and saccades to break gaze movements into pointing and selection components. If a saccade is detected, it is assumed to belong to the pointing component. However, fixations can occur both during pointing and during selection. That is, users may look at an object because they want to inspect it further (i.e., inspection fixations) or because they want to select it (i.e., selection fixations). The most common method to distinguish inspection and selection fixations is to set a time threshold (i.e., dwell time). That is, fixations lasting longer than dwell time are considered part of the selection component whereas shorter fixations are considered part of the pointing component. In general, a selection fixation results in an activation at the cursor location and, if the cursor is on top of a target, a target selection.

Approaches to address the limited accuracy of gaze pointing in order to enhance the accessibility to mainstream GUIs can be grouped into two categories. Some approaches aim at reducing the noise in the input (gaze) signal, whereas others aim at increasing the tolerance of interfaces to noisy inputs. These two approaches are not mutually exclusive and, in fact, usually complement each other.

1.1 Reducing Noise in the Input Signal

The most common way to reduce the noise in the gaze signal is to smooth (i.e., low-pass filter) the signal to increase the steadiness of the cursor. Most commercial gaze trackers smooth the input signal before displaying the cursor. In fact, it is generally accepted that, given the jitter inherent to eye movements, some degree of smoothing is necessary to use gaze as an input signal. However, smoothing also results in reduced responsiveness to gaze movements (i.e., time delay) and, therefore, there is a tradeoff between cursor steadiness and responsiveness. Actually, cursor smoothing effectively reduces the frame rate of the system by averaging across gaze samples.

Signal smoothing and fixation-detection algorithms are not independent from each other. On the one hand, the amount of smoothing applied to the gaze signal can impact the velocity threshold used in the fixation-detection algorithm. That is, smoother signals need lower velocity thresholds than less smooth signals to reliably distinguish between fixations and saccades. On the other hand, the output of fixation-detection algorithms can be used to inform when smoothing is applied. For example, cursor smoothing can be stopped as soon as the algorithm detects a saccade and re-activated during fixations to increase cursor responsiveness.
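
The processing chain described above (moving-average smoothing, velocity-threshold classification of samples into fixations and saccades, and dwell-time selection) can be illustrated with a minimal Python sketch. It is not the implementation of any particular gaze tracker: the function names, the pixels-per-millisecond velocity unit, the fixed sampling interval, and the threshold values are all assumptions made for illustration. The 10-sample window and 600 ms dwell time are borrowed from the experiment reported later in this paper.

```python
from collections import deque
from math import hypot


def smooth(samples, window=10):
    """Moving-average (low-pass) filter over the last `window` gaze samples."""
    recent = deque(maxlen=window)
    out = []
    for x, y in samples:
        recent.append((x, y))
        n = len(recent)
        out.append((sum(p[0] for p in recent) / n,
                    sum(p[1] for p in recent) / n))
    return out


def classify(samples, dt_ms, velocity_threshold):
    """Label each sample 'fixation' or 'saccade' by point-to-point velocity.

    velocity_threshold is in pixels per millisecond (an assumed unit).
    The first sample has no predecessor and is labelled 'fixation'.
    """
    labels = ["fixation"]
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = hypot(x1 - x0, y1 - y0) / dt_ms
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


def dwell_activations(samples, labels, dt_ms, dwell_ms=600):
    """Return (index, x, y) for each fixation that reaches the dwell time.

    A run of consecutive 'fixation' samples issues one activation at the
    cursor position once it has lasted dwell_ms; a saccade resets the run.
    """
    activations = []
    run_ms = 0
    fired = False
    for i, ((x, y), label) in enumerate(zip(samples, labels)):
        if label == "saccade":
            run_ms, fired = 0, False
            continue
        run_ms += dt_ms
        if not fired and run_ms >= dwell_ms:
            activations.append((i, x, y))
            fired = True
    return activations
```

As a usage example under the same assumptions, take a 50 Hz signal (dt_ms = 20) that rests at one point for 200 ms and then jumps 300 pixels to a second point where it rests for 800 ms. Classifying the raw signal with a threshold of 1.0 px/ms flags the jump as a saccade, and only the second, longer fixation reaches the dwell time. Smoothing the same signal over 10 samples spreads the jump across ten smaller steps, so a threshold of 2.0 px/ms that catches the raw jump no longer detects it, which is the interaction between smoothing and velocity thresholds noted in Section 1.1.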

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail.
ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

1.2 Increasing Interface Tolerance to Noise

An alternative approach to dealing with noisy inputs is to design GUIs that are tolerant to noise. For example, typing interfaces developed for gaze users display very large buttons (e.g., GazeTalk; [Hansen et al. 2003]) or provide other interface features to avoid the need to select small targets (e.g., Dasher; [Ward et al. 2000]).
The use of dedicated software allows developers to have full access to the information underlying the environment in which the user is acting (e.g., target locations). This information can be used to aid small-target selection (e.g., force fields; [Zhang et al. 2008]). However, the development of dedicated GUIs for gaze users does not address accessibility to mainstream GUIs.

A way to increase the tolerance of mainstream GUIs to noise is to develop tools that interface with these GUIs to effectively increase the size of selectable objects. These tools are generally more limited than dedicated GUIs due to their inability to access all information (e.g., target locations) underlying mainstream GUIs. The most common of these tools is two-step magnification [Lankford 2000], which is often available in commercial gaze trackers. This two-step tool divides the point-and-select task into two steps requiring a point-and-select operation each. During the first step, the detection of a selection component does not result in an activation. Rather, a magnified (usually 2, 3, or 4x) version of the area surrounding the cursor pops up. During the second step, the detection of a selection component (on the magnified window) results in an activation. Assuming the target is within the magnified area, this tool effectively increases target size and, therefore, increases the GUI tolerance to noise. Although helpful for small-target selection, the two-step tool slows down interaction and may feel unnatural to the user.

2 Unanticipated Limitations of Zoom Tools

In an attempt to address the limitations of the two-step tool, we developed a zoom tool to access mainstream GUIs. This tool was inspired by previous work with dedicated interfaces (e.g., StarGazer; [Hansen et al. 2008]), which showed that zooming could help with noisy input. Bates and Istance [2002] had also proposed the use of zooming interfaces to facilitate access to mainstream GUIs for gaze-input users. However, their tool magnified the whole screen and was controlled manually. In contrast, our gaze-controlled tool presented a smooth animation surrounding the cursor. When a short fixation was detected, the content in this window gradually increased in size (as if approaching the user) for the duration of a predetermined zoom time. After this time elapsed, an activation was issued on the cursor position (i.e., the center of this window). See row 2 of Figure 1 for an illustration.

[Figure 1 image: three rows, each running from start time to end time around the target of interest: (1) Dwell, (2) Continuous Zoom, (3) N-Step Zoom, with an Additional Magnification (N > 2) loop in row 3.]
Figure 1: Illustration of the different zoom tools. Row 1 depicts a target selection with dwell (i.e., no tool). Row 2 depicts how the continuous zoom tool gradually magnifies the target area. Row 3 depicts how n-step tools work. A two-step version would end before entering the Additional Magnification loop, a three-step version would go through the loop once, and so on. The shrinking red dots in rows 1 and 3 indicate dwell time.

We expected this zoom tool to have at least four advantages over the two-step tool. First, we expected its continuous looming appearance to feel more natural to the user. Second, we expected the user to be able to make online corrections to the cursor position as the target increased in size. Third, we expected target selection to be faster because the user would not need to perform two separate point-and-select operations. Fourth, we expected the maximum magnification level possible to be greater than using a two-step tool with a window of similar size because the entire region around the cursor did not need to be magnified all at once.

In our previous experiment, we found that this zoom tool facilitated small-target selection when compared to no tool [Skovsgaard et al. 2008], but it did not compare favorably to a two-step tool. Rather, the two-step tool was more accurate and rated more favorably than the zoom tool.

At least three factors might have contributed to the poor performance and ratings of the zoom tool. First, our zooming tool transformed a discrete point-and-select operation (with a still target) into a continuous tracking task (with a moving target). Second, once zooming started, the user could not control the rate at which content zoomed in. Third, the impact of the time delay resulting from processing and smoothing the gaze signal was amplified due to the first two factors. As a result, users' corrections often led to instability (i.e., increasing error, rather than reducing it). It is possible that performing a tracking task using gaze input would not be problematic without delay. However, some delay is inherent to all current gaze-tracking systems as a result of signal processing and smoothing. Therefore, tools developed to access mainstream GUIs must be tolerant to both noise and delay.

3 Re-evaluating the Design of Zoom Tools

In our first implementation, we did not anticipate how our continuous zoom tool would change the task or how delay would affect performance. Empirical results challenged our assumption that continuous interaction would always be more natural than discrete interaction. Instead, continuous interaction seemed unnatural with delayed feedback. In fact, the manual-control literature suggests that, in the presence of delays, users naturally adopt a move-and-wait strategy [Ferrell 1965]. That is, users transform the continuous task into a series of discrete components. Ironically, our attempt to make the task more natural backfired because, even though continuous interaction may be more natural in real-world situations, discrete interaction is more natural in the presence of time delays.

3.1 Discrete Zoom Tools

Based on the results of our first study, we designed a discrete zoom tool, which is conceptually equivalent to an n-step tool, combining
features of two-step and zoom tools (see row 3 of Figure 1 for an illustration). Because zooming occurs in discrete steps, we expected this tool to be more tolerant to delay than the continuous zoom tool. When compared to the two-step tool, we expected more steps to permit greater magnification levels because, after the first step, the content can be magnified further without increasing window size. Obviously, adding steps can also slow down performance. However, given that early steps require lower accuracy than the two-step tool, we expected discrete zoom to accommodate lower dwell times. We also expected the discrete zoom tool to result in more of a zooming sensation than two-step while providing users more control over zooming rate than continuous zoom.

3.2 The Zoom Framework

Based on our experience developing and testing tools to facilitate the selection of small targets using gaze alone, we created a conceptual framework to organize existing tools designed for small-target selection (Figure 2). All the tools in this framework increase the effective size of targets (i.e., zoom) to facilitate small-target selection. This framework organizes tools in a discrete-to-continuous continuum. The two-step and continuous zoom tools can be placed, respectively, on the discrete and continuous ends of this continuum. The two-step tool suddenly increases target size to its maximum magnification level, whereas continuous zoom increases target size in what could be considered an infinite number of infinitely small steps. Consistent with these two extremes, tools closer to the discrete end of the spectrum tend to have fewer steps of longer duration, whereas tools closer to the continuous end of the spectrum tend to have more steps of shorter duration. The theoretical shorter duration per step of tools with more steps (i.e., more continuous) is the result of shorter dwell times when compared to tools with fewer steps (i.e., more discrete). Tools toward the continuous end of the spectrum tend to require the user to carry out a more tracking-like task, whereas tools toward the discrete end can be better characterized as a series of point-and-select operations. In addition, tools toward the continuous end of the spectrum tend to permit higher magnification levels because objects can increase in size within a window of constant size. Therefore, more continuous tools are less limited by the size of the zooming window.

[Figure 2 image: a Steps axis running from 2 (Discrete) to the Continuous end, with 2-Step Dwell, Discrete Zoom, and Continuous Zoom placed along it.]
Figure 2: The zoom framework.

In general, discrete zoom tools fall in between these two extremes. The specific three-step version we test below falls closer to the discrete end (see Figure 2). Even if close to two-step, we argue that this three-step tool can facilitate selection of very small targets and naturalness of interaction when compared to two-step magnification. We also argue that this framework may facilitate comparisons among tools. By studying how tools vary along the continuum, this framework could provide insights into useful tool features and suggest ways in which future designs can combine these features.

4 Discrete Zoom Tools: Proof of Concept

In order to study the potential of discrete zoom tools, we conducted an experiment to compare different zoom tools. Participants included 2 male expert users (first two authors) and 8 novices (2 males and 6 females). Novices had no previous experience with gaze interaction. We used an IG-30 eye tracker from Alea Technologies in a desktop setting. Participants were instructed to use a gaze-controlled cursor to point to the target present in the workspace as quickly and accurately as possible. Circular targets appeared one at a time at 1 of 16 possible locations equidistant (300 pixels) from the homing circle at the center. A trial started when a participant positioned the gaze cursor on the homing circle and ended as soon as the participant issued an activation using the corresponding method. A successful target selection was not required. Each participant completed 16 blocks of 16 trials, resulting in a total of 256 activations per participant. All independent variables were manipulated within participants and fixed within blocks.

We manipulated zoom tool, target size, and smoothing. Zoom tool had 4 levels: dwell (no zoom), two-step tool, three-step tool, and optimized three-step tool. The magnification level (4x) and dwell time (600 ms) of the two-step tool were chosen based on available versions of this tool. In fact, we purposefully chose a relatively high level of magnification and a relatively short dwell time. The three-step tool had the same magnification level and dwell time as the two-step tool, whereas the optimized three-step tool had twice the magnification (8x) and half the dwell time (300 ms). Achieving 8x magnification with a two-step tool is virtually impossible with a magnified window of the size used in this experiment. The 2 levels of target size were 6- and 12-pixel diameters (to represent some of the smallest targets in the environment). The 2 levels of smoothing (no smoothing and 10-sample average) were applied to the raw eye-tracker data and velocity thresholds were adjusted accordingly. We measured hit rate, completion time, and subjective ratings. Data were analyzed with a repeated measures ANOVA and LSD correction in the post-hoc tests.

We expected the three-step tool to: (a) feel more natural, (b) be more resistant to noisy input, and (c) enable reliable selection of smaller targets than the two-step tool. We did not expect discrete zoom to be faster than the two-step tool, but we did expect an optimized three-step version to achieve similar speeds to the two-step tool without sacrificing accuracy. This optimized version was expected to be able to accommodate lower dwell times and greater magnification levels than current two-step tools.

Due to space limitations, we emphasize the results that are most relevant to the zoom framework. All data analyses were conducted on the data from novices. Experts were used for comparison purposes. Target size, smoothing, and subjective-rating results will not be described in detail. Suffice it to say that target size affected hit rate but not completion time, whereas smoothing affected completion time but not hit rate. Hit rate was lower for smaller targets than for larger targets, F(1, 4) = 19.90, p < 0.05. Smoothing over 10 samples resulted in longer completion times than no smoothing, F(1, 4) = 11.06, p < 0.05. We found no evidence suggesting that no smoothing had a greater impact on the two-step than on the three-step tool. Therefore, this experiment did not support the hypothesis that a three-step tool is more resistant to noise than two-step. Preliminary analyses suggest that participants did not rate the three zoom tools differently from each other, but some differences were apparent between dwell and all three tools (i.e., dwell was rated as faster but less accurate than zoom tools). We found no evidence of the three-step tool being perceived as more natural than the two-step tool.

Zoom tool had a significant effect on hit rate, F(3, 21) = 32.43, p < 0.05. Mean hit rate was lowest without zoom (M = 0.04, SD = 0.03). The hit rates of the two-step (M = 0.24, SD = 0.11) and three-step tools (M = 0.29, SD = 0.12) were not significantly different from each other, t(7) = 1.22, p > 0.05. The optimized three-step tool (M = 0.48, SD = 0.14) had a higher hit rate than the three-step
tool, t(7) = 4.57, p < 0.05. These results are consistent with our hypothesis that better accuracy can be achieved with a three-step than with a two-step tool. Given the difference between three-step and optimized three-step, the accuracy advantage is probably due to the latter's greater magnification level. Mean hit rates across zoom tools show a similar pattern for novices and experts (Figure 3).

[Figure 3 image: Mean Hit Rate (0.0 to 1.0) by Zoom Tool (Dwell, Two-Step, Three-Step, Three-Step Optimized), plotted separately for Novice and Expert.]
Figure 3: Mean hit rates for the 8 novices and the 2 experts as a function of zoom tool.

Zoom tool also had a significant effect on completion time, F(3, 21) = 119.04, p < 0.05. Completion times were shortest without zoom (M = 1581 ms, SD = 192 ms). The two-step (M = 3193 ms, SD = 441 ms) and optimized three-step tools (M = 3152 ms, SD = 375 ms) were not significantly different from each other, t(7) = 0.39, p > 0.05. The three-step tool (M = 3905 ms, SD = 442 ms) took longer than the two-step tool, t(7) = 5.35, p < 0.05. These results are consistent with our hypothesis that a three-step tool can achieve speeds comparable to a relatively fast version of the two-step tool (given shorter dwell time in the three-step tool). Again, the pattern of results was very similar for novices and experts (Figure 4).

[Figure 4 image: Mean Completion Time (ms, 0 to 4500) by Zoom Tool (Dwell, Two-Step, Three-Step, Three-Step Optimized), plotted separately for Novice and Expert.]
Figure 4: Mean completion times for the 8 novices and the 2 experts as a function of zoom tool.

Overall, the results of this experiment are promising. We found support for the possibility that discrete zoom tools can achieve similar speeds and greater accuracy than available two-step tools. Future research should explore whether this finding generalizes to situations in which distractors are present and to tasks in which successful target selection is required. Future studies should also explore whether a two-step tool could accommodate lower dwell times and whether having different dwell times for different steps could be beneficial. Our smoothing manipulation and subjective ratings did not support our hypothesis that three-step tools are more tolerant to noise and more natural than two-step tools. Research with a wider range of smoothing levels and subjective ratings could help determine whether this result is due to a lack of difference between tools or to a lack of sensitivity of the measures we used. Finally, even if mean values varied substantially, we found a similar pattern of results across a wide range of expertise levels. This result suggests that findings from novices may generalize to more experienced users and novice-user data may be useful to evaluate interface tools.

5 Summary and Conclusions

Selecting the smallest targets in mainstream GUIs using gaze alone is not easy. Although some tools exist, there is little theoretical guidance for the development of tools to facilitate accessibility to mainstream GUIs for gaze users. Based on our previous work, we proposed a conceptual framework to categorize existing tools and guide the development of new tools. As a proof of concept, we designed a discrete zoom tool and generated hypotheses about how it would compare to other zoom tools based on this framework. We conducted an experiment in which the optimized three-step discrete zoom tool we proposed achieved better performance than a two-step tool modeled after existing tools. Results suggest that our framework holds potential to guide the development of zoom tools to enhance accessibility to mainstream GUIs for gaze users.

References

BATES, R., AND ISTANCE, H. 2002. Zooming interfaces!: enhancing the performance of eye controlled pointing devices. In Proceedings of the fifth international ACM conference on Assistive technologies, ACM, Edinburgh, Scotland, 119–126.

DUCHOWSKI, A. T. 2007. Eye tracking methodology. Springer.

FERRELL, W. 1965. Remote manipulation with transmission delay. IEEE Transactions on Human Factors in Electronics 6, 24–32.

HANSEN, J. P., JOHANSEN, A. S., HANSEN, D. W., ITOH, K., AND MASHINO, S. 2003. Command without a click: Dwell time typing by mouse and gaze selections. In INTERACT 2003, IOS Press, 121–128.

HANSEN, D. W., SKOVSGAARD, H. H. T., HANSEN, J. P., AND MØLLENBACH, E. 2008. Noise tolerant selection by gaze-controlled pan and zoom in 3D. In Proceedings of the 2008 symposium on Eye tracking research & applications, ACM, Savannah, Georgia, 205–212.

LANKFORD, C. 2000. Effective eye-gaze input into windows. In Proceedings of the 2000 symposium on Eye tracking research & applications, ACM, Palm Beach Gardens, Florida, United States, 23–27.

SKOVSGAARD, H., MATEO, J., AND HANSEN, J. P. 2008. How can tiny buttons be hit using gaze only? In COGAIN 2008, COGAIN, Prague, Czech Republic, vol. 4, 38–42.

WARD, D. J., BLACKWELL, A. F., AND MACKAY, D. J. C. 2000. Dasher - a data entry interface using continuous gestures and language models. In Proceedings of the 13th annual ACM symposium on User interface software and technology, ACM, San Diego, California, United States, 129–137.

ZHANG, X., REN, X., AND ZHA, H. 2008. Improving eye cursor's stability for eye pointing tasks. In Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, ACM, Florence, Italy, 525–534.