
GAZE INTERACTION FROM BED

John Paulin Hansen, Javier San Agustin, Henrik Skovsgaard
IT University of Copenhagen
Rued Langgaards Vej 7, Copenhagen, Denmark
+45 72185000

ABSTRACT
This paper presents a low-cost gaze tracking solution for bedbound people, composed of free-ware tracking software and commodity hardware. Gaze interaction is done on a large wall-projected image, visible to all people present in the room. The hardware equipment leaves physical space free to assist the person. The accuracy and precision of the tracking system were tested in an experiment with 12 subjects. We obtained a tracking quality that is sufficiently good to control applications designed for gaze interaction. The best tracking conditions were achieved when participants were sitting up rather than lying down. Also, gaze tracking in the bottom part of the image was found to be more precise than in the top part.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Input devices and strategies, Standardization, Evaluation/methodology

General Terms
Design, Reliability, Experimentation, Human Factors, Standardization.

Keywords
Gaze tracking, gaze interaction, universal access, target selection, alternative input, interface design, disability, assistive technology, healthcare technology, augmented and alternative communication.

1. INTRODUCTION
People with severe motor disabilities have been pioneering gaze interaction since the early 1990s [1]. They can communicate with friends and family by gaze typing, browse the Internet and play computer games. Several commercial gaze-tracking systems support these activities well. Most of the systems are fixed into a single hardware unit consisting of a monitor, one or more cameras and infrared (IR) light sources.
Systems with all hardware components built in usually offer high accuracy and tolerance to head movements. However, in some situations this may reduce the flexibility of the setup, making it difficult to use the system in a non-desktop scenario. Figure 1 shows a commercial gaze communication system mounted above a reclining person who has ALS/MND. First, the space requirements of this setup may seriously obstruct caretaking routines. Second, the limited viewing angle of the monitor makes it difficult for people standing around the bed to follow what the person is doing with his eyes. Third, if a single part of the unit breaks down, the whole unit has to be sent off for replacement or repair, leaving the user without means of communication for days. Finally, the relatively high cost of commercial gaze communication systems may prevent some people with severe disabilities from having access to one.

Figure 1: A person with ALS/MND using a gaze communication system from his bed.

People at hospitals who are paralyzed due to a severe medical condition may also be considered for bedside gaze communication. For instance, patients with frontal burns and a lung injury commonly have a tracheostomy tube in the front of the neck and are therefore unable to speak. Obviously, it is of utmost importance for patient safety that they are able to communicate with the medical staff. Furthermore, being able to talk with their relatives may help them get through a difficult time.

Consequently, we see a need for a gaze tracking system that does not occupy the physical space in front of the user. Preferably, the system should use a large display that can be seen by a group of people, and it should be composed of inexpensive hardware components (display, camera, IR lights and PC) that can be replaced immediately if they fail. In this paper we examine a system that meets these requirements with off-the-shelf hardware that tracks the user's gaze from a distance.
We use a standard video camera placed at the end of the bed, connected to a PC running an open-source gaze tracking system. The display image is projected onto a wall in front of the bed, providing visibility for everyone standing in the room and, most importantly, freeing the physical space around the user. The accuracy and precision of the system are evaluated in an experiment, and the effect of lying down versus sitting up is also investigated.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. NGCA '11, May 26-27, 2011, Karlskrona, Sweden. Copyright 2011 ACM 978-1-4503-0680-5/11/05…$10.00.

2. PREVIOUS WORK
Accuracy refers to the degree to which the sensor readings represent the true value of what is measured. In gaze interaction, accuracy is measured as the distance between the point-of-gaze estimated by the system and the point where the user is actually looking. When the accuracy is low, the user will see an offset
between the cursor location and the point on the screen where he is looking. Most gaze tracking systems introduce an offset, and in some cases large offsets can make it difficult to hit small targets. Furthermore, the offset may vary across the monitor, usually being larger in the corners.

Precision refers to the extent to which successive readings of the eye tracking system agree in value, and measures the spread of the gaze point recordings over time. When the precision is low, the cursor becomes jittery unless the signal is smoothed. A fixation detection algorithm will use the spread of the gaze samples to detect fixations and smooth the signal to reduce jitter [2, 3].

Manufacturers of gaze tracking systems commonly state the spatial resolution of their systems as between 0.5 and 1.0 degrees of visual angle (see e.g. [4]). The accuracy is measured by calculating the error in the gaze position over a set of targets displayed on the screen. If a gaze tracking system provides an accuracy of 0.5 degrees, this will be sufficient to hit targets larger than approximately 20 x 20 pixels at a distance of 50 cm. However, in a standard user interface like Windows, several of the interactive elements are smaller than that - down to 6 x 6 pixels. Compensating zooming tools may provide gaze access to the small targets, for instance by first enlarging a part of the screen and then offering a second, final selection within the enlargement [5]. Several dedicated applications have been designed to support inaccurate gaze pointing. Some employ large on-screen buttons, e.g. GazeTalk [6], and some employ a continuous zoom built into the selection process, e.g. StarGazer [7].

Recent studies have looked into long-range gaze interaction with large displays. Kessels et al.
[8] compared touch and gaze interaction with transparent displays on shop windows and found touch to be faster than gaze, although the participants in their study appreciated the novel experience of interacting with gaze. Sippl et al. [9] were able to distinguish which of the four quarters of the screen people were looking at while standing at different distances and viewing angles in front of a large display. This resolution may be enough for e.g. marketing research (with objects of interest placed in each corner), but it would not be efficient for gaze interaction with a standard computer interface. San Agustin et al. [10] demonstrated how a low-cost gaze tracking system could effectively track people 1.5 to 2 meters in front of a 55" monitor after a short calibration procedure. They reported problems tracking people with glasses and some disturbance from external light sources.

3. METHODOLOGY
3.1 Participants
Twelve participants, six women and six men, ranging in age from 24 to 52 years (M = 33.6 years, SD = 9.7 years) volunteered for the study. All of them were daily computer users, and all but one had prior experience with gaze tracking. None of the participants wore glasses, but three wore contact lenses.

3.2 Apparatus
A Mac mini computer with an Intel Core 2 Duo processor running Windows 7 Professional executed the ITU Gaze Tracker open-source software (see [11] for more details and a download of this system). An Optima HD67 projector was placed 3 meters from a white wall, creating an image of 140 cm (w) x 110 cm (h) with a resolution of 1280 x 1024 pixels (Figure 2). A Modux 4 Lojer nursing bed (210 cm x 90 cm) stood between the projector and the wall. The bed was used in two positions. In the seated position the back was raised to 45 degrees. From this position, the distance from the subject's head to the top right and left image corners was 270 cm, and 240 cm to the bottom corners.
In the lying position (flat), the distances were 300 cm to the top corners and 260 cm to the bottom corners. A pillow was offered for comfort. A Sony HDR-HC5 video camera was mounted on a stand behind the bed, just below the projected image. The camera was zoomed in to capture one of the subject's eyes, with night-vision mode and tele-macro enabled. The images were sent from the camera to the computer via a FireWire connection. One Sony IVL IR lamp was mounted on the end of the bed. The total cost of the apparatus (excluding the nursing bed) is approximately 2,000 €.

3.3 Procedure
Half of the subjects started in the seated position and half in the lying position. First they conducted the standard calibration procedure of the ITU Gaze Tracker by looking at 9 points appearing in random order on the screen. The calibration was redone until the accuracy value reported by the software was better than 3 degrees. One female participant had to be excluded from the experiment at this point, since she could not achieve a satisfactory initial calibration. Immediately after the calibration, participants were told to gaze at 16 points appearing randomly one by one in a 4 x 4 grid. Each target disappeared after a total of 50 samples had been collected at 30 Hz. In the second part of the experiment the position of the subject was changed and the full procedure was repeated (i.e., calibration plus 16 measures). The full experiment lasted less than 10 minutes per participant.

Figure 2: The experimental setup with a subject in a seated position. A projector displays the gaze-active image on the wall. A video camera standing behind the bed records the eye movements, and an IR light, mounted on the end of the bed, provides the corneal reflection for the gaze tracking system.
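The target-presentation procedure above can be sketched in a few lines. This is a minimal illustration of the positions and timing only, not the actual experimental software; the edge margin is an assumed layout parameter:

```python
import random

GRID_ROWS, GRID_COLS = 4, 4
SAMPLES_PER_TARGET = 50
SAMPLE_RATE_HZ = 30
WIDTH, HEIGHT = 1280, 1024  # projected image resolution

def grid_targets(margin=0.1):
    """Centres of a 4 x 4 target grid, inset by a fractional margin."""
    xs = [WIDTH * (margin + (1 - 2 * margin) * c / (GRID_COLS - 1))
          for c in range(GRID_COLS)]
    ys = [HEIGHT * (margin + (1 - 2 * margin) * r / (GRID_ROWS - 1))
          for r in range(GRID_ROWS)]
    return [(x, y) for y in ys for x in xs]

def run_trial(seed=0):
    """One trial: the 16 targets in random order, each shown until
    50 gaze samples (~1.7 s at 30 Hz) have been collected."""
    targets = grid_targets()
    random.Random(seed).shuffle(targets)
    return [(t, SAMPLES_PER_TARGET / SAMPLE_RATE_HZ) for t in targets]

schedule = run_trial()
print(len(schedule), round(schedule[0][1], 2))  # 16 targets, ~1.67 s each
```

Each element of the schedule pairs a target position with its display duration; a real implementation would instead stop on the 50th received sample rather than on a timer.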
3.4 Measures
In the evaluation of the system we based the performance measures on the recommendations given in the working copy of the COGAIN report on eye tracker accuracy terms and definitions [12]. Accuracy, A_deg, is defined as the average angular distance θ_i (measured in degrees of visual angle) between n fixation locations and the corresponding fixation targets (Equation 1).
\[ A_{\mathrm{deg}} = \frac{1}{n}\sum_{i=1}^{n}\theta_i \qquad (1) \]

The spatial precision is calculated as the Root Mean Square (RMS) of the angular distance θ_i (measured in degrees of visual angle) between successive samples (x_i, y_i) and (x_{i+1}, y_{i+1}) (Equation 2).

\[ \mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\theta_i^2} \qquad (2) \]

3.5 Design
A within-participant factorial design was employed. Position (lying or seated) was the first independent variable. Based on our experience, accuracy and precision tend to differ across the screen area, especially between the middle area and the top and bottom areas. We therefore treated target location as the second independent variable, distinguishing between measures from the eight targets in the two middle rows of the grid, the four top targets and the four bottom targets. The dependent variables were accuracy and precision. In total, 11 participants performed 2 trials, each consisting of 16 targets, giving a total of 352 measures. In summary, the design was:

Independent variables: Position (seated, lying); Target location (top, middle or bottom rows)
Dependent variables: Accuracy (degrees); Precision (degrees)

4. RESULTS
We removed a total of 27 outliers found to be more than 3 standard deviations above the mean of either accuracy or precision. We then conducted an ANOVA on the dependent variables. The grand mean of accuracy was 1.17 degrees (SD = 0.84 degrees). There was a main effect of Position, F(1, 10) = 10.98, p < 0.001: the seated position (M = 0.96 degrees) was significantly different from the lying position (M = 1.31 degrees). There was no effect of top, middle and bottom target locations and no interaction effects. Accuracy was correlated with the initial accuracy value reported by the gaze tracker right after calibration, r = 0.46 (Pearson product-moment).

The grand mean of precision was 0.73 degrees (SD = 1.27). Again, there was a main effect of Position, F(1, 10) = 6.95, p < 0.01: the seated position (M = 0.47) was significantly different from the lying position (M = 0.89).
There was also a main effect of target location, F(2, 10) = 3.10, p < 0.05. A Scheffé post-hoc test showed the precision for the top-row targets (M = 0.99) to be significantly lower than for the bottom row (M = 0.51), p < 0.05, while the middle rows (M = 0.71) did not differ from the others. Figure 3 shows the effect of position on accuracy and precision for two of the subjects. Gaze samples in the lying position are noticeably more spread out and less accurate than in the seated position.

Figure 3: Data from two subjects (left and right columns) in the seated (top) and lying (bottom) positions.

5. DISCUSSION
The average accuracy of 1.17 degrees is lower than what most commercial systems claim to offer. However, in the present setup the eye has to be recorded from a rather long distance (approx. 2 meters) rather than the usual 50 cm. Also, we conducted the experiment with a standard video camera, not a high-performance machine vision camera like the ones normally used in commercial systems. Under these circumstances we were able to obtain a spatial resolution sufficient for the gaze tracking system to support interaction with a range of applications that have a high tolerance to noise, or even with a windows environment when additional zoom-selector tools are used.

Both accuracy and precision were influenced by the user's position. Raising the back 45 degrees improved system performance considerably. While some people can easily be lifted to this position, the clinical condition of others may require them to remain lying flat. In this case we may consider projecting onto a white canvas hanging slightly tilted from the ceiling over the bed, with IR lights mounted on the frame of the canvas (Figure 4A), and/or using a smaller camera placed close to the user's eye (Figure 4B).

Figure 4A: A canvas hanging from the ceiling with a gaze-interactive image projection. Figure 4B: A web camera mounted on a flexible arm close to the user's eye.
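To make the measured accuracy concrete, the angular error can be converted into an on-screen target size for the projection geometry of Section 3.2. A minimal sketch, where the 250 cm viewing distance is a rounded value between the reported corner distances and the desktop pixel pitch is an assumed figure:

```python
import math

def min_target_px(accuracy_deg, distance_cm, pixel_pitch_cm):
    """Side length, in pixels, of the smallest square target that
    spans the tracker's accuracy angle at the given viewing distance."""
    size_cm = distance_cm * math.tan(math.radians(accuracy_deg))
    return size_cm / pixel_pitch_cm

# Desktop figure from Section 2: 0.5 degrees at 50 cm; 0.022 cm/px is
# an assumed pixel pitch (roughly a 115 DPI display).
print(round(min_target_px(0.5, 50, 0.022)))         # ~20 px, as quoted

# This experiment: 1.17 degrees mean accuracy on a 140 cm / 1280 px
# projection viewed from ~250 cm (a rounded mid-corner distance).
print(round(min_target_px(1.17, 250, 140 / 1280)))  # ~47 px targets
```

Targets of roughly 50 pixels on the projected image are well within reach of the noise-tolerant applications cited in Section 2, which is consistent with the discussion above.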
Precision turned out to be lower for the upper part of the display than for the lower part. No previous studies that we are aware of have looked into the impact that viewing angle may have on spatial resolution. Viewing angle is likely to matter for systems that determine the point of regard by tracking the position of a glint relative to the centre of the pupil, because the pupil appears more elliptical when seen by the camera from a low angle. The camera is also more likely to capture disturbing IR reflections from the eyelids from a low-angle perspective.
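The two measures defined in Section 3.4 are straightforward to compute from logged gaze data. A minimal Python sketch, assuming coordinates have already been converted to degrees of visual angle (that conversion depends on the geometry of Section 3.2):

```python
import math

def accuracy_deg(fixations, targets):
    """Equation 1: average angular distance between n fixation
    locations and their corresponding targets (coordinates in degrees)."""
    dists = [math.dist(f, t) for f, t in zip(fixations, targets)]
    return sum(dists) / len(dists)

def precision_rms(samples):
    """Equation 2: root mean square of the angular distance between
    successive gaze samples (x_i, y_i) -> (x_{i+1}, y_{i+1})."""
    sq = [math.dist(samples[i], samples[i + 1]) ** 2
          for i in range(len(samples) - 1)]
    return math.sqrt(sum(sq) / len(sq))

# Tiny illustrative record: three gaze samples 0.1 degrees apart.
print(precision_rms([(0.0, 0.0), (0.1, 0.0), (0.1, 0.1)]))  # ≈ 0.1
```

Note that accuracy compares against known target positions, while precision needs only the raw sample stream, which is why the two can differ independently across screen regions, as observed above.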
6. FUTURE WORK
Gaze control of smart home technology (e.g. lights, bed adjustment, television and music player) can make a paralyzed person more self-sufficient by offering control of appliances connected to a PC [13]. Video projections onto walls may further extend this environmental control to several audio-visual media sources running simultaneously. For instance, a person in a hospital bed might like to have a digital photo slide show and, at the same time, take part in a videoconference with his family at home. Furthermore, he needs advanced control of the lights in the room for the projections and the outgoing video signal to work well. In his seminal paper, Bolt [14] envisioned gaze-orchestrated control of dynamic windows: "Some of the windows come and go, reflecting their nature as direct TV linkages into real-time, real-world events. Others are non-real-time, some dynamic, others static but capable of jumping into motion. Such an ensemble of information inputs reflects the managerial world of the top-level executive of the not too distant electronic future." (p. 109)

We believe that Bolt's vision of the future can now be deployed with affordable technology supporting the communication and entertainment needs of people bound to bed. To explore this, we are building a full-scale mock-up of a hospital room equipped with various media and smart home technologies that are to be controlled by gaze only.

7. ACKNOWLEDGMENTS
Our thanks to Mr. and Ms. Fujisawa for their hospitality and advice regarding the needs of people with ALS/MND. This work was supported by the Danish Research Council (grant numbers 09-075700 and 2106-080046).

8. REFERENCES
[1] Päivi Majaranta and Kari-Jouko Räihä. 2002. Twenty years of eye typing: systems and design issues. In Proceedings of the 2002 Symposium on Eye Tracking Research & Applications (ETRA '02). ACM, New York, NY, USA, 15-22.
[2] Duchowski, A. T. 2002. A breadth-first survey of eye-tracking applications.
Behavior Research Methods, Instruments, & Computers 34, 455–470.
[3] Salvucci, D. D. and Goldberg, J. H. 2000. Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the 2000 Symposium on Eye Tracking Research & Applications (ETRA '00). ACM, Palm Beach Gardens, Florida, USA, 71–78.
[4]
[5] Henrik Skovsgaard, Julio C. Mateo and John Paulin Hansen. 2011. Evaluating gaze-based interface tools to facilitate point-and-select tasks with small targets. Behaviour & Information Technology (accepted).
[6] Aoki Hirotaka, John Paulin Hansen and Kenji Itoh. 2008. Learning to interact with a computer by gaze. Behaviour & Information Technology 27, 4 (July 2008), 339-344.
[7] Dan Witzner Hansen, Henrik H. T. Skovsgaard, John Paulin Hansen and Emilie Møllenbach. 2008. Noise tolerant selection by gaze-controlled pan and zoom in 3D. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications (ETRA '08). ACM, New York, NY, USA, 205-212.
[8] Angelique Kessels, Evert van Loenen and Tatiana Lashina. 2009. Evaluating gaze and touch interaction and two feedback techniques on a large display in a shopping environment. In Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part I (INTERACT '09). Springer-Verlag, Berlin, Heidelberg, 595-607.
[9] Andreas Sippl, Clemens Holzmann, Doris Zachhuber and Alois Ferscha. 2010. Real-time gaze tracking for public displays. In Proceedings of the First International Joint Conference on Ambient Intelligence (AmI '10). Springer-Verlag, Berlin, Heidelberg, 167-176.
[10] Javier San Agustin, John Paulin Hansen and Martin Tall. 2010. Gaze-based interaction with public displays using off-the-shelf components. In Adjunct Proceedings of the 12th ACM International Conference on Ubiquitous Computing (Ubicomp '10). ACM, New York, NY, USA, 377-378.
[11] ITU Gaze Tracker open-source software. IT University of Copenhagen.
[12] COGAIN. Eye tracker accuracy terms and definitions (working copy). ms.pdf
[13] Fulvio Corno, Alastair Gale, Päivi Majaranta and Kari-Jouko Räihä. 2010. Eye-based direct interaction for environmental control in heterogeneous smart environments. In Handbook of Ambient Intelligence and Smart Environments, Part IX. Springer Science+Business Media, 1117-1138.
[14] Richard A. Bolt. 1981. Gaze-orchestrated dynamic windows. In Proceedings of the 8th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '81). ACM, New York, NY, USA, 109-119.