Measuring Vergence Over Stereoscopic Video with a Remote Eye Tracker

A remote eye tracker is used to explore its utility for ocular vergence measurement. Subsequently, vergence measurements are compared in response to anaglyphic stereographic stimuli as well as in response to monoscopic stimulus presentation on a standard display. Results indicate a highly significant effect of anaglyphic stereoscopic display on ocular vergence when viewing a stereoscopic calibration video. Significant convergence measurements were obtained for stimuli fused in the anterior image plane.

Brian C. Daugherty† Andrew T. Duchowski†∗ Donald H. House†∗ Celambarasan Ramasamy‡
† School of Computing & ‡ Digital Production Arts, Clemson University
∗ Email: {andrewd | dhouse}

ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail

CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation - Display algorithms; I.3.6 [Computer Graphics]: Methodology and Techniques - Ergonomics;

Keywords: eye tracking, stereoscopic rendering

1 Introduction & Background

Since their 1838 introduction by Sir Charles Wheatstone [Lipton 1982], stereoscopic images have appeared in a variety of forms, including dichoptic stereo pairs (different image to each eye), random-dot stereograms [Julesz 1964], autostereograms (e.g., the popular Magic Eye and Magic Eye II images), and anaglyphic images and movies, with the latter currently resurging in popularity in American cinema (e.g., Monsters vs. Aliens, DreamWorks Animation and Paramount Pictures). An anaglyphic stereogram is a composite image consisting of two colors and two slightly different perspectives that produces a stereoscopic image when viewed through two corresponding colored filters. Although the elicited perception of depth appears to be effective, relatively little is known about how this effect influences viewers' eye movements. Autostereograms, for example, are easily fused by some, but not by others.

When the eyes move through equal angles in opposite directions, disjunctive movement, or vergence, is produced [Howard 2002]. In horizontal vergence, each visual axis moves within a plane containing the interocular axis. When the visual axes move inwards, the eyes converge; when the axes move outwards, they diverge. The convergent movement of the eyes (binocular convergence), i.e., their simultaneous inward rotation toward each other (cf. divergence, which denotes the outward rotation), ensures that the projections of images on the retinas of both eyes are in registration with each other, allowing the brain to fuse the images into a single percept. This fused percept provides stereoscopic vision of three-dimensional space. Normal binocular vision is primarily characterized by this type of fusional vergence of the disparate retinal images [Shakhnovich 1977]. Vergence driven by retinal blur is distinguished as accommodative vergence [Büttner-Ennever 1988].

The angle between the visual axes is the vergence angle. When a person fixates a point at infinity, the visual axes are parallel and the vergence angle is zero. The angle increases when the eyes converge. For symmetrical convergence, the angle of horizontal vergence $\phi$ is related to the interocular distance $a$ and the distance $D$ of the point of fixation from a point midway between the eyes by the expression $\tan(\phi/2) = a/(2D)$. Thus, the change in vergence per unit change in distance is greater at near than at far viewing distances.

About 70% of a person's normal range of vergence is used within one meter from the eyes. The angle of vergence changes about 14° when gaze is moved from infinity to the nearest distance for comfortable convergence at about 25 cm. Vergence changes about 36° when the gaze moves to the nearest point to which the eyes can converge. About 90% of this total change occurs when the eyes converge from 1 m.

In this paper, vergence measurements are made over anaglyphic stereo imagery. Although depth perception has been studied on desktop 3D displays [Holliman et al. 2007], eye movements were not used to verify vergence. Holliman et al. conclude that depth judgment cannot always be predicted from display geometry alone. The only other similar effort we are aware of is measurement of interocular distance on a stereo display during rendering of a stereo image at five different depths [Kwon and Shul 2006]. Interocular distance was seen to range by about 10 pixels across three participants. In this paper, we report observations on how binocular angular disparity (of twelve participants) is affected when viewing an anaglyphic stereo video clip.

How can vergence be measured when viewing a dichoptic computer animation presented anaglyphically? To gauge the effect of stereo display on ocular vergence, it is sufficient to measure the disparity between the left and right horizontal gaze coordinates, e.g., $x_r - x_l$ given the left and right gaze points, $(x_l, y_l)$, $(x_r, y_r)$, as delivered by current binocular eye trackers. Thus far, to our knowledge, the only such vergence measurements to have been carried out have been performed over random dot stereograms [Essig et al. 2004].

The question that we are attempting to address is whether the vergence angle can be measured from gaze point data captured by a (binocular) eye tracker. Of particular interest is the measure of relative vergence, that is, the change in vergence from fixating a point $P$ placed some distance $\Delta d$ behind (or in front of) point $F$, the point at which the visual axes converge at viewing distance $D$. The visual angle between $P$ and $F$ at the nodal point of the left eye is $\phi_l$, signed positive if $P$ is to the right of the fixation point. The same angle for the right eye is $\phi_r$, signed in the same way. The binocular disparity of the images of $F$ is zero, since each image is centered on each eye's visual axis. The angular disparity $\eta$ of the images of $P$ is $\phi_l - \phi_r$. If $\theta_F$ is the binocular subtense of point $F$ and $\theta_P$ is the binocular subtense of point $P$, then $\eta = \phi_l - \phi_r = \theta_P - \theta_F$. Thus, the angular disparity between the images of a pair of objects is the binocular subtense of one object minus the binocular subtense of the other (see Figure 1).

Given the binocular gaze point coordinates reported by the eye tracker, $(x_l, y_l)$ and $(x_r, y_r)$, an estimate of $\eta$ can be derived following calculation of the distance $\Delta d$ between $F$ and $P$, obtained
via triangle similarity:

$$\frac{a}{D + \Delta d} = \frac{x_r - x_l}{\Delta d} \quad\Rightarrow\quad \Delta d = \frac{(x_r - x_l)\,D}{a - (x_r - x_l)}.$$

Figure 1: Binocular disparity of point $P$ with respect to fixation point $F$, at viewing distance $D$ with (assumed) interocular distance $a$ [Howard and Rogers 2002]. Given the binocular gaze point coordinates on the image plane $(x_l, y_l)$ and $(x_r, y_r)$, the distance between $F$ and $P$, $\Delta d$, is obtained via triangle similarity. Assuming symmetrical vergence and small disparities, angular disparity $\eta$ is derived (see text).

For objects in the median plane of the head, $\phi_l = \phi_r$ so the total disparity $\eta$ is $2\phi$. By elementary geometry, $\phi = (\theta_P - \theta_F)/2$ [Howard and Rogers 2002]. If the interocular distance is $a$,

$$\tan\frac{\theta_P}{2} = \frac{a}{2(D + \Delta d)} \quad\text{and}\quad \tan\frac{\theta_F}{2} = \frac{a}{2D}.$$

For small angles, the tangent of an angle is approximately equal to the angle in radians. Therefore,

$$\eta = 2\phi \approx 2\left(\frac{a}{2(D + \Delta d)} - \frac{a}{2D}\right) \quad\text{or}\quad \eta \approx \frac{-a\,\Delta d}{D^2 + D\,\Delta d}. \qquad (1)$$

Since for objects within Panum's fusional area $\Delta d$ is usually small by comparison with $D$, we can write

$$\eta \approx \frac{-a\,\Delta d}{D^2}. \qquad (2)$$

Thus, for symmetrical vergence and small disparities, the disparity between the images of a small object is approximately proportional to the distance in depth of the object from the fixation point.

In the current analysis, the following assumptions are made for simplicity. Interocular distance is assumed to be the average separation between the eyes ($a = 6.3$ cm), i.e., the average for all people regardless of gender, although this can vary considerably (ranging from about 5.3 to 7.3 cm) [Smith and Atchison 1997]. Vergence is assumed to be symmetrical (although in our experiments no chin rest was used and so the head was free to rotate, violating this assumption; for a derivation of angular disparity under this condition see Howard and Rogers [2002]). Viewing distance is assumed to be $D = 50$ cm, the operating range of the eye tracker, although the tracker allows head movement within a 30 × 15 × 20 cm volume (see below). It is also important to note that the above derivation of angular disparity (vergence) assumes zero binocular disparity when viewing point $F$ at the screen surface. Empirical measurements during calibration show that this assumption does not always hold, or it may be obscured by the noise inherent in the eye tracker's position signal.

2 Methodology

A within-subjects experiment was conducted to test vergence measurement. One of nine video clips served as the independent variable in the analysis, a subset of a larger study. Two versions of the video clip were used to test vergence response. The only difference between the two versions was that one was rendered in a standard two-dimensional monoscopic format while the other was rendered in a red-cyan anaglyphic stereoscopic format. Ocular angular vergence response served as the dependent variable. The operational hypothesis was simply that a significant difference in vergence response would be observed between watching the monoscopic and stereoscopic versions of the video.

2.1 Apparatus

A Tobii ET-1750 video-based corneal reflection (binocular) eye tracker was used for real-time gaze coordinate measurement (and recording). The eye tracker operates at a sampling rate of 50 Hz with an accuracy typically better than 0.3° over a ±20° horizontal and vertical range using the pupil/corneal reflection difference [Tobii Technology AB 2003] (in practice, measurement error ranges roughly ±10 pixels). The eye tracker's 17-inch LCD monitor was set to 1280 × 1024 resolution and the stimulus display was maximized to cover the entire screen (save for its title bar at the top of the screen). The eye tracking server ran on a dual 2.0 GHz AMD Opteron 246 PC (2 GB RAM) running Windows XP. The client display application ran on a 2.2 GHz AMD Opteron 148 Sun Ultra 20 running the CentOS operating system. The client/server PCs were connected via 1 Gb Ethernet (connected via a switch on the same subnet). Participants initially sat at a viewing distance of about 50 cm from the monitor, the tracker video camera's focal length.

2.2 Participants

Twelve college students (9 M, 3 F; ages 22-27) participated in the study, recruited verbally on a volunteer basis. Only three participants had previously seen a stereoscopic film. Participants were not screened for color blindness or impaired depth perception.

2.3 Stimulus

The anaglyphic stereogram used in this study was created with a red image for the left eye, and a cyan image for the right eye. Likewise, viewers of these images wore glasses with a red lens in front of the left eye, and a cyan lens in front of the right. The distance between corresponding pixels in the red and cyan images creates an illusion of depth when the composite image is fused together by the viewer.

Eight anaglyphic video clips were shown to participants, with a ninth rendered traditionally (monoscopically). All nine videos were computer-generated. The first of the anaglyphic videos was of a
roving disk in three-dimensional space, as shown in Figure 2. The purpose of this video was calibration of vergence normalization, as the stereoscopic depth of the roving disk matched the depth of the other video clips. The goal was to elicit divergent eye movements as the disk sank into and beyond the monocular image plane, and to elicit convergent eye movements as the disk passed through and in front of the monocular image plane. The roving disk moves within a cube, texture-mapped with a checkerboard texture to provide additional depth cues. The disk starts moving in the back plane. After stopping at all four corners and the center, the disk moves closer to the viewer to the middle plane. The disk again visits each of the four corners and the center, before translating to the front plane, where again each of the four corners and center is visited. Only the 40 s calibration video clip is relevant to the analysis given in this paper. The video was always the first viewed by each participant.

Figure 2: Calibration video, showing a white disk visiting each of the four corners and the center of each visual plane, (a) back visual plane, (b) middle visual plane, (c) front visual plane, along with a viewer's gaze point (represented by a small rectangle) during visualization: (a) the disk appears to sink into the screen; (b) the disk appears at the monocular, middle image plane; (c) the disk appears to "pop out" of the screen. The size of the gaze point rectangle is scaled to visually depict horizontal disparity. A smaller rectangle, as in (a), represents divergence, while a larger rectangle, as in (c), represents convergence.

2.4 Procedure

Demographic information consisting of the age and gender of each participant was collected. Each participant filled out a short pre-test questionnaire regarding his or her familiarity with anaglyphic stereographs. A quick (two-dimensional) calibration of the eye tracker was performed by having participants visually follow a roving dot between nine different locations on the screen. After 2D calibration, participants were presented with a second, this time 3D (stereoscopic), calibration video of the roving disk, translating in 2D as well as in depth. Next, participants were shown three short videos, one of the long videos (stereo or mono), three more short videos, and once again the long video (stereo or mono). The order of presentation of the short videos followed a Latin square rotation. The order of the long videos was toggled for viewers such that all odd-numbered viewers saw the stereoscopic version first, and all even-numbered viewers saw the monoscopic version first.

Participants were instructed to keep looking at the center of the roving calibration disk as it moved on the screen, during each of the 2D and 3D calibrations. No other instructions were given to participants as they viewed the other 9 stimulus video clips (a "free viewing" task was implied).

2.5 Data Stream Synchronization

A major issue concerning gaze data analysis over dynamic media such as video is synchronization. It is imperative that gaze data be properly aligned with video frames over which eye movements were recorded (this is needed for subsequent visualization). The two data streams do not necessarily begin at the same time, nor are they streamed at the same data rate. The video display library used (xine-lib) provides media player style functionality (versus video processing), and as such is liable to drop frames following video stream decompression in order to maintain the desired playback speed. All videos used in the experiment were created to run at 25 frames per second. If no video frames are dropped, synchronization is straightforward, since the eye tracker records data at 50 Hz, and relies mainly on identification of a common start point (the eye tracker provides a timestamp that can be used for this purpose assuming both streams are initiated at about the same time, e.g., as controlled by the application).

For binocular vergence analysis, eye movement analysis is required both to smooth the data, reducing inherent noise due to eye movement jitter, and to identify fixations within the eye movement data stream. A simple and popular approach to denoising is the use of a smoothing (averaging) filter. For visualization playback, the coordinate used is the linear average of the filter (of width 10 frames, in the present case).

The use of a smoothing filter can introduce lag, depending on the filter's temporal position within the data stream. If the filter is aligned to compute the average of the last ten gaze data points, and the difference in timestamps between successive gaze points is 20 ms, then the average gaze coordinate from the filter summation will be 100 ms behind. To alleviate this lag, the filter is temporally shifted forward by half its length.

Care in the filtering summation is taken to ignore invalid gaze data points. The eye tracker will, on occasion, e.g., due to blinks or other reasons for loss of eye image in the tracker's cameras, flag gaze data as invalid (a validity code is provided by the eye tracking server). In addition to the validity code, gaze data is set to (−1, −1), which, if folded into the smoothing filter's summation, would inappropriately skew the gaze centroid. Invalid data is therefore ignored in the filter's summation, resulting in potentially fewer gaze points considered for the average calculation. To avoid the problem of the filter potentially being given only a few or no valid points with which to compute the average, a threshold of 80% is used. If more than 80% of the filter's data is invalid, then the filter's output is flagged as invalid and is not drawn.
3 Results

Following recording of raw eye movement data, the collected gaze points $(x_l, y_l)$, $(x_r, y_r)$ and timestamp $t$ were analyzed to detect fixations in the data stream. The angular disparity between the left and right gaze coordinates, given by equation (1), was calculated for every gaze point that occurred during a fixation, as identified by an implementation of the position-variance approach, with a spatial deviation threshold of 0.029 and number of samples set to 10. Note that the gaze data used in this analysis is normalized, hence the deviation threshold specified is in dimensionless units although it is typically expressed in pixels or degrees visual angle. The fixation analysis code is freely available on the web.¹ The angular disparity serves as the dependent variable in the analysis of the experiment.

Averaging across each of the three calibration segments, when the calibration stimulus was shown in each of the back, mid, and front stereo planes, with image plane and subject acting as fixed factors, repeated-measures (within-subjects) one-way ANOVA shows a highly significant effect of stereo plane on vergence response (F(2,22) = 8.15, p < 0.01).² Pairwise comparisons using t-tests with pooled SD indicate highly significant differences between disparities measured when viewing the front plane and each of the back and mid planes (p < 0.01), but not between the back and mid planes, as shown in Figure 3.

Figure 3: Binocular disparity when viewing the stereoscopic calibration video, averaging over all viewers within ∼40 s viewing time split in thirds, i.e., when the calibration dot was seen in the back plane, the mid plane, and then in the front plane. [Plot "Mean Disparity During Calibration"; vertical axis: mean disparity (deg. visual angle, with SE); horizontal axis: stereo plane (Back, Mid, Front).]

4 Conclusion

Results suggest that vergence is more active when viewing stereoscopic imagery than when no stereo disparity is present. Moreover, a commercially available binocular eye tracker can be used to measure vergence via estimation of horizontal disparity between the left and right gaze points recorded when fixating.

Acknowledgments

This work was supported in part by IIS grant #0915085 from the National Science Foundation (HCC: Small: Eye Movement in Stereoscopic Displays, Implications for Visualization).

References

BÜTTNER-ENNEVER, J. A., Ed. 1988. Neuroanatomy of the Oculomotor System. Reviews of Oculomotor Research, vol. II. Elsevier Press, Amsterdam, Holland.

ESSIG, K., POMPLUN, M., AND RITTER, H. 2004. Application of a Novel Neural Approach to 3D Gaze Tracking: Vergence Eye-Movements in Autostereograms. In Proceedings of the Twenty-Sixth Annual Meeting of the Cognitive Science Society, K. Forbus, D. Gentner, and T. Regier, Eds. Cognitive Science Society, 357–362.

HOLLIMAN, N., FRONER, B., AND LIVERSEDGE, S. 2007. An Application Driven Comparison of Depth Perception on Desktop 3D Displays. In Stereoscopic Displays and Applications XVIII. SPIE.

HOWARD, I. P. 2002. Seeing in Depth. Vol. I: Basic Mechanisms. I Porteous, University of Toronto Press, Thornhill, ON, Canada.

HOWARD, I. P. AND ROGERS, B. J. 2002. Seeing in Depth. Vol. II: Depth Perception. I Porteous, University of Toronto Press, Thornhill, ON, Canada.

JULESZ, B. 1964. Binocular Depth Perception without Familiarity Cues. Science 145, 3630 (Jul), 356–362.

KWON, Y.-M. AND SHUL, J. K. 2006. Experimental Researches on Gaze-Based 3D Interaction to Stereo Image Display. In Edutainment, Z. Pan et al., Ed. Springer-Verlag, Berlin, 1112–1120. LNCS 3942.

LIPTON, L. 1982. Foundations of the Stereoscopic Cinema: A Study in Depth. Van Nostrand Reinhold Company Inc., New York, NY. ISBN 0-442-24724-9, URL: <>.

SHAKHNOVICH, A. R. 1977. The Brain and Regulation of Eye Movement. Plenum Press, New York, NY.

SMITH, G. AND ATCHISON, D. A. 1997. The Eye and Visual Optical Instruments. Cambridge Univ. Press, Cambridge, UK.

TOBII TECHNOLOGY AB. 2003. Tobii ET-17 Eye-tracker Product Description. (Version 1.1).

¹ The position-variance fixation analysis code was originally made available by LC Technologies. The C++ interface and implementation ported from C by Mike Ashmore are available at: <>.

² With sphericity assumed by R, the statistical package used throughout.
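As a worked illustration of the disparity estimate used in the analysis (triangle similarity for $\Delta d$, followed by equation (1) for $\eta$), the sketch below computes both quantities from a pair of horizontal gaze coordinates. It assumes the gaze coordinates have already been converted to centimeters on the screen, and uses the paper's stated defaults $a = 6.3$ cm and $D = 50$ cm; the function names `depth_offset` and `angular_disparity` are hypothetical and chosen for the example only.

```python
import math


def depth_offset(x_l: float, x_r: float, a: float = 6.3, D: float = 50.0) -> float:
    """Distance (cm) of the fixated point P behind (+) or in front of (-)
    the screen, from triangle similarity: delta_d = (x_r - x_l) D / (a - (x_r - x_l)).

    x_l and x_r are assumed to be horizontal gaze positions in cm.
    """
    s = x_r - x_l
    if s >= a:
        # The visual axes would be parallel or diverging: no finite depth.
        raise ValueError("on-screen separation must be smaller than a")
    return s * D / (a - s)


def angular_disparity(delta_d: float, a: float = 6.3, D: float = 50.0) -> float:
    """Angular disparity in radians, per equation (1):
    eta ~= -a * delta_d / (D^2 + D * delta_d)."""
    return -a * delta_d / (D * D + D * delta_d)


if __name__ == "__main__":
    # Example: gaze points 0.63 cm apart, right gaze point to the right.
    dd = depth_offset(0.0, 0.63)
    eta = angular_disparity(dd)
    print(f"delta_d = {dd:.2f} cm, eta = {math.degrees(eta):.3f} deg")
```

Substituting the expression for $\Delta d$ into equation (1) gives $\eta \approx -(x_r - x_l)/D$, so under these assumptions the angular disparity is simply the on-screen gaze separation divided by the viewing distance, which is why the horizontal gaze disparity alone suffices as a vergence measure.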