Kohlbecker Low Latency Combined Eye And Head Tracking System For Teleoperating A Robotic Head In Real Time


We have developed a low-latency combined eye and head tracker suitable for teleoperating a remote robotic head in real-time. Eye and head movements of a human (wizard) are tracked and replicated by the robot with a latency of 16.5 ms. The tracking is achieved by three fully synchronized cameras attached to a head mount. One forward-looking, wide-angle camera is used to determine the wizard’s head pose with respect to the LEDs on the video monitor; the other two cameras are for binocular eye tracking. The whole system operates at a sample rate of 220 Hz, which allows the capture and reproduction of biological movements as precisely as possible while keeping the overall latency low. In future studies, this setup will be used as an experimental platform for Wizard-of-Oz evaluations of gaze-based human-robot interaction. In particular, the question will be addressed as to what extent aspects of human eye movements need to be implemented in a robot in order to guarantee a smooth interaction.
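
To put the two headline figures in relation, the short sketch below converts the 220 Hz sample rate into a frame period and expresses the 16.5 ms latency in camera frames. It is only illustrative arithmetic on the numbers quoted above; the function and constant names are hypothetical and not taken from the authors' software.

```python
# Illustrative arithmetic only; names are hypothetical, values are from the abstract.
SAMPLE_RATE_HZ = 220.0       # camera sample rate stated in the abstract
REPORTED_LATENCY_MS = 16.5   # end-to-end latency stated in the abstract


def frame_period_ms(sample_rate_hz: float) -> float:
    """Time between two consecutive camera frames, in milliseconds."""
    return 1000.0 / sample_rate_hz


def latency_in_frames(latency_ms: float, sample_rate_hz: float) -> float:
    """Express a latency as a (fractional) number of frame periods."""
    return latency_ms / frame_period_ms(sample_rate_hz)


period = frame_period_ms(SAMPLE_RATE_HZ)
frames = latency_in_frames(REPORTED_LATENCY_MS, SAMPLE_RATE_HZ)
print(f"frame period: {period:.2f} ms, latency: {frames:.2f} frames")
```

At 220 Hz a new frame arrives roughly every 4.5 ms, so the reported end-to-end latency corresponds to fewer than four frame periods.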

Low-Latency Combined Eye and Head Tracking System for Teleoperating a Robotic Head in Real-Time

Stefan Kohlbecher∗, Klaus Bartl†, Stanislavs Bardins‡, Erich Schneider§
Clinical Neurosciences, University of Munich Hospital

∗ skohlbecher@nefo.med.uni-muenchen.de
† kbartl@nefo.med.uni-muenchen.de
‡ sbardins@nefo.med.uni-muenchen.de
§ eschneider@nefo.med.uni-muenchen.de

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org.
ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

Figure 1: Combined eye and head tracking goggles used to control a remote robot’s head and eyes. (Diagram labels: Teleoperation of Head and Eyes; Scene Camera; Eye Tracker; LED; Head Pos; Eye Pos; Wizard; Visual Feedback; Computer Monitor; Robot ELIAS.)

Keywords: calibration, head-mounted, real-time

1 Introduction

To date, the predominant interaction modality between a human user and a computer is based on artificial interfaces: display, mouse, and keyboard. The transition from artificial computer interfaces to anthropomorphic robots with human-like functionality requires the use of natural communication modalities. Human-like verbal and non-verbal communication channels reduce the barriers to interaction with such robots. This is especially valuable for the elderly who as a rule are not familiar with computers, but can easily cope with a system that looks and acts like a human.

A significant part of human-human communication is based on non-verbal cues, such as facial expressions, body language, and so on. Our overall aim is to integrate natural eye movements into humanoid robotic platforms. In such an attempt, the question arises as to which critical aspects of human eye movements need to be implemented, and which not. We will address this question with a research platform that will enable Wizard-of-Oz examinations of human eye movements in human-robot interaction experiments. The platform consists of a robotic head with eyes that perform better than those of a human [Schneider et al. 2009b]. In particular, the eyes can rotate with a maximum angular velocity of 3400°/s and a maximal acceleration of 170000°/s² at a bandwidth of up to 20 Hz. A human operator (the wizard) sees the world from the robot’s perspective and his reactions are in turn carried out by the robot. In this way, we can produce human eye movements in a robotic platform without first having to implement a complex gaze model. Furthermore, we are also able to artificially alter the robot’s eye movements to find out what precisely has to be implemented in a humanoid gaze model. Once gaze and head movement models have been developed, the setup can still be used in the long run as a valuable evaluation tool for comparing the performance of those models directly to that of a real human.

In this paper, we present the wizard’s part of the experimental setup, in particular our solution for real-time eye and head tracking. A number of eye and head tracking methods can be found in the literature, and there are also commercial systems available. The next section gives a short overview of these systems and outlines the motivation for designing a new eye tracker that is suitable for real-time teleoperation.

1.1 Requirements and Related Work

The goal is to reproduce natural eye and head movements of a human (wizard) on a remote robotic head that communicates with another human (participant). The most important factor in this setup is low latency. Consider, for example, the case in which the robot smoothly pursues the participant’s moving hand. While the participant is not able to tell the robot’s exact angle of gaze, he/she clearly perceives any lags between the interaction with – and the reaction
of – the robot. Besides a constant low latency, a high sampling rate of around 200 Hz is required to correctly reproduce the human movements in dynamic situations. A system that is able to measure both head and eye movements simultaneously is preferred, since it does not require a subsequent complex synchronization between the different components. To summarize, the requirements are low latency, high sampling rate, and combined head and eye tracking.

Numerous methods for combined measurements of eye and head movements can be found in the literature. [Huebner et al. 1992] placed an additional search coil on the subject’s forehead to investigate the relationship between the vestibulo ocular reflex and smooth pursuit. While the search coil offers superior spatial and temporal resolution, it is an invasive method requiring contact lenses with wires. This limits the examination time of a subject to half an hour. [Allison et al. 1996] combined a video-based eye tracker with a magnetic head tracker for dynamic testing of the vestibular system. While this is an interesting approach, we prefer to avoid integrating different systems for the above reasons. Commercially available remote eye trackers like the Tobii T120, for example, are not suitable, because they have a latency of up to 33 ms and they cannot measure the head pose. An increasingly popular method for measuring combined eye and head movements is the use of computer vision [Smith et al. 2000; La Cascia et al. 2000]. The face is observed by a camera, and a head model is matched against the image to determine the head pose. Then the eyes in the image are detected, and the gaze direction is calculated with respect to the head direction. While this is the least invasive method (no physical contact with the subject is required), it lacks the necessary spatial and, most notably, temporal resolution. Head-mounted eye trackers have the advantage of providing sufficient spatial resolution, but they do not track head pose per se. Consequently, head tracking must be added to such a system for a suitable solution. [Cornelissen et al. 2002] introduced such a system in which head position is inferred from four LEDs on the edges of a computer screen. A head-mounted scene camera tracks the LEDs.

Our system uses a head-mounted eye tracker reported on earlier [Schneider et al. 2009a] to which an infrared scene camera was added that tracks LEDs affixed to the corners of a computer screen (Fig. 1, the presence of a fifth LED is explained later). The eye tracker and the scene camera can be operated at up to 600 Hz. The system was previously designed to record each frame exactly at the same point in time, an attribute that was easily extended to the third camera. In contrast to Cornelissen’s setup, our system additionally tracks both eyes simultaneously in order to detect the distance to the point of gaze by using vergence eye movements. Normally, the point of gaze should be on the computer screen. Our binocular eye tracker can discriminate between fixations on the screen plane and measurement artifacts that are indicated by invalid depth information. Because of the low latency of the original eye tracker and its high sampling rate, the system meets all requirements and is also fully customizable and synchronized to measure dynamic eye and head movements.

1.2 Paper Outline

In the following, we present the algorithms used for eye and head tracking, and provide information on the model used for calibrating the complete system. We also describe how the eye tracker fits into the Wizard-of-Oz setup. Finally, we evaluate the system resolution, latency, and accuracy.

2 Methods

This section describes the methods used for eye and head tracking and gives an overview of the robot setup.

2.1 Eye Tracking

The binocular eye tracker consists of two infrared cameras mounted laterally on a pair of swimming goggles. The eyes are illuminated by infrared light sources in the goggle frame. A translucent hot mirror in front of each eye reflects only the infrared light in the direction of the camera. Figure 1 shows a picture of the setup.

A calibration laser is positioned between and slightly above the user’s eyes. The laser projects a calibration pattern to a wall in front of the user by means of a diffraction grating [Pelz and Canosa 2001]. For sufficiently large projection distances (>5 m), the parallax error introduced by the translation between eyeball center and laser becomes irrelevant, thus allowing the laser pattern to be used for calibration. Once calibrated, the eye tracker provides two-dimensional gaze direction values for each eye, with the primary position being parallel to the central laser beam.

The additional, forward-looking scene view camera used for head tracking is placed on the center of the forehead (see Fig. 1). All cameras run at a resolution of 188×120 pixels and are synchronized at 220 Hz, i.e., they take pictures at the same time and do not drift.

2.2 Head Tracking

Head tracking is accomplished by analyzing the images of infrared LEDs that are located at the corners of a computer monitor [Cornelissen et al. 2002]. The head tracking algorithm involves two steps. First, the marker LEDs are detected by image processing and assigned to their corresponding projections in the image. Then the position and orientation of the LED plane are computed with respect to the scene camera, which, in turn, also provides the head position and orientation with respect to the LED plane.

2.2.1 LED Detection and Assignment

To facilitate image processing, infrared LEDs are used and the scene camera is equipped with an infrared filter. The shutter and gain values can be adjusted so that only the LEDs appear in the image as bright, white spots. This allows robust and fast LED detection. The positions of the LED projections are determined with subpixel accuracy by using a center of mass algorithm.

A plane in 3D space is defined by three points. If only their two-dimensional projection onto a camera image plane is known, a fourth point and the knowledge of the correspondences between the projections and their original points are needed to reconstruct the original pose in 3D space.

The ambiguity that arises with four LEDs can be resolved in two different ways. First, the head rolling angle can be constrained to remain within ]−45°; 45°[. Then the relations between the points are given by the quadrants in which they lie with respect to the center of mass of all points. While this might be acceptable when the subject sits in front of a computer monitor, it is not a suitable method if the plane of interest lies flat on the surface of a table and the subject is allowed to move around freely. With such an application in mind, a second solution was implemented, in which an additional LED point is placed between the upper left and right corners (see Fig. 1).

2.2.2 Head Pose Estimation

To maximize wizard mobility, the scene camera is equipped with a wide angle lens (2.1 mm focal length, 1/3”), which results in a horizontal view angle of 145°. As this requires compensation for spherical and tangential distortion, we used the Camera Calibration Toolbox for Matlab
(http://www.vision.caltech.edu/bouguetj/calib_doc/) to determine the intrinsic camera parameters.

Pose estimation of the computer monitor is based on an algorithm described by [Shapiro and Haralick 1993]. Three lines in a plane with one perpendicular to the other two can be reconstructed by their projections. In our case the upper, lower, and right borders of the screen are used. This allows the scene camera position and orientation to be calculated with respect to the monitor. As the eye tracker uses the coordinate system defined by the calibration pattern, the rotation and translation between scene camera and calibration laser must be taken into account. The translation can be easily measured. To determine the rotation, the calibration laser is turned on and pointed toward the computer monitor, which displays the estimated calibration points. Now the rotation parameters can be adjusted until both sets of points match. These parameters are systematic and have to be measured only once. Now the point at which the laser calibration pattern intersects with the monitor plane can be predicted.

2.3 Combined Tracking

To determine the intersection between the user’s line of sight and the monitor, the centers of the eyeballs must be identified. They are found in a separate calibration step, in which the user has to fixate two known points on the monitor without moving his head. Then the centers of the eyeballs can be calculated from the intersection of the lines of sight to each point. From now on the system is fully calibrated and the intersection between each eye’s line of sight and the computer monitor can be determined.

3 Robot Setup

The remote controlled robot is based on a commercially available platform (MetraLabs, Ilmenau, Germany). It is equipped with a movable neck and pivotable eyes. The original eyes could not match human eye performance, so they were replaced with our own camera orientation devices [Schneider et al. 2009b]. These active robotic vision cameras (see Fig. 1) surpass human eye movements in terms of angular velocity and acceleration as well as bandwidth by a factor of five. Thus, the robotic system is able to exactly reproduce the eye movement dynamics of the wizard.

Additionally, an extra wide angle scene view camera was mounted on the robot’s neck, by means of which the wizard oversees the experimental scene. The camera is mounted near the pivot point of the head and eyes. After calibration, the wizard’s head and eye movements can be directly mapped to head and eye movements of the robot. Since the robot’s scene camera is fixed with respect to its body, the scene view remains stable and thus does not generate any visual feedback effects.

4 Results

This section explains the rationale behind the chosen geometrical setup and gives details on the resolution of the head tracker as well as the accuracy and latency of the whole system.

4.1 Geometrical Setup

Since the linear region of the robot’s scene camera (ca. 90°) had to cover the human wizard’s field of view, the wizard was placed 35 cm in front of a 20” monitor. At this distance, the wide angle head tracking camera still allowed for head movements of 40° in yaw and 20° in pitch.

4.2 Head Tracker Resolution

The resolution of the head tracker was determined by mounting the goggles 32 cm in front of the monitor and measuring head position and orientation noise for 1.5 s (see Fig. 2). Table 1 shows the root mean square (RMS) of each component.

Figure 2: The noise of the head tracking algorithm, with the goggles fixated at a distance of 32 cm from the computer monitor. Horizontal (x) and vertical (y) head positions as well as distance from the monitor (z) are shown in [mm]. The horizontal and vertical head angles are given in [°]. (Plot panels: x [mm], y [mm], z [mm], Hor [°], Ver [°], each over Time [s] from 0 to 1.5.)

  Component     RMS
  Hor. Pos.     0.0395 mm
  Vert. Pos.    0.0319 mm
  Distance      0.0157 mm
  Angle Hor.    0.0051°
  Angle Ver.    0.0045°

Table 1: Resolution of the head tracker measured at a distance of 32 cm from the monitor.

4.3 System Accuracy

A healthy subject was instructed to fixate 20 points on a computer monitor at a distance of 35 cm (see Fig. 3: o, +, x). The stimulus points (o) were arranged in a grid with an intra-point distance of 6.8 cm. The intersections between the monitor plane and the left eye’s line of sight were plotted; the subject looked once straight at the monitor (+) and then again with the head turned 12.5° to the right (x). Horizontal eye movements were in the range [−25°; 15°] when looking straight at the monitor and in the range [−10°; 25°] when the head was turned to the right. Vertical eye movements stayed within the range [−10°; 20°]. The accuracy was lower near the bottom edge, because the eyelashes distorted the pupil image. For our application, the accuracy achieved was more than sufficient.

4.4 System Latency

In order to determine the system latency, a gyroscope was attached to an artificial eye that was mounted on a servo motor. The eye tracking computer also output the pupil position immediately after calculation. A second gyroscope was mounted on one of the robot’s eyes. Colored noise with a cutoff frequency of 10 Hz and a duration of 10 s was used to drive the artificial eye. Fig. 4 exemplarily
shows the resulting velocity profiles. The eye tracker detects the movement of the artificial eye after 5 ms (which equals one frame period of 4.5 ms plus 0.5 ms for transmission and calculation). The overall latency between movement of the artificial eye and movement of the robot’s eye is 16.5 ms.

Figure 3: Subject fixating validation grid (o, 6.8 cm) at a distance of 35 cm, when looking both straight at the monitor (+) and with the head turned 12.5° to the right (x). (Plot axes: Monitor x [cm], Monitor y [cm].)

Figure 4: Latency between movement of an artificial eye (Art. Eye), calculation of the pupil position (Eye Tracking) and movement of the robot’s eyes (Robot Eye). The artificial eye was driven by colored noise with a cutoff frequency of 10 Hz. The correlation functions reveal a delay of 5 ms until the pupil is detected, and an overall delay of 16.5 ms between movement of the artificial eye and movement of the robot’s eyes. (Plot panels: Relative Velocity over Time [ms]; Correlation Coefficient over Lag [ms].)

5 Conclusion and Future Work

This novel system allows real-time teleoperation of a robot’s head and eyes by synchronizing the combined tracking of a human wizard’s head and eyes. The accuracy and workspace are well suited for the given application. The overall latency between the wizard’s eye movement and the eye movement of the robot is roughly 5 ms for image acquisition and processing, plus about 11.5 ms for transmitting the commands to the robot and moving its eyes, i.e. 16.5 ms total. This is already on the order of the fastest human (vestibulo ocular) reflex, which has a delay of about 10 ms. In one of the next steps, the motor latency will be further decreased by taking non-linearities at low speeds into account.

The above-described research platform can now be used to conduct experiments that will help to define the critical aspects of gaze-based human-robot interaction. With its ability to detect fixations in 3D space relative to any given rectangular surface by intersecting the lines of sight of both eyes, the combined eye and head tracker will in a future step be used together with a 3D monitor. Then, the robot will be equipped with a stereo scene camera, the output of which will be presented to the wizard on a stereo monitor. Thus, the robot will also be able to correctly reproduce the wizard’s vergence eye movements.

Acknowledgements

The authors would like to thank our project partner, Frank Wallhoff, MMK, TU-Muenchen for providing the robot ELIAS. We also thank Judy Benson for critically reading the manuscript. This work is supported by Bundesministerium fuer Bildung und Forschung (IFB, LMU) and in part within the DFG excellence initiative research cluster “Cognition for Technical Systems – CoTeSys”, see also www.cotesys.org.

References

ALLISON, R., EIZENMAN, M., AND CHEUNG, B. 1996. Combined head and eye tracking system for dynamic testing of the vestibular system. IEEE Transactions on Biomedical Engineering 43, 11, 1073–1082.

CORNELISSEN, F., PETERS, E., AND PALMER, J. 2002. The eyelink toolbox: eye tracking with matlab and the psychophysics toolbox. Behavior Research Methods Instruments and Computers 34, 4, 613–617.

HUEBNER, W., LEIGH, R., SEIDMAN, S., THOMAS, C., BILLIAN, C., DISCENNA, A., AND DELL’OSSO, L. 1992. Experimental tests of a superposition hypothesis to explain the relationship between the vestibuloocular reflex and smooth pursuit during horizontal combined eye-head tracking in humans. Journal of neurophysiology 68, 5, 1775–1792.

LA CASCIA, M., SCLAROFF, S., AND ATHITSOS, V. 2000. Fast, reliable head tracking under varying illumination: an approach based on registration of texture-mapped 3 d models. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 4, 322–336.

PELZ, J., AND CANOSA, R. 2001. Oculomotor behavior and perceptual strategies in complex tasks. Vision Research 41, 25-26, 3587–3596.

SCHNEIDER, E., VILLGRATTNER, T., VOCKEROTH, J., BARTL, K., KOHLBECHER, S., BARDINS, S., ULBRICH, H., AND BRANDT, T. 2009a. EyeSeeCam: An eye movement-driven head camera for the examination of natural visual exploration. Annals of the New York Academy of Sciences 1164, 1 Basic and Clinical Aspects of Vertigo and Dizziness, 461–467.

SCHNEIDER, E., KOHLBECHER, S., BARTL, K., AND WALLHOFF, F. 2009b. Experimental platform for wizard-of-oz evaluations of biomimetic active vision in robots. In 2009 IEEE International Conference on Robotics and Biomimetics (ROBIO).

SHAPIRO, L., AND HARALICK, R. 1993. Computer and Robot Vision, vol. 2. Addison-Wesley, ch. 13, 73f.

SMITH, P., SHAH, M., AND DA VITORIA LOBO, N. 2000. Monitoring head/eye motion for driver alertness with one camera. In Pattern Recognition, 2000. Proceedings. 15th International Conference on, vol. 4.
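
Supplementary sketch: The latency figures in Section 4.4 and Fig. 4 are read off correlation functions between velocity signals. The following is a minimal, hedged sketch of how a delay can be estimated from the peak of a cross-correlation. It is not the authors' code: the function names, the synthetic moving-average signal standing in for colored noise, and the 1 ms sample period are all assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_lag(reference, delayed, max_lag):
    """Return (lag, r): the shift in samples that maximizes the correlation.

    A positive lag means `delayed` trails `reference` by that many samples.
    """
    n = len(reference)
    best_lag, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = reference[:n - lag], delayed[lag:]
        else:
            a, b = reference[-lag:], delayed[:n + lag]
        r = pearson(a, b)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic demo (assumed values): smoothed random noise, delayed by 5 samples.
random.seed(0)
SAMPLE_PERIOD_MS = 1.0   # hypothetical sampling period of the recorded signals
TRUE_DELAY = 5           # samples; chosen for the demo only
raw = [random.gauss(0.0, 1.0) for _ in range(2000)]
window = 20              # crude moving-average low-pass filter
signal = [sum(raw[i:i + window]) / window for i in range(len(raw) - window)]
delayed = [0.0] * TRUE_DELAY + signal[:-TRUE_DELAY]

lag, r = estimate_lag(signal, delayed, max_lag=50)
lag_ms = lag * SAMPLE_PERIOD_MS
print(f"estimated delay: {lag_ms:.1f} ms (r = {r:.3f})")
```

With real recordings, the temporal resolution of such an estimate is bounded by the sample period of the signals being correlated, so sub-millisecond figures would require either faster sampling or interpolation around the correlation peak.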