Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks Using Features Of Eye Movements

Estimation of Viewer’s Response for Contextual Understanding of Tasks
using Features of Eye-movements
Minoru Nakayama∗ Yuko Hayashi
CRADLE Human System Science
(The Center for R & D of Educational Technology) Tokyo Institute of Technology
Tokyo Institute of Technology

Abstract & Karn summarized eye tracking related metrics and effectiveness
while subjects completed various tasks [Jacob and Karn 2003]. Ad-
To estimate viewer’s contextual understanding, features of their ditionally, Ryner summarized features of eye-movements in the
eye-movements while viewing question statements in response to reading process [Rayner 1998]. Most metrics are based on fixa-
definition statements, and features of correct and incorrect re- tion and saccade during a specific task, and are scalar, not dimen-
sponses were extracted and compared. Twelve directional features sional. Therefore, high-level eye-movement metrics are required,
of eye-movements across a two-dimensional space were created, and some of these have already been proposed [Duchowski 2006].
and these features were compared between correct and incorrect re- Also, the features of eye-movement metrics across two-dimensional
sponses. The procedure of estimating the response was developed space have also been discussed, as eye-movements can be illus-
with Support Vector Machines, using these features. The estima- trated in two dimensions [Tatler 2007; Tatler et al. 2007]. The
tion performance and accuracy were assessed across combinations authors’ preliminary analysis of the inferential task suggests that
of features. The number of definition statements, which needed estimations using several features and a liner function are useful,
to be memorized to answer the question statements during the ex- but subjects’ performance is not sufficient enough to be analyzed
periment, affected the estimation accuracy. These results provide in depth [Nakayama and Hayashi 2009]. To improve this perfor-
evidence that features of eye-movements during reading statements mance, two approaches will be considered. The first is the creation
can be used as an index of contextual understanding. of features of eye-movements which are effective at understanding
the viewer’s behavior. The second is that a more robust estimation
CR Categories: H.1.2 [User/Machine Systems]: Human informa- procedure be created using non-linear functions such as Support
tion processing; H.5.2 [User Interfaces]: Evaluation/methodology Vector Machines (SVM) [Stork et al. 2001]. Also, performance as-
sessment procedures are used to emphasize the significance of the
Keywords: eye-movements, answer correctness, eye-movement estimation [Fawcett 2006].
metrics, user’s response estimation, discriminant analysis This paper addresses the feasibility of estimating user response
correctness of inferential tasks using various features of eye-
1 Introduction movements made while the user selects alternative choices based
on their contextual understanding of statements presented to them.
“Contextual understanding” is the awareness of knowledge and pre-
sented information, including texts, images and other factors. To 2 Experimental method
discern a person’s contextual understanding of something, ques-
tions are generally given in order to observe the responses. Even in The subjects were first asked to understand and memorize some
the human-computer interaction (HCI) environment, various sys- definition statements which described locational relationships be-
tems ask users about their contextual understanding. To improve tween two objects (Figure 1(a)). Each definition statement was pre-
these systems and make them more environmentally effective for sented for 5 seconds. Then, ten questions in statement form were
users and designers, an index for the measurement of contextual given to determine the degree of understanding. These questions
understanding is desirable, and should be developed to ferret out asked subjects to choose one of two choices as quickly as possi-
problems regarding HCI. Eye-movements can be used to evaluate ble, regarding whether each question statement was “Yes (True)”
document relevance [Puol¨ maki et al. 2005], and to estimate user
a or “No (False)” (Figure 1(b)). Each question statement was shown
certainty [Nakayama and Takahasi 2008]. Results such as these for 10 seconds. When the subject responded to a question state-
suggest the possibility that features of eye-movements can estimate ment, the display moved to the next task. All texts were written in
viewer responses to questions which are based on contextual under- Japanese Kanji and Hiragana characters, and the texts are read from
standing and certainty. Estimation techniques using eye-movement left to right. This task asked subjects for “contextual understand-
metrics have already been applied to Web page assessments [Ehmke ing”. The number of statements, which were given to subjects, was
and Wilson 2007; Nakamichi et al. 2006]. The effective features of controlled at 3, 5 or 7 per set, as the task difficulty. Five sets were
response estimation have not yet been determined, however. created for each statement level. In total, 150 responses per subject
To conduct an estimation of viewer’s responses using eye- were gathered (3 levels × 5 sets × 10 questions). The experimen-
movements, the appropriate features need to be extracted. Jacob tal sequence was randomized to prevent subjects from experiencing
any learning effect. The subjects were 6 male university students
∗ e-mail: nakayama@cradle.titech.ac.jp ranging from 23 to 33 years of age. They had normal visual acuity
Copyright © 2010 by the Association for Computing Machinery, Inc.
for this experiment.
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
The task was displayed on a 20 inch LCD monitor positioned
for commercial advantage and that copies bear this notice and the full citation on the 60 cm from the subject. During the experiment, a subject’s
first page. Copyrights for components of this work owned by others than ACM must be eye-movements were observed using a video-based eye tracker
honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on (nac:EMR-8NL). The subject rested his head on a chin rest and a
servers, or to redistribute to lists, requires prior specific permission and/or a fee.
Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail
small infra-red camera was positioned between the subject and the
permissions@acm.org. monitor, 40 cm from the subject. Blink was detected using an as-
ETRA 2010, Austin, TX, March 22 – 24, 2010.
© 2010 ACM 978-1-60558-994-7/10/0003 $10.00

53

1 4
Incorrect

3
Correct

Time (sec.)
Accuracy
0.5 Accuracy 2

1

(a) (b)
0 0
3 5 7
Figure 1: Leftside (a) shows a sample of a definition statement: “A Number of statements
theater is located on the east side of the police station.” Rightside
(b) shows a sample of a question statement: “There is a post office Figure 2: Mean accuracy and reaction times for correct and incor-
on the south side of the theater.” rect responses across the number of definition statements.

90
pect ratio of the two diameters. During blink, the lack of eye track- 120 60 *

ing data was compensated for by the use of a simple, previously 6 deg

used procedure [Nakayama and Shimizu 2004]. The tracker was 150* 30

calibrated at the beginning of the session, and eye-movement was
tracked on a 640 by 480 pixel screen at 60 Hz. The accuracy of the 180* 0
spatial resolution of this equipment is noted in the manufacturer’s
catalog as being a visual angle of 0.1 degrees. Correct
210* Incorrect 330*
The tracked eye-movement data was extracted for the duration of p<0.01
*
time subjects viewed each question statement before the mouse but- 240 * 300*
ton was pressed. While differences between the captured view- 270 *

ing positions were calculated, eye-movements were divided into
saccades and gazes using a threshold of 40 degrees per second Figure 3: Mean fixation position for 12 directions.
[Ebisawa and Sugiura 1998]. The two-dimensional distribution
of eye-movements have not been considered as features of eye-
movements, although several studies have used these factors in their
research [Tatler 2007; Tatler et al. 2007]. Therefore, features of 3.2 Feature differences between responses
fixation and saccade were mapped in 12 directions using 30 degree
steps, in order to present a two-dimensional distribution. Four types Extracted features of eye movements for question statements, sac-
of features were summarized for each time the statements were cade length, differences in saccade length, saccade frequency and
viewed, as follows: the fixation position as the distance in degrees saccade duration, were compared between correct and incorrect re-
from the center of the screen, fixation duration, saccade length, and sponses. The distribution of fixation points is illustrated in Figure
saccade duration. These are 12-dimensional vectors, and they can 3. The figure shows fixation points covering a horizontal area, in
also be noted as a scalar of the means of the components. particular on the right-hand side. In this experiment, single sen-
tences were written horizontally, so that subjects viewed them ac-
cording to the outline. In Japanese, verbs and negation are written
3 Results at the ends of sentences, so that readers may confirm the relation-
ship between the subject and object in the sentence and whether
3.1 Viewer’s response it is a positive or negative statement. When comparing positions
between correct and incorrect responses, significant differences are
The subject’s responses were classified as correct (hits and correct illustrated with an asterisk mark (*) on the axis (p < 0.01). There
rejections) or incorrect (misses and false alarms) according to the are significant differences between the 150 to 330 degree directions
context of the question statement. There was a unique answer for and the 60 degree direction. For all cases, mean positions for in-
every question, because the question statements were generated us- correct responses were longer than those for correct responses. As
ing a rule of logic . The reaction time was also measured in mil- most differences appear on the left-hand side, this suggests that sub-
liseconds for all responses. The accuracy of the responses across ject’s fixation points stayed at the beginning of a statement when
the number of statements is summarized in Figure 2. The accuracy he made an incorrect response. They might have had some trou-
decreases with the total number of statements. According to the re- ble starting reading. Subjects also viewed a wider area when they
sults of one-way ANOVA on the accuracy, the factor of the number made incorrect responses than when they made correct responses.
of statements is significant (F (2, 10) = 16.5, p < 0.01). The ac- The distribution of the fixation durations for each direction is simi-
curacy for a total of 3 statements is significantly higher than for the lar to the mean position from the center of Figure 3. The durations
others (p < 0.01), but there is no significant difference between the on the right hand side are relatively longer than for the other di-
accuracy for 5 and 7 statements (p = 0.21), however. This suggests rections. This means that viewers’ eye movements stayed in this
that the number of statements can be used to control the difficulty of area, at the end of the sentence. When comparing the duration be-
the task, and that the task is easiest for 3 statements and hardest for tween correct and incorrect responses, the results for the very right
both 5 and 7 statements. Mean reaction times for both correct and hand side are different from the others. Mean durations for correct
incorrect responses were also illustrated in Figure 2. There are sig- responses are significantly longer than durations for incorrect re-
nificant differences in reaction times between correct and incorrect sponses in the direction of 0 degrees. For other directions, such as
responses (F (1, 5) = 109.1, p < 0.01). The factor of the number upward, left-ward and in the directions of 300 degrees, mean du-
of statements is not significant (F (2, 20) = 0.6, p = 0.54). This rations for incorrect responses are longer than mean durations for
suggests that reaction time is a key factor of response correctness. correct responses. This means that the distribution of the duration

54

90 *
120 60
5 deg. Table 1: Discrimination results for a condition (f(24)s(24): No. of
statements=3).
150 * 30*
Subject’s ˆ
Estimation [t]
response [t] Correct Incorrect Total
180* 0* Correct 158 49 207
Incorrect 25 68 93
Correct Total 183 117 300
210* Incorrect 330*
* p<0.01

240 300*
270 rameter of the error term C for the soft margin, and the γ parameter
as a standard deviation of the Gaussian kernel should be optimized.
Figure 4: Mean saccade length across 12 directions. To extract validation results, the leave-one-out procedure was ap-
plied to the estimation. Training data consisting of the data of all
subjects except the targeted subject was prepared. Both the train-
for correct responses shifts towards the right. These metrics show ing, and the estimation of responses was then conducted. These
some of the required indices such as the visual area of coverage and were tallied, and the mean performance was evaluated. As a result,
visual attention [Duchowski 2006]. the estimation results for three statements are summarized in Table
ˆ
1, for all features (t vs. t). The rate for correct decisions consisting
Mean saccade lengths in visual angles are summarized across 12 of hits and correct rejections is 76.3%. This discrimination per-
directions in Figure 4. The mean of the saccade lengths is spread formance is significant according to the binomial distribution. The
more widely along the horizontal axis. In particular, saccade estimation performance is often evaluated using an ROC (Receiver
lengths in horizontally opposite directions are the longest, and their Operating Characteristics) curve, which is based on signal detection
lengths are almost equal. This behavior shows that subjects care- theory [Fawcett 2006]. LIBSVM tools can provide a probability of
fully read the question statements. Also, it may depend on the use the discrimination [Chang and Lin 2008], and then the ROCs are
of Japanese grammar, because the subject term and the verb are created for each level of statements using the probability [Fawcett
separated horizontally in the text. When comparing the lengths be- 2006]. Furthermore, the validation performance of the discrimi-
tween correct and incorrect responses, the mean saccade length in nation is conducted using AUC (Area Under the Curve). The AUC
the reverse direction (180 degrees) is longer for correct responses varies between 0 and 1, but the performance is better when the value
than it is for incorrect responses. For several other directions (0, approaches 1. When AUC is near 0.5, the performance is at the
90, 150, 210, 300, and 330 degrees), the mean saccade lengths for chance level. Other feature sets of eye-movements, such as fixa-
incorrect responses are longer than the saccade lengths for correct tion and saccades, were applied to the same estimation procedure.
responses, however. Mean saccade durations clearly show that the Estimation performances and AUCs for a number of statements are
mean durations for incorrect responses are definitely longer than summarized in Table 2. The 12 features for fixations consist of the
those for correct responses. Though the mean indicates the duration fixation positions across 12 directions. The 13 features for fixa-
for a single saccade, overall means are quite different, and there are tion consist of the 12 fixation positions plus a scalar of the fixation
significant differences for all directions between correct and incor- duration, and the 24 features consist of the 12 fixation positions
rect responses. This suggests that saccadic movement seems to be and the 12 fixation durations. For saccades, the 12 features consist
slower when the viewer’s responses are incorrect. of the saccade lengths across 12 directions. The 13 features con-
sist of the 12 saccade lengths plus a scalar of the saccade duration,
3.3 Estimation of Answer Correctness and the 24 features consist of the 12 saccade lengths and the 12
saccade durations. As references, performances using a combina-
The significant differences in eye movement features between re- tion of scalar features were calculated. The combination “A” shows
sponses were summarized in the above sections. These results sug- the performance when selected features (four features of saccades)
gest that if there is the possibility of estimating responses using are applied to the estimation [Nakayama and Hayashi 2009]. An-
viewer’s eye movement patterns before their decisions are made, other combination “B” shows the performance when another set
then this possibility should be determined. Here, the hypothesis of selected features for all saccades (four features) [Nakayama and
is that there is a relationship between “correct” or “incorrect” re- Takahasi 2008] is applied to the estimation. The estimation proce-
sponses and the acquired features of eye-movements for a question dure is based on the previous study. Both feature sets of “A” and
statement. Feature vectors of eye-movements are noted as V , al- “B” do not include the reaction time factor in this paper, because
ternative responses are noted as t, and the acquired data can be the factor affected the performance.
noted as (V , t) for each question statement. In this section, the per- As a result of the estimations in Table 2, the best performance is
formance of the estimation is determined using various features of obtained using all features of fixation and saccade across 12 direc-
eye movements. First of all, all extracted features (24+24 dimen- tions. According to the table, the estimation performance using sac-
sions, V24+24 ) of eye movements, such as fixation and saccade, are cade features is higher than the performance using fixation features.
applied to a discrimination function. For the estimation function, When the estimation was conducted using fixation or saccade fea-
support vector machines (SVM) are used for this analysis because tures, the performance was practically independent of the number
SVM is quite robust for high dimensionality features and poorly de- of feature dimensions. A combination of features including fixa-
fined feature fields [Stork et al. 2001]. Here, a sign function which tion and saccade gives the highest performance. When two sets of
is based on the SVM function is defined as G and based on a Gaus- selected features “A” and “B” were applied to the estimation, the
sian kernel. The parameters and functions can be noted as follows: performance was not significant. A few estimations were less than
t ∈ {+1(correct), −1(incorrect)} 50% accurate. This result provides evidence that the directional
t = G(V24+24 ), t ∈ {+1(correct), −1(incorrect)}
ˆ ˆ information across 12 directions is quite significant. The AUC met-
rics are also the highest when the estimation was conducted using
The optimization for the function G(V ) was conducted using LIB- fixation or saccade features. Table 1 shows the rate of false alarms,
SVM tools [Chang and Lin 2008]. For the SVM, the penalty pa- which is the number of correct responses estimated for incorrect

55

Table 2: Estimation accuracy using feature vectors.
Combination Fixation Saccade Fixation+Saccade
Tasks A B f(12) f(13) f(24) s(12) s(13) s(24) f(12)s(12) f(12)s(13) f(24)s(24)
Estimation accuracy
3 68.3 43.0 65.0 65.7 73.0 77.3 78.7 76.0 76.0 75.7 76.3
5 50.3 55.7 68.3 68.7 65.3 65.7 65.3 64.0 66.7 68.7 70.7
7 44.3 61.0 67.6 67.0 61.6 67.0 66.0 66.7 68.3 67.7 68.7
M 54.3 53.2 67.0 67.1 66.6 70.0 70.0 68.9 70.3 70.7 71.9
AUC: Area under a curve
3 0.37 0.75 0.72 0.73 0.81 0.80 0.82 0.80 0.83 0.84 0.83
5 0.44 0.70 0.74 0.73 0.73 0.70 0.71 0.71 0.75 0.76 0.76
7 0.42 0.71 0.75 0.73 0.71 0.75 0.73 0.73 0.77 0.78 0.78
M 0.41 0.72 0.73 0.73 0.75 0.75 0.75 0.75 0.78 0.79 0.79
(13): 12 features of vector information plus the scalar of the duration.
A: mean saccade length, mean differential ratio, saccade frequency, mean succade duration [Nakayama and Hayashi 2009]
B: saccade length, dx, dy, saccade duration of every saccade [Nakayama and Takahasi 2008]

responses. When the number of false alarms is smaller the AUCs FAWCETT, T. 2006. An introduction to roc analysis. Pattern Recog-
are higher. Additionally, both the estimation accuracy and the AUC nition Letters 27, 861–874.
metric for the three statement conditions are higher than for other
conditions with 5 and 7 statements. According to Figure 2, the re- JACOB , R. J. K., AND K ARN , K. S. 2003. Eye tracking in human–
sponse accuracy for three statements is significantly higher than for computer interaction and usability research: Ready to deliver the
the other conditions. This response accuracy may affect both the promises. In The Mind’s Eye: Cognitive and Applied Aspects
estimation accuracy and the AUCs. of Eye Movement Research, Hyona, Radach, and Deubel, Eds.
Elsevier Science BV, Oxford, UK.

4 Summary NAKAMICHI , N., S HIMA , K., S AKAI , M., AND ICHI M AT-
SUMOTO , K. 2006. Detecting low usability web pages using
To estimate viewer’s contextual understanding using features of quantitative data of users’ behavior. In Proceedings of the 28th
eye-movements, features were extracted and compared between International Conference on Software Engineering (ICSE’06),
correct and incorrect responses when alternative responses to ques- ACM Press.
tion statements concerning several definition statements were of- NAKAYAMA , M., AND H AYASHI , Y. 2009. Feasibility study for
fered. Twelve directional features of eye-movements across a two- the use of eye-movements in estimation of answer correctness.
dimensional space were created: fixation position, fixation dura- In Proceedings of COGAIN2009, A. Villanueva, J. P. Hansen,
tion, saccade length and saccade duration. In a comparison of these and B. K. Ersboell, Eds., 71–75.
features between correct and incorrect responses, there were signif-
icant differences in most features. This shows evidence that fea- NAKAYAMA , M., AND S HIMIZU , Y. 2004. Frequency analysis of
tures of eye-movements reflect the viewer’s contextual understand- task evoked pupillary response and eye-movement. In Eye Track-
ing. An estimation procedure using Support Vector Machines was ing Research and Applications Symposium 2002, ACM Press,
developed and applied to the experimental data. The estimation per- New York, USA, S. N. Spencer, Ed., ACM, 71–76.
formance and accuracy were assessed across several combinations NAKAYAMA , M., AND TAKAHASI , Y. 2008. Estimation of cer-
of features. When all extracted features of eye-movements were tainty for responses to multiple-choice questionnaires using eye
applied to the estimation, the estimation accuracy was 71.9 % and movements. ACM TOMCCAP 5, 2, Article 14.
the AUC was 0.79. The number of definition statements affected
estimation performance and accuracy. ¨ ¨
P UOL AMAKI , Y., S ALOJ ARVI , J., S AVIA , E., S IMOLA , J., AND
K ASKI , S. 2005. Combining eye movements and collabora-
References tive filtering for proactive information retrieval. In Proceedings
of ACM-SIGIR 2005, ACM Press, New York, USA, A. Heikkil,
A. Pietik, and O. Silven, Eds., ACM, 145–153.
C HANG , C., AND L IN , C., 2008. Libsvm: A library for sup-
port vector machines (last updated: May 13, 2008). Available R AYNER , K. 1998. Eye movements in reading and information
21 July 2009 at URL: http://www.csie.ntu.edu.tw processing: 20 years of research. Psychological Bulletin 124, 3,
/˜cjlin/libsvm. 372–422.
D UCHOWSKI , A. T., 2006. High-level eye movement metrics in the S TORK , D. G. R., D UDA , O., AND H ART, P. E. 2001. Pattern
usability context. Position paper, CHI2006 Workshop: Getting a Classification, 2nd ed. John Wiley & Sons, Inc. Japanese transla-
Measure of Satisfaction from Eyetracking in Practice. tion by M. Onoue, New Technology Communications Co., Ltd.,
Tokyo, Japan (2001).
E BISAWA , Y., AND S UGIURA , M. 1998. Influences of target and
fixation point conditions on characteristics of visually guided TATLER , B. W., WADE , N. J., AND K AULARD , K. 2007. Ex-
voluntary saccade. The Journal of the Institute of Image Infor- amining art: dissociating pattern and perceptual influences on
mation and Television Engineers 52, 11, 1730–1737. oculomotor behaviour. Spatial Vision 21, 1-2, 165–184.
E HMKE , C., AND W ILSON , S. 2007. Identifying web usabil- TATLER , B. W. 2007. The central fixation bias in scene viewing:
ity problems from eye-tracking. In Proceedings of HCI 2007, Selecting and optimal viewing position independently of motor
British Computer Society, L. Ball, M. Sasse, C. Sas, T. Ormerod, biases and image feature distributions. Journal of Vision 7, 14,
A. Dix, P. Bagnall, and T. McEwan, Eds. 1–17.

56

Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks Using Features Of Eye Movements

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks Using Features Of Eye Movements

Similar to Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks Using Features Of Eye Movements (20)

More from Kalle

More from Kalle (20)

Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks Using Features Of Eye Movements