Grindinger Group Wise Similarity And Classification Of Aggregate Scanpaths



We present a novel method for the measurement of the similarity between aggregates of scanpaths. This may be thought of as a solution to the “average scanpath” problem. As a by-product of this method, we derive a classifier for groups of scanpaths drawn from various classes. This capability is empirically demonstrated using data gathered from an experiment in an attempt to automatically determine expert/novice classification for a set of visual tasks.


Group-Wise Similarity and Classification of Aggregate Scanpaths

Thomas Grindinger, Andrew T. Duchowski
School of Computing, Clemson University

Michael Sawyer
Industrial Engineering, Clemson University

Figure 1: Typical scanpath visualization at left. Time-projected scanpath visualization at right, where the y-axis denotes vertical gaze position and the x-axis denotes time. Fixation labels are common between the two. Vertical markers denote one-second intervals.

Abstract

We present a novel method for the measurement of the similarity between aggregates of scanpaths. This may be thought of as a solution to the “average scanpath” problem. As a by-product of this method, we derive a classifier for groups of scanpaths drawn from various classes. This capability is empirically demonstrated using data gathered from an experiment in an attempt to automatically determine expert/novice classification for a set of visual tasks.

CR Categories: J.4 [Computer Applications]: Social and Behavioral Sciences - Psychology. I.2 [Pattern Recognition]: Models - Statistical.

Keywords: eye tracking, scanpath comparison, classification

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail
ETRA 2010, Austin, TX, March 22-24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

1 Introduction

Scanpath comparison is a topic of growing interest. Methods have been proposed that allow comparison of two scanpaths. Less work has been done on the comparison of one or more scanpaths to different groups of scanpaths. This is useful work, especially in light of its potential in training environments, e.g., Sadasivan et al. [2005] demonstrated that expert scanpaths could be used as feedforward information to guide novices in visual search. Our work provides a means of evaluating the ways in which different portions of a novice’s scanpath deviate from the expert’s.

Leigh and Zee [1991] discuss the implications of eye movements on the diagnosis and understanding of certain neurological disorders. Our work also has potential in marketing and film viewing analysis. For instance, a director could specify a “basis” scanpath, which the audience is expected to closely approximate. Our metric could measure how closely the audience conforms to that basis. The ability to perform aggregate scanpath similarity measurement and classification for each of these tasks is clearly needed.

2 Background

One of the foundational works in scanpath comparison was Privitera and Stark’s [2000] use of a string editing procedure to compare the sequential loci of scanpaths. Their approach does not distinguish between fixations of different durations. Our approach is similar, but at a finer granular level, allowing for comparison with groups of scanpaths. Hembrooke et al. [2006] used a multiple sequence alignment algorithm to create an average scan path for multiple viewers, providing the functionality lacking in the previous work. Unfortunately, their procedure was never explained in detail, and no objective results were provided.

Duchowski and McCormick [1998] described a visualization which tracks fixations through time, referred to as “volumes of interest”. They were able to visualize multiple scanpaths in two and three dimensions, using this temporal mapping. The former plots x or y components on the y-axis and time on the x-axis, while the latter visualizes fixations as three-dimensional, uniform-width volumes, where time serves as the third dimension. Räihä et al. [2005] described a similar visualization in two dimensions, with the slight difference that fixations were displayed as variable-size circles, congruent with the typical visualization of scanpaths. Heatmap visualizations, as described by Pomplun et al. [1996] and popularized by Wooding [2002], overlay attentional information onto a stimulus as colors, where hot colors correspond to regions of high interest and cold (or no) colors correspond to regions of low interest. This representation is highly informative, yet does not provide any quantitative information. We utilize the concept of heatmaps in our algorithm, but we do not aggregate them over time.

Our similarity measure resembles somewhat the Earth Mover’s Distance used by Dempere-Marco et al. [2006] when considering cognitive processes underlying visual search of medical images. The approach is also similar to Galgani et al.’s [2009] effort to diagnose
ADHD through eye tracking data. They created three classifiers, including a classifier based on Levenshtein distance, and discovered that Levenshtein’s gave the best results among their chosen algorithms. To show relative improvement, we also compare the performance of our algorithm to a similar Levenshtein classifier.

3 Group-Wise Similarity

Our algorithm takes as input two collections of fixation-filtered scanpaths. An example image is presented in Figure 2, displayed in 2(a) with all novice scanpaths and in 2(c) with all expert scanpaths. From a simple visual examination, there is no obvious characteristic that stands out for either collection. A procedure is then needed to perform a deeper statistical analysis of each collection.

Figure 2: Collections of scanpaths of novice (a) and expert (c) pilots over a single stimulus. Time-projected scanpaths of novices (b) and of experts (d) can be considered side views of the three-dimensional data.

The original impetus for this approach was the desire to formulate an elegant scanpath comparison measure for dynamic stimuli, such as movies or interactive tasks. Current string-editing approaches are not sufficient for video. For example, a string-editing alignment could mistakenly align AOIs from frames that are many seconds apart. There is nothing to explicitly constrain AOIs to only coincide within specific temporal limits.

From the perspective of a collection of movie frames, each frame can be thought of as a separate stimulus. The scanpath for a single subject, viewing a movie stimulus, can then be broken up into a collection of fixation-frame units, which are more or less independent from each other. This conceptualization of a scanpath differs from the conventional view, in that the conventional visualization is a “projection” of fixations over time onto a two-dimensional plane. Our conceptualization avoids this projection entirely. Thus, we produce a three-dimensional “scanpath function”. Given some scanpath s and time t, the fixation function, f(s, t), produces either the fixation attributable at that timestamp, e.g., frame, or null (for saccades) from scanpath s. Figure 1 visualizes the difference between the standard scanpath representation and a side view of the three-dimensional representation.

We extend the above definition to the function f(S, t) by changing the single scanpath parameter s to a collection of scanpaths S. This function would then return a collection of fixations for all scanpaths in S at the given timestamp. Then, we may differentiate groups of subjects into their own scanpath sets. For instance, in our experiment, we study the differences between experts and novices. We may then create an expert scanpath set E and a novice scanpath set N. The functions f(E, t) and f(N, t) would then return collections of fixations at timestamp t for experts and novices, respectively (Figures 2(b) and 2(d) visualize the same data as in Figures 2(a) and 2(c), but as side views of their three-dimensional representations).

These group-specific collections of fixations for single frames may be clustered by the mean shift approach described by Santella and DeCarlo [2004]. The resulting clusters serve as general AOIs for a given frame, describing regions of varying interest for that specific group of individuals. We may then construct a probabilistic model of expected attention for that group. Such a model for a single frame is visualized in Figure 3.

Figure 3: Mixture of Gaussians for expert fixations at a discrete timestamp. Displayed novice fixations were not used in the clustering operation. Note that the fixation labeled ‘A’ is far from the cluster centers, and thus has lower similarity than the fixation labeled ‘B’, which is close to a cluster center.

Each frame will have a separate model associated with it, and we may calculate the “error per group” of a given fixation in a frame by calculating the summation of the Gaussian distances from the fixation point to all group-specific cluster centers. We use a Gaussian kernel with standard deviation of 50 pixels to determine the distance value, which we then invert. Thus, a fixation point collocated with a cluster mean or centroid has inverse distance value (similarity) of 1.0, and a fixation point more than 50 pixels away from the cluster mean has inverse distance value close to 0. The summation of the cluster similarities for a single fixation point is divided by the number of clusters, giving a value between 0 and 1.

With a mechanism to evaluate group-specific error, or rather similarity, of fixation points in individual frames, we may then extrapolate this process over the entire scanpath duration by summing individual similarities for each frame and then returning the average. Thus, a scanpath in which most fixation points lie near to group-specific clusters will have similarity close to 1.0 for that group, while a scanpath in which most fixation points lie far away from those clusters will have similarity close to 0. This metric may then be extrapolated further to describe the similarity of one group of scanpaths to another by simply averaging together the group-wise similarities for each scanpath in one group to the entire other group.

The data collected for expert/novice classification purposes did not, in fact, use video as stimulus. Nevertheless, while the video-based approach is expected to be more reliable for video, its application to static images would also be beneficial. In concordance with the video paradigm, we take samples from our data every 16 milliseconds. Thus, this procedure may be utilized for analysis over both static and dynamic stimuli. In our study, recorded scanpaths are of various lengths. We must, therefore, specify a time window over which to collect fixation data. The upper bound on the length of this window is the shorter of either the length of the scanpath being compared or the mean of the scanpath lengths for a given stimulus.

To evaluate the capabilities of this new approach, we compared the results to a group-wise extension of pairwise string-editing similarity. The group-wise string-editing similarity of a single scanpath to a group of scanpaths is the average pairwise similarity of that scanpath to each scanpath in the group it is being compared to.

4 Classification

Our method of group-wise scanpath similarity is validated by a machine learning validation approach. Machine learning, specifically classification, is a statistical framework which takes, as input, one or more groups of data and produces, as output, probability values that describe the likelihood that some arbitrary datum is a member of one or more of the defined groups. Thus, we may use this approach to validate whether our group-wise similarity measure produces information that may be used to reliably discriminate between groups.

A classifier must be constructed for each group, e.g., the expert group and the novice group. The classifier for expert data will be described below. The classifier for novice data may be constructed identically, though with different input values. As input to our classifier, we provide a list of group-wise similarity scores, corresponding to the similarities of individual scanpaths to the expert model, as described above. The goal of the expert classifier, then, is to determine a similarity threshold: a score above it indicates that a given scanpath is likely to be expert, and a score below it indicates that the scanpath is unlikely to be expert.

We use the receiver operating characteristic (ROC) curve to find this threshold. A thorough description of the curve may be found in Fogarty et al. [2005]. This curve may also be used to compute the area under the ROC curve (AUC). This value describes the discriminative ability of a classifier. Simple percentage accuracy values may be misrepresentative, especially in skewed cases, such as having a large quantity of data from one class and a small quantity of data from another. The AUC value describes the probability that an individual instance of one class will be classified differently from an instance of another class.

Two classifiers are being trained: an expert and a novice classifier. This means that two scores are produced for a single instance. Each score describes the probability that an instance is a member of the expert or novice group, respectively. In order to decide which class this instance conclusively belongs to, we use a heuristic. There are a few possibilities for the arrangement of these scores. First, the expert score may be higher than the expert threshold, and the novice score may be lower than the novice threshold. This case is trivially expert. Similarly, an instance with expert score lower than the expert threshold and novice score higher than the novice threshold is trivially novice. In the case of both scores being above or below their respective thresholds, we divide the score of each classifier by its threshold value and choose the greater of the two.

5 Results

In order to evaluate our method we analyzed the results of a study wherein 20 high-time pilots (experts) and 20 non-pilots (novices) were presented with 20 different images of weather. Subjects were asked to determine whether they would continue their current flight path or if they needed to divert. Their eye movements were recorded by a Tobii ET-1750 eye tracker (their verbal responses were ignored in our analysis). Our objective was to produce a classifier that can predict whether a subject is expert or novice, based solely on their eye movements.

With two classes, a random classifier would be expected to produce 0.50 accuracy and AUC values. Evaluation metrics for our mechanism are listed in Table 1. In our evaluation, we refer to expert data as our positive class and novice data as negative. According to the p-values, all metrics are significantly higher than random for our method, while only the negative-class (novice) accuracy and the two AUC values are significantly higher for the string-editing method.

Results show the classifier’s discriminative ability over a single stimulus. Given multiple stimuli, our measure is extrapolated over all stimuli for each subject. A “majority vote” is then used, where one vote is drawn from each stimulus. If more than half the votes indicate that a subject is expert, that subject is then classified as conclusively expert. Otherwise, a subject is classified as novice. Accuracies for this voting mechanism are listed in Table 2.
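The per-frame similarity, the two-classifier heuristic, the AUC interpretation, and the majority vote described in Sections 3 to 5 can be sketched in a few functions. This is a minimal illustration, not the authors' implementation. All function names are hypothetical. The cluster centers are assumed to be already available (the mean shift step is not shown). The "inverted Gaussian distance" is assumed to be exp(-d^2 / (2 sigma^2)), frames without a fixation are assumed to be skipped rather than scored as 0, and thresholds are assumed to be positive and supplied by the caller.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
SIGMA_PX = 50.0  # kernel standard deviation stated in the text


def frame_similarity(fixation: Point, centers: Sequence[Point],
                     sigma: float = SIGMA_PX) -> float:
    """Mean Gaussian similarity of one fixation to a group's cluster centers.

    Assumption: similarity to one center is exp(-d^2 / (2 sigma^2)),
    i.e. 1.0 at the center and decaying toward 0 with distance.
    """
    if not centers:
        raise ValueError("at least one cluster center is required")
    total = 0.0
    for cx, cy in centers:
        d2 = (fixation[0] - cx) ** 2 + (fixation[1] - cy) ** 2
        total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total / len(centers)


def scanpath_similarity(samples: Sequence[Optional[Point]],
                        models: Sequence[Sequence[Point]]) -> float:
    """Average per-frame similarity of one scanpath to a group model.

    samples[t] plays the role of f(s, t): a fixation, or None in a saccade.
    models[t] holds the group's cluster centers for frame t.
    Assumption: frames with no fixation or no clusters are skipped.
    """
    scores = [frame_similarity(p, c)
              for p, c in zip(samples, models) if p is not None and c]
    return sum(scores) / len(scores) if scores else 0.0


def classify(expert_score: float, expert_thr: float,
             novice_score: float, novice_thr: float) -> str:
    """Combine the expert and novice classifier scores into one label."""
    expert_hit = expert_score > expert_thr
    novice_hit = novice_score > novice_thr
    if expert_hit and not novice_hit:
        return "expert"
    if novice_hit and not expert_hit:
        return "novice"
    # Both above or both below: compare threshold-normalized scores.
    # Assumption: exact ties go to novice.
    if expert_score / expert_thr > novice_score / novice_thr:
        return "expert"
    return "novice"


def majority_vote(votes: Sequence[str]) -> str:
    """Expert only if strictly more than half of the per-stimulus votes agree."""
    experts = sum(1 for v in votes if v == "expert")
    return "expert" if experts > len(votes) / 2 else "novice"


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the usual rank-based definition).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

Under the assumed kernel, a fixation exactly 50 pixels from a single cluster center scores exp(-0.5), roughly 0.61, and the score only approaches 0 at several standard deviations. The inversion used in the paper may differ from this assumption.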
Cross-Validation Results

                    posAcc   negAcc   totAcc   posAUC   negAUC
Temporal
  Average            0.71     0.64     0.68     0.85     0.86
  Std Dev            0.07     0.12     0.07     0.07     0.04
  Median             0.74     0.66     0.68     0.87     0.86
  p-value            0.00     0.01     0.00     0.00     0.00
String-editing
  Average            0.49     0.64     0.57     0.81     0.72
  Std Dev            0.17     0.13     0.06     0.06     0.11
  Median             0.48     0.64     0.57     0.82     0.71
  p-value            0.98     0.02     0.16     0.00     0.00

Table 1: Results of classification cross-validation for both the new temporal method and string-editing similarity. Columns are accuracy of positive (expert) and negative (novice) instances, total combined accuracy, and AUC values for positive and negative classification. P-values are results of t-test for significance of score distributions against a random distribution.

Subject Results

                  Temporal             String-editing
              Experts   Novices     Experts   Novices
  Average      0.68      0.35        0.45      0.34
  Accuracy     85%       95%         40%       80%

Table 2: Results of cross-stimulus validation. Accuracy is determined by counting the number of experts/novices with expert ratio greater than 0.5 in the case of experts and less than or equal to 0.5 in the case of novices.

6 Discussion

The AUC values listed in Table 1 show stronger discriminative ability than a measure based on string-editing. P-values from t-tests indicate that the results of our new method are significantly different from random for all measures, while results of the string-editing method are only significant for novices and AUC values. The cross-stimulus results in Table 2 show that novice instances are consistently easier to classify than expert instances, but the overall accuracies are still quite high. 85% of the positive instances are properly classified as experts, while 95% of the negative instances are classified as novice. This is an improvement over string-editing, with 40% positive accuracy and 80% negative accuracy.

The average accuracies in the cross-stimulus table may be interpreted as the cross-validated similarity of each class to the expert class. The group-wise similarity of experts to the expert class is 0.68, while the group-wise similarity of novices to the expert class is 0.35. The experts’ similarity is above 0.5, while the novices’ is below 0.5, which is appropriate and intuitive, though one might expect the similarity of a class with itself to be closer to 1.0. In this case, though, since we are cross-validating our results, we are not so much measuring the similarity between a group and itself, but measuring the average similarity between members of the same class. In the case of measuring the similarity of different classes, though, such as comparing the novice class to the expert class, the intuitive idea of group-wise similarity is more appropriate and convenient.

7 Conclusion

A group-wise scanpath similarity measure and classification algorithm have been described, allowing analysis and discrimination of groups of scanpaths, based on any informative grouping of those scanpaths. This mechanism has been empirically and statistically validated, showing that it is capable of discriminating between groupings at least as diverse as expert/novice subject appellation, with greater accuracy and reliability than random. Potential applications include training environments, neurological disorder diagnosis, and, in general, evaluation of attention deviation from that expected or desired during a dynamic stimulus. Future work may include pre-alignment of unclassified scanpaths with classified scanpaths, attempting to increase the accuracy further during the calculation of class similarity.

References

DEMPERE-MARCO, L., HU, X.-P., ELLIS, S. M., HANSELL, D. M., AND YANG, G.-Z. 2006. Analysis of Visual Search Patterns With EMD Metric in Normalized Anatomical Space. IEEE Transactions on Medical Imaging 25, 8 (August), 1011-1021.

DUCHOWSKI, A. T. AND MCCORMICK, B. H. 1998. Gaze-Contingent Video Resolution Degradation. In Human Vision and Electronic Imaging III. SPIE, Bellingham, WA.

FOGARTY, J., BAKER, R. S., AND HUDSON, S. E. 2005. Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction. In GI ’05: Proceedings of Graphics Interface 2005. Canadian Human-Computer Communications Society, School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, 129-136.

GALGANI, F., SUN, Y., LANZI, P., AND LEIGH, J. 2009. Automatic analysis of eye tracking data for medical diagnosis. In Proceedings of IEEE Symposium on Computational Intelligence and Data Mining (IEEE CIDM 2009). IEEE.

HEMBROOKE, H., FEUSNER, M., AND GAY, G. 2006. Averaging Scan Patterns and What They Can Tell Us. In Eye Tracking Research & Applications (ETRA) Symposium. ACM, San Diego, CA, 41.

LEIGH, R. J. AND ZEE, D. S. 1991. The Neurology of Eye Movements, 2nd ed. Contemporary Neurology Series. F. A. Davis Company, Philadelphia, PA.

POMPLUN, M., RITTER, H., AND VELICHKOVSKY, B. 1996. Disambiguating Complex Visual Information: Towards Communication of Personal Views of a Scene. Perception 25, 8, 931-948.

PRIVITERA, C. M. AND STARK, L. W. 2000. Algorithms for Defining Visual Regions-of-Interest: Comparison with Eye Fixations. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 22, 9, 970-982.

RÄIHÄ, K.-J., AULA, A., MAJARANTA, P., RANTALA, H., AND KOIVUNEN, K. 2005. Static Visualization of Temporal Eye-Tracking Data. In INTERACT. IFIP, 946-949.

SADASIVAN, S., GREENSTEIN, J. S., GRAMOPADHYE, A. K., AND DUCHOWSKI, A. T. 2005. Use of Eye Movements as Feedforward Training for a Synthetic Aircraft Inspection Task. In Proceedings of ACM CHI 2005 Conference on Human Factors in Computing Systems. ACM Press, Portland, OR, 141-149.

SANTELLA, A. AND DECARLO, D. 2004. Robust Clustering of Eye Movement Recordings for Quantification of Visual Interest. In Eye Tracking Research & Applications (ETRA) Symposium. ACM, San Antonio, TX, 27-34.

WOODING, D. 2002. Fixation Maps: Quantifying Eye-Movement Traces. In Eye Tracking Research & Applications (ETRA) Symposium. ACM, New Orleans, LA.