The repeated viewing of a suspect’s face by an eyewitness during the commission of a crime and subsequently when presented with
suspects in a photo lineup provides a real-world scenario where Noton and Stark's 1971 "scanpath theory" of visual perception and memory can be tested. Noton and Stark defined "scanpaths" as repetitive sequences of fixations and saccades that occur during exposure, and subsequently upon re-exposure, to a visual stimulus, facilitating recognition. Ten subjects watched a video of a staged theft in a parking lot. Scanpaths were recorded for the initial viewing of the suspect's face and a later close-up viewing of the suspect's face in the video, and then on the suspect's face when his picture appeared 24 hours later in a photo lineup constructed by law enforcement officers. These scanpaths were compared using the string-edit methodology to measure resemblance between sequences. Preliminary analysis showed support for repeated scanpath sub-sequences. In the analysis of four clusters of scanpaths, there was little within-subject resemblance between full scanpath sequences, but seven of 10 subjects had repeated scanpath sub-sequences. When a subject's multiple scanpaths across the suspect's photo in the lineup were compared, instances of within-subjects repetition of short scanpaths occurred more often than expected by chance.
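As a concrete illustration of the string-edit methodology described above, the comparison can be sketched in Python: an equal-cost Levenshtein alignment normalized by the length of the longer sequence, as detailed later in the paper. The AOI letters and example sequences here are invented for illustration; they are not the study's data.

```python
# Sketch of the string-edit (Levenshtein) comparison of scanpath sequences.
# AOI letters and example sequences are illustrative, not the study's data.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions
    (all at equal cost 1) needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def scanpath_distance(a: str, b: str) -> float:
    """Alignment cost divided by the longer sequence's length, giving
    0 for identical sequences and 1 for maximally dissimilar ones."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))

# Example with AOI letters as in the paper ("E" right eye, "H" nose, "M" mouth)
print(scanpath_distance("EHM", "EHM"))  # identical sequences -> 0.0
print(scanpath_distance("EHM", "EHH"))  # one substitution over length 3, about 0.33
```

Equal substitution costs for all AOI pairs mirror the cost scheme the study adopts from Brandt and Stark [1997].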
3 Contribution of This Study

While several researchers such as Brandt and Stark [1997], Salvucci and Anderson [2001], and Josephson and Holmes [2002] have used string-edit methods to study eye-path sequences, this appears to be the first study to test scanpath theory using string-edit methods in an eyewitness identification situation.

4 Method

4.1 Participants

The participants were 10 college students recruited from a medium-sized university in the western United States. The participants included seven Caucasian men and three Caucasian women. Their average age was 26.6 years (SD = 6.65).

4.2 Stimulus Materials

Participants were first asked to watch a 45-second video of a property theft in a parking lot while their eyes were tracked. The video showed a young Caucasian man sitting in a car witnessing a crime being committed by another young Caucasian man. The men were actors who were paid for their participation. They were selected because they did not possess any unique facial features.

A photo lineup was constructed by Caucasian law enforcement officials responsible for producing photo arrays on a daily basis in a medium-sized city in the western United States. A mug shot of the actor was produced, copying the style and quality of actual mug shots taken in the police department in this city. It was then placed among five other photos in a photo array with three photographs on the top row and three photographs on the bottom. See Figure 1.

Figure 1: Example of photo lineup with suspect in E position.

4.3 Data Collection

Participants were instructed to "carefully" watch the video as if they were the person shown sitting in a car witnessing what was happening. The video was shown on a 15-inch laptop screen at a resolution of 1280 by 800 pixels and 32-bit color. It was viewed at a normal distance of about 18 to 24 inches under ideal indoor lighting conditions.

The next day, approximately 24 hours later, participants returned to the eye-tracking laboratory. They listened to these instructions: "Yesterday you witnessed a theft in a parking lot. Now you are going to be shown a photo lineup constructed by law enforcement officers. The thief may or may not be present. Look carefully at the six men in the photo lineup. Take your time. When you are as certain as you can be that you have identified the thief or that the thief is not present in the photo lineup, say 'OK.'"

4.4 Eye-Tracking Apparatus

Eye-movement data were collected using an ISCAN RK-426PC Pupil/Corneal Reflection Tracking System [ISCAN, Inc. Operating Manual 1998], which uses a corneal reflection system to measure the precise location of a person's eye fixations when looking at a visual display – in this case a six-person photo lineup – on a computer monitor. The eye-tracking system does not require participants to wear head gear. It uses a real-time digital image processor to automatically track the center of the pupil and a low-level infrared reflection from the corneal surface [ISCAN, Inc. Operating Manual 1998]. The system collects data at 60 Hz, or about every 16.7 milliseconds. For the purposes of this study, we did not analyze fixations but looked at the eye-path trace across the visual stimuli.

4.5 Sequence Comparison

The next step was to define the eye-path sequence for each participant's two main viewings of the suspect's face on the stimulus video (see Figure 2) and then the often multiple eye-path sequences across the suspect's face in the photo lineup (see Figure 1). For example, a viewing beginning with a single trace over the right eye designated as area "E" followed by traces over the nose "H" and the mouth "M" would generate a sequence beginning "EHM". For the purposes of this study, sequences were characterized by the presence of the eye-path trace within a defined area of interest (AOI). AOIs included facial features such as eyes, eyebrows, nose, mouth, cheeks, ears, forehead and hair. Each pass over an AOI, regardless of duration, was represented by a single element in the sequence.

Figure 2: The eye-path sequence for Subject 4 as he scanned the face of the suspect in the close-up in the video.

Optimal matching analysis (OMA) was used to compare two coded sequences in the video (the first appearance of the suspect and a later close-up) and the multiple coded sequences across the
thief's photo in the lineup to generate a matrix of distance indexes for these eye-path sequences. OMA is a string-edit tool for finding the Levenshtein distance between any sequences composed of elements from the same sequence alphabet. Sequences similar in composition and order will, when compared, have smaller distances; the more different the sequences, the greater the distance. To adjust for the role of sequence length in the total cost of alignment, the inter-sequence distance is determined by dividing the total alignment cost by the length of the longer sequence in the pair. Given this normalization of the distance index by the length of the longer of the two sequences, the distance index can range from 0 for identical sequences to 1 for maximally dissimilar sequences. Examples, illustrations and applications of OMA can be found in Kruskal [1983].

Alignments may use a combination of substitutions, insertions and deletions to produce the Levenshtein distance. Brandt and Stark [1997] set equal substitution costs for all pairs of sequence elements. Similarly, in this study equal substitution costs were used because there was no compelling reason to differentiate those costs given the similarity of the images and their close arrangement to each other.

5 Sequence Comparison Results

SPSS software, version 15 [SPSS 2007], was used to perform hierarchical cluster analysis with Ward's method (CLUSTER) and non-metric multidimensional scaling (PROXSCAL) on the scanpath distance matrices generated by comparing the initial viewing of the suspect's face in the video, the close-up of the suspect's face in the video, and the first scan of the thief's face in the photo lineup. It was also used to compare the multiple scanpaths across the suspect's photo in the lineup. Scaling arranges sequences in n-dimensional space such that the spatial arrangement approximates the distances between sequences; cluster analysis defines "neighborhoods" of similar cases within that space.

5.1 Cluster Analysis

Dendrograms and agglomeration schedules for the hierarchical clustering of the scanpath sequence distance matrix were examined to identify clustering stages which yielded a sharp increase in distance coefficients. Review of the results yielded four clusters at stage 25 of the 39-stage clustering. The member sequences of the four clusters are listed and described in Table 1. Sequences are identified by case number and sequence type. "A" sequences are composed of the scanpath AOI sequence over the short, initial glimpse of the thief's face in the video. "B" sequences are composed of the scanpath AOI sequence over the longer first close-up of the thief's face in the video, and "C" sequences are composed of the first scanpath AOI sequence over the thief's face in the photo lineup. Hence "2A" is the first video scanpath for Subject 2.

The mean sequence lengths in Table 1 suggest sequence length is associated with cluster membership. This is unsurprising given the role of length in contributing to the Levenshtein distance. However, visual examination of the sequences reveals content differences as well as length differences, with each cluster showing at least one content pattern of interest.

Cluster 1: (length range 1 to 4, mean 2.7)
Members: 4A, 6A, 6B
Short sequences beginning with gazes to the mouth or chin.

Cluster 2: (length range 2 to 6, mean 3.9)
Members: 2B, 4B, 7A, 7C, 9B, 11B, 11C, 13A, 13B, 13C
Sequences tend to include glances at the nose and left cheek.

Cluster 3: (length range 1 to 12, mean 4.1)
Members: 1A, 3A, 3C, 4C, 5A, 5B, 9A, 12B
Sequences tend to include glances at hair and the right eye.

Cluster 4: (length range 2 to 7, mean 4.8)
Members: 2C, 3B, 5C, 6C, 7B, 9C, 11A, 12A, 12C
Sequences tend to include mid-sequence gazes at the bridge of the nose and middle or late glances at the right eye.

Table 1: Clusters for two viewings of the suspect in the video (A and B), as well as the first scan across the suspect's face in the photo lineup (C).

"A" sequences are naturally constrained by the duration of the thief's face's appearance in the video (as are "B" sequences, though somewhat less so). "C" sequences are less constrained due to the unchanging nature of still images. However, while the longest sequences tended to be generated by the photo lineup, all three sequence types have instances as short as one or two scanpath regions.

Sequence length appears to be associated with cluster membership, with the shortest average sequence length in Cluster 1 and the longest average sequence length in Cluster 4. However, sequence content also drives the clustering. Key content differentiators are noted in Table 1.

Finally, seven of 10 subjects had repeated scanpath sub-sequences within the clusters. Cluster 1 contains two scanpaths from Subject 6; Cluster 2 has two scanpaths from Subjects 7 and 11, and three from Subject 13; Cluster 3 has two scanpaths from Subjects 3 and 5; and Cluster 4 contains two scanpaths from Subject 12. Thus this preliminary analysis showed support for repeated scanpath sub-sequences.

5.2 MDS Results

Scree plots of stress for one- to five-dimension MDS solutions revealed an elbow in the stress curve at two dimensions, yielding a solution with an acceptable level of stress (stress = .10). The two-dimensional MDS solution is displayed in Figure 3 with the cluster analysis results superimposed.

While neither dimension of the MDS solution produced a significant correlation with sequence length (Dim 1, r = 0.26, p = .89; Dim 2, r = -.14, p = .45), the association of length to sequence groupings and placement in the MDS solution is apparent when looking from Cluster 1, in the upper left of the plot, to Cluster 4, in the lower right of the plot. Sequence length
varies within clusters but shows a general trend from shorter to longer from the upper left to lower right. While small sequence lengths and group sizes preclude statistical testing for relationships between placement in the MDS solution and sequence content, an assumption can be made that the dispersion on the axes results from a combination of sequence length and sequence composition differences.

Figure 3: Sequence comparison MDS solution in two dimensions with cluster results superimposed.

5.3 Transition Pair Frequency

Adjacent pairs of AOI glances constitute the smallest possible sub-sequences of the full scanpaths. We tabulated all transition pairs without regard to AOI order in the pair (i.e., a hair-to-forehead transition is treated as equal to a forehead-to-hair transition). Table 2 lists all such pairs occurring at least four times in the full data set. Each listed pair appears at a greater frequency than predicted by the simple joint probability of its component glances. Much of the repetition is repeated occurrences of a pair within a single scanpath rather than across multiple scanpaths within or between subjects.

Transition pairs of            Expected      Observed
facial features                probability   frequency
right eye to bridge of nose    1.34%         12.1%
left cheek to nose             1.58%          7.7%
right eye to right cheek       1.22%          6.6%
hair to forehead               0.40%          5.5%
nose to upper lip              0.67%          5.5%
nose to right eye              2.14%          4.4%
left eye to bridge of nose     0.50%          4.4%
left cheek to left eye         0.70%          4.4%
mouth to upper lip             0.30%          4.4%

Table 2: Transitions between AOI pairs occurring at least four times.

6 Conclusion

Preliminary analysis showed support for repeated scanpath sub-sequences. Six of 10 subjects had two repeated scanpath sub-sequences that clustered. These two scanpaths consisted of the two frontal viewings of the suspect in the video or one of these scanpaths with the initial scanpath across his picture in the photo lineup. One subject's scanpaths for all three viewings were grouped within one cluster. When a subject's multiple scanpaths across the suspect's photo in the lineup were compared, nine types of within-subjects repetition of short scanpaths occurred more often than expected by chance, also adding support for repeated scanpath sub-sequences. While this research is preliminary in nature, it shows limited support for Noton and Stark's [1971a, 1971b] "scanpath theory" of visual perception and memory in an eyewitness identification situation.

References

BRANDT, S. A. AND STARK, L. W. 1997. Spontaneous Eye Movements During Visual Imagery Reflect the Content of the Visual Scene. Journal of Cognitive Neuroscience 9, 1, 27-38.

INNOCENCE PROJECT. 2009. Innocence Project Home. Retrieved October 10, 2009, from http://www.innocenceproject.org.

ISCAN, INC. 1998. RK-726PCI Pupil/Corneal Reflection Tracking System PCI Card Version Operating Instructions. Rikki Razdan.

JOSEPHSON, S. AND HOLMES, M. E. 2002. Attention to Repeated Images on the World-Wide Web: Another Look at Scanpath Theory. Behavior Research Methods, Instruments, & Computers 34, 4, 539-548.

KRUSKAL, J. B. 1983. An overview of sequence comparison. In D. Sankoff and J. B. Kruskal (Eds.), Time warps, string edits, and macromolecules: the theory and practice of sequence comparison, 1-44. Addison-Wesley, Reading, MA.

NOTON, D. AND STARK, L. W. 1971a. Scanpaths in Saccadic Eye Movements While Viewing and Recognizing Patterns. Vision Research 11, 929-942.

NOTON, D. AND STARK, L. W. 1971b. Scanpaths in Eye Movements During Pattern Perception. Science 171, 308-311.

SALVUCCI, D. D. AND ANDERSON, J. R. 2001. Automated Eye-Movement Protocol Analysis. Human-Computer Interaction 16, 1.

SPSS INC. 2007. SPSS Base 15.0 for Windows User's Guide. SPSS Inc.
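The transition-pair tabulation of Section 5.3 can be sketched in Python as follows. The scanpath strings are invented for illustration, and the "simple joint probability" computation shown is one plausible reading of the paper's description (expected probability of an unordered pair from the overall frequency of each AOI), not the authors' exact formula.

```python
# Sketch of Section 5.3's transition-pair tabulation; scanpath strings
# are invented for illustration, not the study's data.
from collections import Counter

def transition_pairs(scanpaths):
    """Count adjacent AOI pairs, ignoring order within the pair
    (a hair-to-forehead transition equals a forehead-to-hair one)."""
    counts = Counter()
    for path in scanpaths:
        for a, b in zip(path, path[1:]):
            counts[frozenset((a, b))] += 1
    return counts

def expected_pair_probability(scanpaths, a, b):
    """One plausible 'simple joint probability' of an unordered pair,
    based on each AOI's overall glance frequency (an assumption)."""
    glances = Counter(ch for path in scanpaths for ch in path)
    total = sum(glances.values())
    pa, pb = glances[a] / total, glances[b] / total
    return 2 * pa * pb if a != b else pa * pb

# Example with AOI letters as in Section 4.5 ("E" eye, "H" nose, "M" mouth)
paths = ["EHM", "EHEH", "HME"]
counts = transition_pairs(paths)
print(counts[frozenset("EH")])  # observed E<->H transition count across paths
```

Comparing the observed pair counts against the expected probabilities, as in Table 2, flags pairs that recur more often than their component glance frequencies alone would predict.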