SlideShare a Scribd company logo
1 of 20
Download to read offline
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


         IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                             1




                               Attention and Visual Memory
                         in Visualization and Computer Graphics
                                         Christopher G. Healey, Senior Member, IEEE, and James. T. Enns

                Abstract—A fundamental goal of visualization is to produce images of data that support visual analysis, exploration, and discovery of
                novel insights. An important consideration during visualization design is the role of human visual perception. How we “see” details in
                an image can directly impact a viewer’s ef ciency and effectiveness. This article surveys research on attention and visual perception,
                with a speci c focus on results that have direct relevance to visualization and visual analytics. We discuss theories of low-level visual
                perception, then show how these ndings form a foundation for more recent work on visual memory and visual attention. We conclude
                with a brief overview of how knowledge of visual attention and visual memory is being applied in visualization and graphics. We also
                discuss how challenges in visualization are motivating research in psychophysics.

                Index Terms—Attention, color, motion, nonphotorealism, texture, visual memory, visual perception, visualization.

                                                                                          ✦



         1      I NTRODUCTION                                                                 attention is focused and what is already in our minds
                                                                                              prior to viewing an image. Finally, we describe sev-
         H        UMAN  perception plays an important role in the
                area of visualization. An understanding of per-
         ception can signi cantly improve both the quality and
                                                                                              eral studies in which human perception has in uenced
                                                                                              the development of new methods in visualization and
         the quantity of information being displayed [1]. The                                 graphics.
         importance of perception was cited by the NSF panel on
         graphics and image processing that proposed the term                                 2     P REATTENTIVE P ROCESSING
         “scienti c visualization” [2]. The need for perception
         was again emphasized during recent DOE–NSF and                                       For many years vision researchers have been investigat-
         DHS panels on directions for future research in visu-                                ing how the human visual system analyzes images. One
         alization [3].                                                                       feature of human vision that has an impact on almost
            This document summarizes some of the recent de-                                   all perceptual analyses is that, at any given moment de-
         velopments in research and theory regarding human                                    tailed vision for shape and color is only possible within
         psychophysics, and discusses their relevance to scien-                               a small portion of the visual eld (i.e., an area about the
         ti c and information visualization. We begin with an                                 size of your thumbnail when viewed at arm’s length).
         overview of the way human vision rapidly and au-                                     In order to see detailed information from more than
         tomatically categorizes visual images into regions and                               one region, the eyes move rapidly, alternating between
         properties based on simple computations that can be                                  brief stationary periods when detailed information is
         made in parallel across an image. This is often referred                             acquired—a xation—and then icking rapidly to a new
         to as preattentive processing. We describe ve theories of                            location during a brief period of blindness—a saccade.
         preattentive processing, and brie y discuss related work                             This xation–saccade cycle repeats 3–4 times each second
         on ensemble coding and feature hierarchies. We next                                  of our waking lives, largely without any awareness
         examine several recent areas of research that focus on the                           on our part [4], [5], [6], [7]. The cycle makes seeing
         critical role that the viewer’s current state of mind plays                          highly dynamic. While bottom-up information from each
         in determining what is seen, speci cally, postattentive                                xation is in uencing our mental experience, our current
         amnesia, memory-guided attention, change blindness,                                  mental states—including tasks and goals—are guiding
         inattentional blindness, and the attentional blink. These                            saccades in a top-down fashion to new image locations
         phenomena offer a perspective on early vision that is                                for more information. Visual attention is the umbrella
         quite different from the older view that early visual pro-                           term used to denote the various mechanisms that help
         cesses are re exive and in exible. Instead, they highlight                           determine which regions of an image are selected for
         the fact that what we see depends critically on where                                more detailed analysis.
                                                                                                 For many years, the study of visual attention in hu-
         • C. G. Healey is with the Department of Computer Science, North Carolina            mans focused on the consequences of selecting an object or
           State University, Raleigh, NC, 27695.                                              location for more detailed processing. This emphasis led
           E-mail: healey@csc.ncsu.edu
         • J. T. Enns is with the Department of Psychology, University of British
                                                                                              to theories of attention based on a variety of metaphors
           Columbia, Vancouver, BC, Canada, V6T 1Z4.                                          to account for the selective nature of perception, in-
           E-mail: jenns@psych.ubc.ca                                                         cluding ltering and bottlenecks [8], [9], [10], limited
                                                                                              resources and limited capacity [11], [12], [13], and mental



Digital Object Indentifier 10.1109/TVCG.2011.127                          1077-2626/11/$26.00 © 2011 IEEE
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                           2



effort or cognitive skill [14], [15]. An important discovery
in these early studies was the identi cation of a limited
set of visual features that are detected very rapidly by
low-level, fast-acting visual processes. These properties
were initially called preattentive, since their detection
seemed to precede focused attention, occurring within
the brief period of a single xation. We now know that
attention plays a critical role in what we see, even at this
early stage of vision. The term preattentive continues to
be used, however, for its intuitive notion of the speed                                               (a)                                     (b)
and ease with which these properties are identi ed.
   Typically, tasks that can be performed on large multi-
element displays in less than 200–250 milliseconds
(msec) are considered preattentive. Since a saccade takes
at least 200 msec to initiate, viewers complete the task
in a single glance. An example of a preattentive task
is the detection of a red circle in a group of blue circles
(Figs. 1a–b). The target object has a visual property “red”
that the blue distractor objects do not. A viewer can
easily determine whether the target is present or absent.
   Hue is not the only visual feature that is preattentive.                                           (c)                                     (d)
In Figs. 1c–d the target is again a red circle, while
the distractors are red squares. Here, the visual system
identi es the target through a difference in curvature.
   A target de ned by a unique visual property—a red
hue in Figs. 1a–b, or a curved form in Figs. 1c–d—allows
it to “pop out” of a display. This implies that it can be
easily detected, regardless of the number of distractors.
In contrast to these effortless searches, when a target is
de ned by the joint presence of two or more visual prop-
erties it often cannot be found preattentively. Figs. 1e–                                             (e)                                      (f)
f show an example of these more dif cult conjunction
                                                                                   Fig. 1. Target detection: (a) hue target red circle absent;
searches. The red circle target is made up of two features:
                                                                                   (b) target present; (c) shape target red circle absent; (d)
red and circular. One of these features is present in each
                                                                                   target present; (e) conjunction target red circle present; (f)
of the distractor objects—red squares and blue circles. A
                                                                                   target absent
search for red items always returns true because there
are red squares in each display. Similarly, a search for
circular items always sees blue circles. Numerous studies
                                                                                           ary between two groups of elements, where all of
have shown that most conjunction targets cannot be
                                                                                           the elements in each group have a common visual
detected preattentively. Viewers must perform a time-
                                                                                           property (see Fig. 10),
consuming serial search through the display to con rm
                                                                                       •   region tracking: viewers track one or more elements
its presence or absence.
                                                                                           with a unique visual feature as they move in time
   If low-level visual processes can be harnessed during
                                                                                           and space, and
visualization, they can draw attention to areas of poten-
                                                                                       •   counting and estimation: viewers count the number of
tial interest in a display. This cannot be accomplished in
                                                                                           elements with a unique visual feature.
an ad-hoc fashion, however. The visual features assigned
to different data attributes must take advantage of the
strengths of our visual system, must be well-suited to the                         3       T HEORIES        OF    P REATTENTIVE V ISION
viewer’s analysis needs, and must not produce visual                               A number of theories attempt to explain how preat-
interference effects (e.g., conjunction search) that mask                          tentive processing occurs within the visual system. We
information.                                                                       describe ve well-known models: feature integration,
   Fig. 2 lists some of the visual features that have been                         textons, similarity, guided search, and boolean maps.
identi ed as preattentive. Experiments in psychology                               We next discuss ensemble coding, which shows that
have used these features to perform the following tasks:                           viewers can generate summaries of the distribution of
   • target detection: viewers detect a target element with                        visual features in a scene, even when they are unable to
      a unique visual feature within a eld of distractor                           locate individual elements based on those same features.
      elements (Fig. 1),                                                           We conclude with feature hierarchies, which describe
   • boundary detection: viewers detect a texture bound-                           situations where the visual system favors certain visual
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                           3




          orientation                                   length                                   closure                                    size
      [16], [17], [18], [19]                           [20], [21]                                  [16]                                   [22], [23]




            curvature                                  density                                  number                                     hue
               [21]                                      [23]                                [20], [24], [25]                  [23], [26], [27], [28], [29]




            luminance                               intersections                             terminators                                3D depth
          [21], [30], [31]                               [16]                                     [16]                                   [32], [33]




               icker                           direction of motion                       velocity of motion                         lighting direction
   [34], [35], [36], [37], [38]                   [33], [38], [39]                    [33], [38], [39], [40], [41]                       [42], [43]
Fig. 2. Examples of preattentive visual features, with references to papers that investigated each feature’s capabilities


features over others. Because we are interested equally                            focusing in particular on the image features that led to
in where viewers attend in an image, as well as to what                            selective perception. She was inspired by a physiological
they are attending, we will not review theories focusing                             nding that single neurons in the brains of monkeys
exclusively on only one of these functions (e.g., the                              responded selectively to edges of a speci c orientation
attention orienting theory of Posner and Petersen [44]).                           and wavelength. Her goal was to nd a behavioral
                                                                                   consequence of these kinds of cells in humans. To do
3.1 Feature Integration Theory                                                     this, she focused on two interrelated problems. First, she
Treisman was one of the rst attention researchers to sys-                          tried to determine which visual properties are detected
tematically study the nature of preattentive processing,                           preattentively [21], [45], [46]. She called these properties
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                           4


                          individual feature maps

   red                                                           luminance

 green                                                           orientation

 yellow                                                          size

   blue                                                          contrast


                                                           master
                                                            map
                               focus of attention

Fig. 3. Treisman’s feature integration model of early
vision—individual maps can be accessed in parallel to                                  (a)         (b)                                (c)
detect feature activity, but focused attention is required to
combine features at a common spatial location [22]                                 Fig. 4. Textons: (a,b) two textons A and B that appear
                                                                                   different in isolation, but have the same size, number
                                                                                   of terminators, and join points; (c) a target group of B-
“preattentive features” [47]. Second, she formulated a                             textons is dif cult to detect in a background of A-textons
hypothesis about how the visual system performs preat-                             when random rotation is applied [49]
tentive processing [22].
   Treisman ran experiments using target and boundary
detection to classify preattentive features (Figs. 1 and 10),                      has a unique feature, one can simply access the given
measuring performance in two different ways: by re-                                feature map to see if any activity is occurring. Feature
sponse time, and by accuracy. In the response time model                           maps are encoded in parallel, so feature detection is
viewers are asked to complete the task as quickly as                               almost instantaneous. A conjunction target can only be
possible while still maintaining a high level of accuracy.                         detected by accessing two or more feature maps. In
The number of distractors in a scene is varied from few                            order to locate these targets, one must search serially
to many. If task completion time is relatively constant                            through the master map of locations, looking for an
and below some chosen threshold, independent of the                                object that satis es the conditions of having the correct
number of distractors, the task is said to be preattentive                         combination of features. Within the model, this use of
(i.e., viewers are not searching through the display to                            focused attention requires a relatively large amount of
locate the target).                                                                time and effort.
   In the accuracy version of the same task, the display                              In later work, Treisman has expanded her strict di-
is shown for a small, xed exposure duration, then                                  chotomy of features being detected either in parallel or in
removed. Again, the number of distractors in the scene                             serial [21], [45]. She now believes that parallel and serial
varies across trials. If viewers can complete the task accu-                       represent two ends of a spectrum that include “more”
rately, regardless of the number of distractors, the feature                       and “less,” not just “present” and “absent.” The amount
used to de ne the target is assumed to be preattentive.                            of difference between the target and the distractors will
   Treisman and others have used their experiments to                              affect search time. For example, a long vertical line can
compile a list of visual features that are detected preat-                         be detected immediately among a group of short vertical
tentively (Fig. 2). It is important to note that some of                           lines, but a medium-length line may take longer to see.
                                                                                      Treisman has also extended feature integration to ex-
these features are asymmetric. For example, a sloped line
                                                                                   plain situations where conjunction search involving mo-
in a sea of vertical lines can be detected preattentively,
                                                                                   tion, depth, color, and orientation have been shown to be
but a vertical line in a sea of sloped lines cannot.
                                                                                   preattentive [33], [39], [48]. Treisman hypothesizes that
   In order to explain preattentive processing, Treisman
                                                                                   a signi cant target–nontarget difference would allow
proposed a model of low-level human vision made up
                                                                                   individual feature maps to ignore nontarget information.
of a set of feature maps and a master map of locations
                                                                                   Consider a conjunction search for a green horizontal bar
(Fig. 3). Each feature map registers activity for a spe-
                                                                                   within a set of red horizontal bars and green vertical
ci c visual feature. When the visual system rst sees
                                                                                   bars. If the red color map could inhibit information about
an image, all the features are encoded in parallel into
                                                                                   red horizontal bars, the search reduces to nding a green
their respective maps. A viewer can access a particular
                                                                                   horizontal bar in a sea of green vertical bars, which
map to check for activity, and perhaps to determine the
                                                                                   occurs preattentively.
amount of activity. The individual feature maps give
no information about location, spatial arrangement, or
relationships to activity in other maps, however.                                  3.2 Texton Theory
   This framework provides a general hypothesis that                               Jul´ sz was also instrumental in expanding our under-
                                                                                      e
explains how preattentive processing occurs. If the target                         standing of what we “see” in a single xation. His start-
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                           5



ing point came from a dif cult computational problem
in machine vision, namely, how to de ne a basis set for
the perception of surface properties. Jul´ sz’s initial in-
                                            e
vestigations focused on determining whether variations
in order statistics were detected by the low-level visual
system [37], [49], [50]. Examples included contrast—a
  rst-order statistic—orientation and regularity—second-                                              (a)                                     (b)
order statistics—and curvature—a third-order statistic.
                                                                                   Fig. 5. N-N similarity affecting search ef ciency for an
Jul´ sz’s results were inconclusive. First-order variations
    e
                                                                                   L-shaped target: (a) high N-N (nontarget-nontarget) sim-
were detected preattentively. Some, but not all, second-
                                                                                   ilarity allows easy detection of the target L; (b) low N-N
order variations were also preattentive. as were an even
                                                                                   similarity increases the dif culty of detecting the target L
smaller set of third-order variations.
                                                                                   [55]
   Based on these ndings, Jul´ sz modi ed his theory
                                  e
to suggest that the early visual system detects three
categories of features called textons [49], [51], [52]:                               • as N-N similarity decreases, search ef ciency de-
   1) Elongated blobs—Lines, rectangles, or ellipses—                                   creases and search time increases, and
       with speci c hues, orientations, widths, and so on.                            • T-N and N-N similarity are related; decreasing N-N
   2) Terminators—ends of line segments.                                                similarity has little effect if T-N similarity is low;
   3) Crossings of line segments.                                                       increasing T-N similarity has little effect if N-N
   Jul´ sz believed that only a difference in textons or in
      e                                                                                 similarity is high.
their density could be detected preattentively. No posi-                              Treisman’s feature integration theory has dif culty
tional information about neighboring textons is available                          explaining Fig. 5. In both cases, the distractors seem
without focused attention. Like Treisman, Jul´ sz sug-
                                                   e                               to use exactly the same features as the target: oriented,
gested that preattentive processing occurs in parallel and                         connected lines of a xed length. Yet experimental results
focused attention occurs in serial.                                                show displays similar to Fig. 5a produce an average
   Jul´ sz used texture segregation to demonstrate his the-
      e                                                                            search time increase of 4.5 msec per distractor, versus
ory. Fig. 4 shows an example of an image that supports                             54.5 msec per distractor for displays similar to Fig. 5b. To
the texton hypothesis. Although the two objects look                               explain this, Duncan and Humphreys proposed a three-
very different in isolation, they are actually the same                            step theory of visual selection.
texton. Both are blobs with the same height and width,
                                                                                      1) The visual eld is segmented in parallel into struc-
made up of the same set of line segments with two termi-
                                                                                         tural units that share some common property, for
nators. When oriented randomly in an image, one cannot
                                                                                         example, spatial proximity or hue. Structural units
preattentively detect the texture boundary between the
                                                                                         may again be segmented, producing a hierarchical
target group and the background distractors.
                                                                                         representation of the visual eld.
                                                                                      2) Access to visual short-term memory is a limited
3.3 Similarity Theory                                                                    resource. During target search a template of the tar-
Some researchers did not support the dichotomy of                                        get’s properties is available.The closer a structural
serial and parallel search modes. They noted that groups                                 unit matches the template, the more resources it
of neurons in the brain seemed to be competing over                                      receives relative to other units with a poorer match.
time to represent the same object. Work in this area                                  3) A poor match between a structural unit and the
by Quinlan and Humphreys therefore began by inves-                                       search template allows ef cient rejection of other
tigating two separate factors in conjunction search [53].                                units that are strongly grouped to the rejected unit.
First, search time may depend on the number of items                                  Structural units that most closely match the target
of information required to identify the target. Second,                            template have the highest probability of access to visual
search time may depend on how easily a target can                                  short-term memory. Search speed is therefore a function
be distinguished from its distractors, regardless of the                           of the speed of resource allocation and the amount
presence of unique preattentive features. Follow-on work                           of competition for access to visual short-term memory.
by Duncan and Humphreys hypothesized that search                                   Given this, we can see how T-N and N-N similarity affect
ability varies continuously, and depends on both the                               search ef ciency. Increased T-N similarity means more
type of task and the display conditions [54], [55], [56].                          structural units match the template, so competition for
Search time is based on two criteria: T-N similarity and                           visual short-term memory access increases. Decreased
N-N similarity. T-N similarity is the amount of similarity                         N-N similarity means we cannot ef ciently reject large
between targets and nontargets. N-N similarity is the                              numbers of strongly grouped structural units, so re-
amount of similarity within the nontargets themselves.                             source allocation time and search time increases.
These two factors affect search time as follows:                                      Interestingly, similarity theory is not the only attempt
   • as T-N similarity increases, search ef ciency de-                             to distinguish between preattentive and attentive results
     creases and search time increases,                                            based on a single parallel process. Nakayama and his
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                           6


                                                       wn
                              ou
                                   r               -do
                        col                     top green
                                               green green

                                       Y
              e                    B
            ag
       im                  G                              up
                       R
                              tio
                                 n                      m-
                                                  tto
                           nta                  bo
                      orie                             steep
                                               steep
                                                   steep
                                       right
                              left
                         steep
                   shallow
                                                                                                      (a)                                     (b)
Fig. 6. Guided search for steep green targets, an image
is ltered into categories for each feature map, bottom-
up and top-down activation “mark” target regions, and an
activation map combines the information to draw attention
to the highest “hills” in the map [61]


colleagues proposed the use of stereo vision and oc-
clusion to segment a three-dimensional scene, where
preattentive search could be performed independently
                                                                                                      (c)                                     (d)
within a segment [33], [57]. Others have presented sim-
ilar theories that segment by object [58], or by signal                            Fig. 7. Boolean maps: (a) red and blue vertical and
strength and noise [59]. The problem of distinguishing                             horizontal elements; (b) map for “red”, color label is red,
serial from parallel processes in human cognition is one                           orientation label is unde ned; (c) map for “vertical”, orien-
of the longest-standing puzzles in the eld, and one that                           tation label is vertical, color label is unde ned; (d) map for
researchers often return to [60].                                                  set intersection on “red” and “vertical” maps [64]


3.4 Guided Search Theory                                                           top-down in uence. A viewer’s attention is drawn from
More recently, Wolfe et al. has proposed the theory                                hill to hill in order of decreasing activation.
of “guided search” [48], [61], [62]. This was the rst                                In addition to traditional ”parallel” and ”serial” tar-
attempt to actively incorporate the goals of the viewer                            get detection, guided search explains similarity theory’s
into a model of human search. He hypothesized that                                 results. Low N-N similarity causes more distractors to re-
an activation map based on both bottom-up and top-                                 port bottom-up activation, while high T-N similarity re-
down information is constructed during visual search.                              duces the target element’s bottom-up activation. Guided
Attention is drawn to peaks in the activation map that                             search also offers a possible explanation for cases where
represent areas in the image with the largest combination                          conjunction search can be performed preattentively [33],
of bottom-up and top-down in uence.                                                [48], [63]: viewer-driven top-down activation may permit
   As with Treisman, Wolfe believes early vision divides                           ef cient search for conjunction targets.
an image into individual feature maps (Fig. 6). Within
each map a feature is ltered into multiple categories,
for example, colors might be divided into red, green,                              3.5 Boolean Map Theory
blue, and yellow. Bottom-up activation follows feature                             A new model of low-level vision has been presented by
categorization. It measures how different an element is                            Huang et al. to study why we often fail to notice features
from its neighbors. Top-down activation is a user-driven                           of a display that are not relevant to the immediate task
attempt to verify hypotheses or answer questions by                                [64], [65]. This theory carefully divides visual search
“glancing” about an image, searching for the necessary                             into two stages: selection and access. Selection involves
visual information. For example, visual search for a                               choosing a set of objects from a scene. Access determines
“blue” element would generate a top-down request that                              which properties of the selected objects a viewer can
is drawn to blue locations. Wolfe argued that viewers                              apprehend.
must specify requests in terms of the categories provided                             Huang suggests that the visual system can divide a
by each feature map [18], [31]. Thus, a viewer could                               scene into two parts: selected elements and excluded
search for “steep” or “shallow” elements, but not for                              elements. This is the “boolean map” that underlies his
elements rotated by a speci c angle.                                               theory. The visual system can then access certain proper-
   The activation map is a combination of bottom-up and                            ties of the selected elements for more detailed analysis.
top-down activity. The weights assigned to these two                                  Boolean maps are created in two ways. First, a viewer
values are task dependent. Hills in the map mark regions                           can specify a single value of an individual feature to
that generate relatively large amounts of bottom-up or                             select all objects that contain the feature value. For
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                           7




                 (a)                                     (b)                                        (a)                                       (b)
Fig. 8. Conjunction search for a blue horizontal target with                       Fig. 9. Estimating average size: (a) average size of green
boolean maps, select “blue” objects, then search within for                        elements is larger; (b) average size of blue elements is
a horizontal target: (a) target present; (b) target absent                         larger [66]


example, a viewer could look for red objects, or vertical                             Fig. 8 shows two examples of searching for a blue hor-
objects. If a viewer selected red objects (Fig. 7b), the                           izontal target. Viewers can apply the following strategy
color feature label for the resulting boolean map would                            to search for the target. First, search for blue objects,
be “red”. Labels for other features (e.g., orientation,                            and once these are “held” in your memory, look for a
size) would be unde ned, since they have not (yet)                                 horizontal object within that group. For most observers
participated in the creation of the map. A second method                           it is not dif cult to determine the target is present in
of selection is for a viewer to choose a set of elements at                        Fig. 8a and absent in Fig. 8b.
speci c spatial locations. Here the boolean map’s feature
labels are left unde ned, since no speci c feature value                           3.6 Ensemble Coding
was used to identify the selected elements. Figs. 7a–c                             All the preceding characterizations of preattentive vision
show an example of a simple scene, and the resulting                               have focused on how low level-visual processes can be
boolean maps for selecting red objects or vertical objects.                        used to guide attention in a larger scene and how a
   An important distinction between feature integration                            viewer’s goals interact with these processes. An equally
and boolean maps is that, in feature integration, presence                         important characteristic of low-level vision is its ability
or absence of a feature is available preattentively, but                           to generate a quick summary of how simple visual fea-
no information on location is provided. A boolean map                              tures are distributed across the eld of view. The ability
encodes the speci c spatial locations of the elements that                         of humans to register a rapid and in-parallel summary
are selected, as well as feature labels to de ne properties                        of a scene in terms of its simple features was rst
of the selected objects.                                                           reported by Ariely [66]. He demonstrated that observers
   A boolean map can also be created by applying the                               could extract the average size of a large number of dots
set operators union or intersection on two existing maps                           from a single glimpse of a display. Yet, when observers
(Fig. 7d). For example, a viewer could create an initial                           were tested on the same displays and asked to indicate
map by selecting red objects (Fig. 7b), then select vertical                       whether a speci c dot of a given size was present, they
objects (Fig. 7c) and intersect the vertical map with                              were unable to do so. This suggests that there is a
the red map currently held in memory. The result is a                              preattentive mechanism that records summary statistics
boolean map identifying the locations of red, vertical                             of visual features without retaining information about
objects (Fig. 7d). A viewer can only retain a single                               the constituent elements that generated the summary.
boolean map. The result of the set operation immediately                              Other research has followed up on this remarkable
replaces the viewer’s current map.                                                 ability, showing that rapid averages are also computed
   Boolean maps lead to some surprising and counter-                               for the orientation of simple edges seen only in pe-
intuitive claims. For example, consider searching for a                            ripheral vision [67], for color [68] and for some higher-
blue horizontal target in a sea of red horizontal and                              level qualities such as the emotions expressed—happy
blue vertical objects. Unlike feature integration or guided                        versus sad—in a group of faces [69]. Exploration of the
search, boolean map theory says this type of combined                              robustness of the ability indicates the precision of the
feature search is more dif cult because it requires two                            extracted mean is not compromised by large changes in
boolean map operations in series: creating a blue map,                             the shape of the distribution within the set [68], [70].
then creating a horizontal map and intersecting it against                            Fig. 9 shows examples of two average size estimation
the blue map to hunt for the target. Importantly, how-                             trials. Viewers are asked to report which group has
ever, the time required for such a search is constant and                          a larger average size: blue or green. In Fig. 9a each
independent of the number of distractors. It is simply the                         group contains six large and six small elements, but the
sum of the time required to complete the two boolean                               green elements are all larger than their blue counterparts,
map operations.                                                                    resulting in a larger average size for the green group. In
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                           8



Fig. 9b the large and small elements in each group are
the same size, but there are more large blue elements
than large green elements, producing a larger average
size for the blue group. In both cases viewers responded
with 75% accuracy or greater for diameter differences of
only 8–12%. Ensemble encoding of visual properties may
help to explain our experience of gist, the rich contextual
information we are able to obtain from the briefest of
glimpses at a scene.
   This ability may offer important advantages in certain                                             (a)                                     (b)
visualization environments. For example, given a stream
of real-time data, ensemble coding would allow viewers                             Fig. 10. Hue-on-form hierarchy: (a) horizontal form
to observe the stream at a high frame rate, yet still iden-                        boundary is masked when hue varies randomly; (b) verti-
tify individual frames with interesting relative distribu-                         cal hue boundary preattentively identi ed even when form
tions of visual features (i.e., attribute values). Ensemble                        varies randomly [72]
coding would also be critical for any situation where
viewers want to estimate the amount of a particular data
                                                                                   to a particular object in a scene?” An equally interesting
attribute in a display. These capabilities were hinted at
                                                                                   question is, “What do we remember about an object
in a paper by Healey et al. [71], but without the bene t
                                                                                   or a scene when we stop attending to it and look
of ensemble coding as a possible explanation.
                                                                                   at something else?” Many viewers assume that as we
3.7 Feature Hierarchy                                                              look around us we are constructing a high-resolution,
                                                                                   fully detailed description of what we see. Researchers
One of the most important considerations for a visu-                               in psychophysics have known for some time that this is
alization designer is deciding how to present informa-                             not true [21], [47], [61], [79], [80]. In fact, in many cases
tion in a display without producing visual confusion.                              our memory for detail between glances at a scene is very
Consider, for example, the conjunction search shown in                             limited. Evidence suggests that a viewer’s current state
Figs. 1e–f. Another important type of interference results                         of mind can play a critical role in determining what is
from a feature hierarchy that appears to exist in the                              being seen at any given moment, what is not being seen,
visual system. For certain tasks one visual feature may                            and what will be seen next.
be “more salient” than another. For example, during
boundary detection Callaghan showed that the visual
system favors color over shape [72]. Background varia-                             4.1 Eye Tracking
tions in color slowed—but did not completely inhibit—                              Although the dynamic interplay between bottom-up and
a viewer’s ability to preattentively identify the presence                         top-down processing was already evident in the early
of spatial patterns formed by different shapes (Fig. 10a).                         eye tracking research of Yarbus [4], some modern theo-
If color is held constant across the display, these same                           rists have tried to predict human eye movements during
shape patterns are immediately visible. The interference                           scene viewing with a purely bottom-up approach. Most
is asymmetric: random variations in shape have no effect                           notably, Itti and Koch [6] developed the saliency theory
on a viewer’s ability to see color patterns (Fig. 10b).                            of eye movements based on Treisman’s feature integra-
Luminance-on-hue and hue-on-texture preferences have                               tion theory. Their guiding assumption was that during
also been found [23], [47], [73], [74], [75].                                      each xation of a scene, several basic feature contrasts—
   Feature hierarchies suggest the most important data                             luminance, color, orientation—are processed rapidly and
attributes should be displayed with the most salient                               in parallel across the visual eld, over a range of spatial
visual features, to avoid situations where secondary data                          scales varying from ne to coarse. These analyses are
values mask the information the viewer wants to see.                               combined into a single feature-independent “conspicuity
   Various researchers have proposed theories for how                              map” that guides the deployment of attention and there-
visual features compete for attention [76], [77], [78]. They                       fore the next saccade to a new location, similar to Wolfe’s
point to a rough order of processing: (1) determine the                            activation map (Fig. 6). The model also includes an
3D layout of a scene; (2) determine surface structures                             inhibitory mechanism—inhibition of return—to prevent
and volumes; (3) establish object movement; (4) interpret                          repeated attention and xation to previously viewed
luminance gradients across surfaces; and (5) use color                             salient locations.
to ne-tune these interpretations. If a con ict arises                                 The surprising outcome of applying this model to
between levels, it is usually resolved in favor of giving                          visual inspection tasks, however, has not been to suc-
priority to an earlier process.                                                    cessfully predict eye movements of viewers. Rather, its
                                                                                   bene t has come from making explicit the failure of a
4   V ISUAL E XPECTATION                   AND     M EMORY                         purely bottom-up approach to determine the movement
Preattentive processing asks in part, “What visual prop-                           of attention and the eyes. It has now become almost rou-
erties draw our eyes, and therefore our focus of attention                         tine in the eye tracking literature to use the Itti and Koch
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                           9



model as a benchmark for bottom-up saliency, against
which the top-down cognitive in uences on visual selec-
tion and eye tracking can be assessed (e.g., [7], [81], [82],
                                                                                                  GREEN                                     GREEN
[83], [84]). For example, in an analysis of gaze during                                          VERTICAL                                  VERTICAL
everyday activities, xations are made in the service of
locating objects and performing manual actions on them,
rather than on the basis of object distinctiveness [85]. A
very readable history of the technology involved in eye
tracking is given in Wade and Tatler [86].                                                                               (a)
   Other theorists have tried to use the pattern of eye
movements during scene viewing as a direct index of the
cognitive in uences on scene perception. For example,
in the scanpath theory of Stark [5], [87], [88], [89],                                                                                      WHITE
the saccades and xations made during initial viewing                                                                                       OBLIQUE
become part of the lasting memory trace of a scene. Thus,
according to this theory, the xation sequences during
initial viewing and then later recognition of the same
scene should be similar. Much research has con rmed
                                                                                                                         (b)
that there are correlations between the scanpaths of
initial and subsequent viewings. Yet, at the same time,                            Fig. 11. Search for color-and-shape conjunction targets:
there seem to be no negative effects on scene memory                               (a) text identifying the target is shown, followed by the
when scanpaths differ between views [90].                                          scene, green vertical target is present; (b) a preview
   One of the most profound demonstrations that eye                                is shown, followed by text identifying the target, white
gaze and perception were not one and the same was rst                              oblique target is absent [80]
reported by Grimes [91]. He tracked the eyes of viewers
examining natural photographs in preparation for a later
memory test. On some occasions he would make large                                    Wolfe believed that if multiple objects are recognized
changes to the photos during the brief period—20–40                                simultaneously in the low-level visual system, it must
msec—in which a saccade was being made from one                                    involve a search for links between the objects and their
location to another in the photo. He was shocked to nd                             representation in long-term memory (LTM). LTM can be
that when two people in a photo changed clothing, or                               queried nearly instantaneously, compared to the 40–50
even heads, during a saccade, viewers were often blind                             msec per item needed to search a scene or to access short-
to these changes, even when they had recently xated                                term memory. Preattentive processing can rapidly draw
the location of the changed features directly.                                     the focus of attention to a target object, so little or no
   Clearly, the eyes are not a direct window to the                                searching is required. To remove this assistance, Wolfe
soul. Research on eye tracking has shown repeatedly                                designed targets with two properties (Fig. 11):
that merely tracking the eyes of a viewer during scene                                1) Targets are formed from a conjunction of features—
perception provides no privileged access to the cogni-                                   they cannot be detected preattentively.
tive processes undertaken by the viewer. Researchers                                  2) Targets are arbitrary combinations of colors and
studying the top-down contributions to perception have                                   shapes—they cannot be semantically recognized
therefore established methodologies in which the role                                    and remembered.
of memory and expectation can be studied through                                      Wolfe initially tested two search types:
more indirect methods. In the sections that follow, we                                1) Traditional search. Text on a blank screen described
present ve laboratory procedures that have been devel-                                   the target. This was followed by a display contain-
oped speci cally for this purpose: postattentive amnesia,                                ing 4–8 potential target formed by combinations of
memory-guided search, change blindness, inattentional                                    colors and shapes in a 3 × 3 array (Fig. 11a).
blindness, and attentional blink. Understanding what we                               2) Postattentive search. The display was shown to the
are thinking, remembering, and expecting as we look at                                   viewer for up to 300 msec. Text describing the
different parts of a visualization is critical to designing                              target was then inserted into the scene (Fig. 11b).
visualizations that encourage locating and retaining the                              Results showed that the preview provided no advan-
information that is most important to the viewer.                                  tage. Postattentive search was as slow (or slower) than
                                                                                   the traditional search, with approximately 25–40 msec
4.2 Postattentive Amnesia                                                          per object required for target present trials. This has a
Wolfe conducted a study to determine whether showing                               signi cant potential impact for visualization design. In
viewers a scene prior to searching it would improve                                most cases visualization displays are novel, and their
their ability to locate targets [80]. Intuitively, one might                       contents cannot be committed to LTM. If studying a
assume that seeing the scene in advance would help with                            display offers no assistance in searching for speci c data
target detection. Wolfe’s results suggest this is not true.                        values, then preattentive methods that draw attention to
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                         10


                                                                                                                                     Wait 950 msec
areas of potential interest are critical for ef cient data
exploration.                                                                                                         Look 100 msec



4.3 Attention Guided by Memory and Prediction
Although research on postattentive amnesia suggests
that there are few, if any, advantages from repeated view-
ing of a display, several more recent ndings suggest
there are important bene ts of memory during search.
Interestingly, all of these bene ts seem to occur outside
of the conscious awareness of the viewer.
   In the area of contextual cuing [92], [93], viewers                                                              Wait 950 msec
  nd a target more rapidly for a subset of the displays                               Look 100 msec
that are presented repeatedly—but in a random order—
versus other displays that are presented for the rst                               Fig. 12. A rapid responses re-display trial, viewers are
time. Moreover, when tested after the search task was                              asked to report the color of the T target, two separate
completed, viewers showed no conscious recollection or                             displays must be searched [97]
awareness that some of the displays were repeated or
that their search speed bene ted from these repetitions.
Contextual cuing appears to involve guiding attention                              one with red elements, the other with blue elements
to a target by subtle regularities in the past experience                          (Fig. 12). Viewers were asked to identify the color of the
of a viewer. This means that attention can be affected by                          target T—that is, to determine whether either of the two
incidental knowledge about global context, in particular,                          displays contained a T. Here, viewers are forced to stop
the spatial relations between the target and non-target                            one search and initiate another as the display changes.
items in a given display. Visualization might be able to                           As before, extremely fast responses were observed for
harness such incidental spatial knowledge of a scene by                            displays that were re-presented.
tracking both the number of views and the time spent                                 The interpretation that the rapid responses re ected
viewing images that are later re-examined by the viewer.                           perceptual predictions—as opposed to easy access to
   A second line of research documents the unconscious                             memory of the scene—was based on two crucial ndings
tendency of viewers to look for targets in novel loca-                             [98], [99]. The rst was the sheer speed at which a
tions in the display, as opposed to looking at locations                           search resumed after an interruption. Previous studies
that have already been examined. This phenomenon is                                on the bene ts of visual priming and short term memory
referred to as inhibition of return [94] and has been shown                        show responses that begin at least 500 msec after the
to be distinct from strategic in uences on search, such                            onset of a display. Correct responses in the 100–250 msec
as choosing consciously to search from left-to-right or                            range call for an explanation that goes beyond mere
moving out from the center in a clockwise direction [95].                          memory. The second nding was that rapid responses
   A nal area of research concerns the bene ts of re-                              depended critically on a participant’s ability to form
suming a visual search that has been interrupted by mo-                            implicit perceptual predictions about what they expected
mentarily occluding the display [96], [97]. Results show                           to see at a particular location in the display after it
that viewers can resume an interrupted search much                                 returned to view.
faster than they can start a new search. This suggests                               For visualization, rapid response suggests that a
that viewers bene t from implicit (i.e., unconscious)                              viewer’s domain knowledge may produce expectations
perceptual predictions they make about the target based                            based on the current display about where certain data
on the partial information acquired during the initial                             might appear in future displays. This in turn could
glimpse of a display.                                                              improve a viewer’s ability to locate important data.
   Rapid resumption was rst observed when viewers
were asked to search for a T among L-shapes [97].
Viewers were given brief looks at the display separated                            4.4 Change Blindness
by longer waits where the screen was blank. They easily                            Both postattentive amnesia and memory-guided search
found the target within a few glimpses of the display.                             agree that our visual system does not resemble the
A surprising result was the presence of many extremely                             relatively faithful and largely passive process of modern
fast responses after display re-presentation. Analysis re-                         photography. A much better metaphor for vision is that
vealed two different types of responses. The rst, which                            of a dynamic and ongoing construction project, where
occurred only during re-presentation, required 100–250                             the products being built are short-lived models of the ex-
msec. This was followed by a second, slower set of                                 ternal world that are speci cally designed for the current
responses that peaked at approximately 600 msec.                                   visually guided tasks of the viewer [100], [101], [102],
   To test whether search was being fully interrupted,                             [103]. There does not appear to be any general purpose
a second experiment showed two interleaved displays,                               vision. What we “see” when confronted with a new
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                         11



scene depends as much on our goals and expectations                                      switched to a completely different actor. Nearly
as it does on the light that enters our eyes.                                            two-thirds of the subjects failed to report that the
   These new ndings differ from the initial ideas of                                     main actor was replaced, instead describing him
preattentive processing: that only certain features are rec-                             using details from the initial actor.
ognized without the need for focused attention, and that                              3) Nothing is Stored. No details are represented
other features cannot be detected, even when viewers ac-                                 internally after a scene is abstracted. When we
tively search for them. More recent work in preattentive                                 need speci c details, we simply re-examine the
vision has shown that the visual differences between a                                   scene. We are blind to change unless it affects our
target and its neighbors, what a viewer is searching for,                                abstracted knowledge of the scene, or unless it
and how the image is presented can all have an effect on                                 occurs where we are looking.
search performance. For example, Wolfe’s guided search                                4) Everything is Stored, Nothing is Compared. De-
theory assumes both bottom-up (i.e., preattentive) and                                   tails about a scene are stored, but cannot be ac-
top-down (i.e., attention-based) activation of features in                               cessed without an external stimulus. In one study,
an image [48], [61], [62]. Other researchers like Treisman                               an experimenter asks a pedestrian for directions
have also studied the dual effects of preattentive and                                   [103]. During this interaction, a group of students
attention-driven demands on what the visual system                                       walks between the experimenter and the pedes-
sees [45], [46]. Wolfe’s discussion of postattentive am-                                 trian, surreptitiously taking a basketball the exper-
nesia points out that details of an image cannot be re-                                  imenter is holding. Only a very few pedestrians
membered across separate scenes except in areas where                                    reported that the basketball had gone missing,
viewers have focused their attention [80].                                               but when asked speci cally about something the
   New research in psychophysics has shown that an                                       experimenter was holding, more than half of the
interruption in what is being seen—a blink, an eye sac-                                  remaining subjects remembered the basketball, of-
cade, or a blank screen—renders us “blind” to signi cant                                 ten providing a detailed description.
changes that occur in the scene during the interruption.                              5) Feature Combination. Details from an initial view
This change blindness phenomena can be illustrated us-                                   and the current view are merged to form a com-
ing a task similar to one shown in comic strips for many                                 bined representation of the scene. Viewers are not
years [101], [102], [103], [104]. Fig. 13 shows three pairs                              aware of which parts of their mental image come
of images. A signi cant difference exists between each                                   from which scene.
image pair. Many viewers have a dif cult time seeing                                  Interestingly, none of the explanations account for all
any difference and often have to be coached to look                                of the change blindness effects that have been identi ed.
carefully to nd it. Once they discover it, they realize that                       This suggests that some combination of these ideas—
the difference was not a subtle one. Change blindness                              or some completely different hypothesis—is needed to
is not a failure to see because of limited visual acuity;                          properly model the phenomena.
rather, it is a failure based on inappropriate attentional                            Simons and Rensink recently revisited the area of
guidance. Some parts of the eye and the brain are clearly                          change blindness [107]. They summarize much of the
responding differently to the two pictures. Yet, this does                         work-to-date, and describe important research issues
not become part of our visual experience until attention                           that are being studied using change blindness experi-
is focused directly on the objects that vary.                                      ments. For example, evidence shows that attention is re-
   The presence of change blindness has important im-                              quired to detect changes, although attention alone is not
plications for visualization. The images we produce are                            necessarily suf cient [108]. Changes to attended objects
normally novel for our viewers, so existing knowledge                              can also be missed, particularly when the changes are
cannot be used to guide their analyses. Instead, we strive                         unexpected. Changes to semantically important objects
to direct the eye, and therefore the mind, to areas of                             are detected faster than changes elsewhere [104]. Low-
interest or importance within a visualization. This ability                        level object properties of the same kind (e.g., color or
forms the rst step towards enabling a viewer to abstract                           size) appear to compete for recognition in visual short-
details that will persist over subsequent images.                                  term memory, but different properties seem to be en-
   Simons offers a wonderful overview of change blind-                             coded separately and in parallel [109]—similar in some
ness, together with some possible explanations [103].                              ways to Treisman’s original feature integration theory
   1) Overwriting. The current image is overwritten by                             [21]. Finally, experiments suggest the locus of attention
      the next, so information that is not abstracted from                         is distributed symmetrically around a viewer’s xation
      the current image is lost. Detailed changes are only                         point [110].
      detected at the focus of attention.                                             Simons and Rensink also described hypotheses that
   2) First Impression. Only the initial view of a scene                           they felt are not supported by existing research. For
      is abstracted, and if the scene is not perceived to                          example, many people have used change blindness to
      have changed, it is not re-encoded. One example                              suggest that our visual representation of a scene is
      of rst impression is an experiment by Levins and                             sparse, or altogether absent. Four hypothetical models of
      Simon where subjects viewed a short movie [105],                             vision were presented that include detailed representa-
      [106]. During a cut scene, the central character was                         tions of a scene, while still allowing for change blindness.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                         12




                                            (a)                                                                    (b)




                                            (c)                                                                    (d)




                                            (e)                                                                    (f)
Fig. 13. Change blindness, a major difference exists between each pair of images; (a–b) object added/removed; (c–d)
color change; (e–f) luminance change


A detailed representation could rapidly decay, making                              on this subject were conducted by Mack and Rock [101].
it unavailable for future comparisons; a representation                            Viewers were shown a cross at the center of xation and
could exist in a pathway that is not accessible to the                             asked to report which arm was longer. After a very small
comparison operation; a representation could exist and                             number of trials (two or three) a small “critical” object
be accessible, but not be in a format that supports the                            was randomly presented in one of the quadrants formed
comparison operation; or an appropriate representation                             by the cross. After answering which arm was longer,
could exist, but the comparison operation is not applied                           viewers were then asked, “Did you see anything else
even though it could be.                                                           on the screen besides the cross?” Approximately 25% of
                                                                                   the viewers failed to report the presence of the critical
                                                                                   object. This was surprising, since in target detection
4.5 Inattentional Blindness
                                                                                   experiments (e.g., Figs. 1a–d) the same critical objects
A related phenomena called inattentional blindness sug-                            are identi ed with close to 100% accuracy.
gests that viewers can completely fail to perceive visually
salient objects or activities. Some of the rst experiments                            These unexpected results led Mack and Rock to mod-
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                         13



                                                                                   tional blindness can be sustained over longer durations
                                                                                   [111]. Neisser’s experiment superimposed video streams
                                                                                   of two basketball games [112]. Players wore white shirts
                                                                                   in one stream and black shirts in the other. Subjects at-
                                                                                   tended to one team—either white or black—and ignored
                                                                                   the other. Whenever the subject’s team made a pass, they
                                                                                   were told to press a key. After about 30 seconds of video,
                                                                                   a third stream was superimposed showing a woman
                                                                                   walking through the scene with an open umbrella. The
                                                                                   stream was visible for about 4 seconds, after which
                                                                                   another 25 seconds of basketball video was shown.
                                                                                   Following the trial, only a small number of observers
                                                                                   reported seeing the woman. When subjects only watched
                                                                                   the screen and did not count passes, 100% noticed the
                                                                                   woman.
Fig. 14. Images from Simons and Chabris’s inattentional
                                                                                      Simons and Chabris controlled three conditions during
blindness experiments, showing both superimposed and
                                                                                   their experiment. Two video styles were shown: three
single-stream video frames containing a woman with an
                                                                                   superimposed video streams where the actors are semi-
umbrella, and a woman in a gorilla suit [111]
                                                                                   transparent, and a single stream where the actors are
                                                                                     lmed together. This tests to see if increased realism af-
                                                                                   fects awareness. Two unexpected actors were also used:
ify their experiment. Following the critical trial another                         a woman with an umbrella, and a woman in a gorilla
two or three noncritical trials were shown—again asking                            suit. This studies how actor similarity changes awareness
viewers to identify the longer arm of the cross—followed                           (Fig. 14). Finally, two types of tasks were assigned to
by a second critical trial and the same question, “Did                             subjects: maintain one count of the bounce passes your
you see anything else on the screen besides the cross?”                            team makes, or maintain two separate counts of the
Mack and Rock called these divided attention trials. The                           bounce passes and the aerial passes your team makes.
expectation is that after the initial query viewers will                           This varies task dif culty to measure its impact on
anticipate being asked this question again. In addition                            awareness.
to completing the primary task, they will also search for
                                                                                      After the video, subjects wrote down their counts,
a critical object. In the nal set of displays viewers were
                                                                                   and were then asked a series of increasingly speci c
told to ignore the cross and focus entirely on identifying
                                                                                   questions about the unexpected actor, starting with “Did
whether a critical object appears in the scene. Mack and
                                                                                   you notice anything unusual?” to “Did you see a go-
Rock called these full attention trials, since a viewer’s
                                                                                   rilla/woman carrying an umbrella?” About half of the
entire attention is directed at nding critical objects.
                                                                                   subjects tested failed to notice the unexpected actor,
   Results showed that viewers were signi cantly better                            demonstrating sustained inattentional blindness in a dy-
at identifying critical objects in the divided attention tri-                      namic scene. A single stream video, a single count task,
als, and were nearly 100% accurate during full attention                           and a woman actor all made the task easier, but in every
trials. This con rmed that the critical objects were salient                       case at least one-third of the observers were blind to the
and detectable under the proper conditions.                                        unexpected event.
   Mack and Rock also tried placing the cross in the pe-
riphery and the critical object at the xation point. They
assumed this would improve identifying critical trials,                            4.6 Attentional Blink
but in fact it produced the opposite effect. Identi cation                         In each of the previous methods for studying visual
rates dropped to as low as 15%. This emphasizes that                               attention, the primary emphasis is on how human at-
subjects can fail to see something, even when it is directly                       tention is limited in its ability to represent the details of
in their eld of vision.                                                            a scene, and in its ability to represent multiple objects
   Mack and Rock hypothesized that “there is no per-                               at the same time. But attention is also severely limited
ception without attention.” If you do not attend to an                             in its ability to process information that arrives in quick
object in some way, you may not perceive it at all.                                succession, even when that information is presented at
This suggestion contradicts the belief that objects are                            a single location in space.
organized into elementary units automatically and prior                               Attentional blink is currently the most widely used
to attention being activated (e.g., Gestalt theory). If atten-                     method to study the availability of attention across time.
tion is intentional, without objects rst being perceived                           Its name—“blink”—derives from the nding that when
there is nothing to focus attention on. Mack and Rock’s                            two targets are presented in rapid succession, the second
experiments suggest this may not be true.                                          of the two targets cannot be detected or identi ed when
   More recent work by Simons and Chabris recreated a                              it appears within approximately 100–500 msec following
classic study by Neisser to determine whether inatten-                             the rst target [113], [114].
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS                                                                                                         14



   In a typical experiment, visual items such as words                             visualization and in psychophysics. For example, exper-
or pictures are shown in a rapid serial presentation at a                          iments conducted by the authors on perceiving orien-
single location. Raymond et al. [114] asked participants                           tation led to a visualization technique for multivalued
to identify the only white letter ( rst target) in a 10-item                       scalar elds [19], and to a new theory on how targets
per second stream of black letters (distractors), then to                          are detected and localized in cognitive vision [117].
report whether the letter “X” (second target) occurred
in the subsequent letter stream. The second target was
                                                                                   5.1 Visual Attention
present in 50% of trials and, when shown, appeared
at random intervals after the rst target ranging from                              Understanding visual attention is important, both in
100–800 msec. Reports of both targets were required                                visualization and in graphics. The proper choice of visual
after the stimulus stream ended. The attentional blink                             features will draw the focus of attention to areas in a
is de ned as having occurred when the rst target is                                visualization that contain important data, and correctly
reported correctly, but the second target is not. This                             weight the perceptual strength of a data element based
usually happens for temporal lags between targets of                               on the attribute values it encodes. Tracking attention can
100–500 msec. Accuracy recovers to a normal baseline                               be used to predict where a viewer will look, allowing
level at longer intervals.                                                         different parts of an image to be managed based on the
   Curiously, when the second target is presented im-                              amount of attention they are expected to receive.
mediately following the rst target (i.e., with no delay                               Perceptual Salience. Building a visualization often
between the two targets), reports of the second target are                         begins with a series of basic questions, “How should I
quite accurate [115]. This suggests that attention operates                        represent the data? How can I highlight important data
over time like a window or gate, opening in response to                            values when they appear? How can I ensure that viewers
  nding a visual item that matches its current criterion                           perceive differences in the data accurately?” Results from
and then closing shortly thereafter to consolidate that                            research on visual attention can be used to assign visual
item as a distinct object. The attentional blink is therefore                      features to data values in ways that satisfy these needs.
an index of the “dwell time” needed to consolidate                                    A well known example of this approach is the design
a rapidly presented visual item into visual short term                             of colormaps to visualize continuous scalar values. The
memory, making it available for conscious report [116].                            vision models agree that properties of color are preat-
   Change blindness, inattentional blindness, and atten-                           tentive. They do not, however, identify the amount of
tional blink have important consequences for visualiza-                            color difference needed to produce distinguishable col-
tion. Signi cant changes in the data may be missed                                 ors. Follow-on studies have been conducted by visualiza-
if attention is fully deployed or focused on a speci c                             tion researchers to measure this difference. For example,
location in a visualization. Attending to data elements                            Ware ran experiments that asked a viewer to distinguish
in one frame of an animation may render us temporarily                             individual colors and shapes formed by colors. He used
blind to what follows at that location. These issues must                          his results to build a colormap that spirals up the lumi-
be considered during visualization design.                                         nance axis, providing perceptual balance and controlling
                                                                                   simultaneous contrast error [118]. Healey conducted a
                                                                                   visual search experiment to determine the number of col-
5   V ISUALIZATION             AND      G RAPHICS                                  ors a viewer can distinguish simultaneously. His results
How should researchers in visualization and graphics                               showed that viewers can rapidly choose between up to
choose between the different vision models? In psy-                                seven isoluminant colors [119]. Kuhn et al. used results
chophysics, the models do not compete with one another.                            from color perception experiments to recolor images in
Rather, they build on top of one another to address                                ways that allow colorblind viewers to properly perceive
common problems and new insights over time. The                                    color differences [120]. Other visual features have been
models differ in terms of why they were developed, and                             studied in a similar fashion, producing guidelines on
in how they explain our eye’s response to visual stimulae.                         the use of texture—size, orientation, and regularity [121],
Yet, despite this diversity, the models usually agree on                           [122], [123]—and motion— icker, direction, and velocity
which visual features we can attend to. Given this, we                             [38], [124]—for visualizing data.
recommend considering the most recent models, since                                   An alternative method for measuring image salience
these are the most comprehensive.                                                  is Daly’s visible differences predictor, a more physically-
   A related question asks how well a model ts our                                 based approach that uses light level, spatial frequency,
needs. For example, the models identify numerous visual                            and signal content to de ne a viewer’s sensitivity at
features as preattentive, but they may not de ne the                               each image pixel [125]. Although Daly used his metric
difference needed to produce distinguishable instances                             to compare images, it could also be applied to de ne
of a feature. Follow-on experiments are necessary to                               perceptual salience within an visualization.
extend the ndings for visualization design.                                           Another important issue, particularly for multivari-
   Finally, although vision models have proven to be                               ate visualization, is feature interference. One common
surprisingly robust, their predictions can fail. Identifying                       approach visualizes each data attribute with a separate
these situations often leads to new research, both in                              visual feature. This raises the question, “Will the visual
Goutham2
Goutham2
Goutham2
Goutham2
Goutham2
Goutham2

More Related Content

What's hot

総研大講義 「#8 盲視の脳内機構」レジメ
総研大講義 「#8 盲視の脳内機構」レジメ総研大講義 「#8 盲視の脳内機構」レジメ
総研大講義 「#8 盲視の脳内機構」レジメMasatoshi Yoshida
 
Klingner Fixation Aligned Pupillary Response Averaging
Klingner Fixation Aligned Pupillary Response AveragingKlingner Fixation Aligned Pupillary Response Averaging
Klingner Fixation Aligned Pupillary Response AveragingKalle
 
Charlie chung
Charlie chungCharlie chung
Charlie chungCOT SSNP
 
Diversity in Learning: Teaching Practices and Educational Policies that Impac...
Diversity in Learning: Teaching Practices and Educational Policies that Impac...Diversity in Learning: Teaching Practices and Educational Policies that Impac...
Diversity in Learning: Teaching Practices and Educational Policies that Impac...EduSkills OECD
 
総研大講義 「#4 注意と運動」レジメ
総研大講義 「#4 注意と運動」レジメ総研大講義 「#4 注意と運動」レジメ
総研大講義 「#4 注意と運動」レジメMasatoshi Yoshida
 
駒場学部講義 総合情報学特論III 「意識の科学的研究 - 盲視を起点に」
駒場学部講義 総合情報学特論III 「意識の科学的研究 - 盲視を起点に」駒場学部講義 総合情報学特論III 「意識の科学的研究 - 盲視を起点に」
駒場学部講義 総合情報学特論III 「意識の科学的研究 - 盲視を起点に」Masatoshi Yoshida
 
Natural Rationality: Beyond Bounded and Ecological Rationality
Natural Rationality: Beyond Bounded and Ecological RationalityNatural Rationality: Beyond Bounded and Ecological Rationality
Natural Rationality: Beyond Bounded and Ecological RationalityBenoit Hardy-Vallée, Ph.D.
 
Design as Worldmaking by Bas Leurs
Design as Worldmaking by Bas Leurs Design as Worldmaking by Bas Leurs
Design as Worldmaking by Bas Leurs medialabamsterdam
 
Toward Tractable AGI: Challenges for System Identification in Neural Circuitry
Toward Tractable AGI: Challenges for System Identification in Neural CircuitryToward Tractable AGI: Challenges for System Identification in Neural Circuitry
Toward Tractable AGI: Challenges for System Identification in Neural CircuitryRandal Koene
 
Assessmet power point handout
Assessmet power point handoutAssessmet power point handout
Assessmet power point handoutLeoncio Lumaban
 
President gogue no movie 2011 10 28
President gogue no movie 2011 10 28President gogue no movie 2011 10 28
President gogue no movie 2011 10 28aucompbiker
 

What's hot (20)

Visual perceptual assessment
Visual perceptual assessmentVisual perceptual assessment
Visual perceptual assessment
 
総研大講義 「#8 盲視の脳内機構」レジメ
総研大講義 「#8 盲視の脳内機構」レジメ総研大講義 「#8 盲視の脳内機構」レジメ
総研大講義 「#8 盲視の脳内機構」レジメ
 
Human vision
Human visionHuman vision
Human vision
 
Klingner Fixation Aligned Pupillary Response Averaging
Klingner Fixation Aligned Pupillary Response AveragingKlingner Fixation Aligned Pupillary Response Averaging
Klingner Fixation Aligned Pupillary Response Averaging
 
Recognition at end of Year 1
Recognition at end of Year 1Recognition at end of Year 1
Recognition at end of Year 1
 
Charlie chung
Charlie chungCharlie chung
Charlie chung
 
Diversity in Learning: Teaching Practices and Educational Policies that Impac...
Diversity in Learning: Teaching Practices and Educational Policies that Impac...Diversity in Learning: Teaching Practices and Educational Policies that Impac...
Diversity in Learning: Teaching Practices and Educational Policies that Impac...
 
総研大講義 「#4 注意と運動」レジメ
総研大講義 「#4 注意と運動」レジメ総研大講義 「#4 注意と運動」レジメ
総研大講義 「#4 注意と運動」レジメ
 
W10 visual space and time
W10 visual space and timeW10 visual space and time
W10 visual space and time
 
駒場学部講義 総合情報学特論III 「意識の科学的研究 - 盲視を起点に」
駒場学部講義 総合情報学特論III 「意識の科学的研究 - 盲視を起点に」駒場学部講義 総合情報学特論III 「意識の科学的研究 - 盲視を起点に」
駒場学部講義 総合情報学特論III 「意識の科学的研究 - 盲視を起点に」
 
Visual perceptual
Visual perceptualVisual perceptual
Visual perceptual
 
Natural Rationality: Beyond Bounded and Ecological Rationality
Natural Rationality: Beyond Bounded and Ecological RationalityNatural Rationality: Beyond Bounded and Ecological Rationality
Natural Rationality: Beyond Bounded and Ecological Rationality
 
Design as Worldmaking by Bas Leurs
Design as Worldmaking by Bas Leurs Design as Worldmaking by Bas Leurs
Design as Worldmaking by Bas Leurs
 
Where is my mind?
Where is my mind?Where is my mind?
Where is my mind?
 
Toward Tractable AGI: Challenges for System Identification in Neural Circuitry
Toward Tractable AGI: Challenges for System Identification in Neural CircuitryToward Tractable AGI: Challenges for System Identification in Neural Circuitry
Toward Tractable AGI: Challenges for System Identification in Neural Circuitry
 
Lesson 17
Lesson 17Lesson 17
Lesson 17
 
Piazza 1 lecture
Piazza 1 lecturePiazza 1 lecture
Piazza 1 lecture
 
Assessmet power point handout
Assessmet power point handoutAssessmet power point handout
Assessmet power point handout
 
President gogue no movie 2011 10 28
President gogue no movie 2011 10 28President gogue no movie 2011 10 28
President gogue no movie 2011 10 28
 
Heart & Mind
Heart & MindHeart & Mind
Heart & Mind
 

Similar to Goutham2

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Kalle
 
Consciousness & neuroscience francis crick & christof koch
Consciousness & neuroscience francis crick & christof kochConsciousness & neuroscience francis crick & christof koch
Consciousness & neuroscience francis crick & christof kochnjqtpie86
 
Visual perceptual assessment
Visual perceptual assessmentVisual perceptual assessment
Visual perceptual assessmentDharmendra Kumar
 
Behind Eyetracking: How the Brain Utilizies Its Eyes (Dixon Cleveland)
Behind Eyetracking: How the Brain Utilizies Its Eyes (Dixon Cleveland)Behind Eyetracking: How the Brain Utilizies Its Eyes (Dixon Cleveland)
Behind Eyetracking: How the Brain Utilizies Its Eyes (Dixon Cleveland)uxpa-dc
 
UVC100_Fall16_Class3.1
UVC100_Fall16_Class3.1UVC100_Fall16_Class3.1
UVC100_Fall16_Class3.1Jennifer Burns
 
Contents lists available at ScienceDirectBrain and Cogniti.docx
Contents lists available at ScienceDirectBrain and Cogniti.docxContents lists available at ScienceDirectBrain and Cogniti.docx
Contents lists available at ScienceDirectBrain and Cogniti.docxdonnajames55
 
User Experience 2: Psychology Concepts
User Experience 2: Psychology ConceptsUser Experience 2: Psychology Concepts
User Experience 2: Psychology ConceptsMarc Miquel
 
Your Brain On Graphics: IA Summit 2011 (can download)
Your Brain On Graphics: IA Summit 2011 (can download)Your Brain On Graphics: IA Summit 2011 (can download)
Your Brain On Graphics: IA Summit 2011 (can download)Connie Malamed
 
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...Kalle
 
19 3 sep17 21may 6657 t269 revised (edit ndit)
19 3 sep17 21may 6657 t269 revised (edit ndit)19 3 sep17 21may 6657 t269 revised (edit ndit)
19 3 sep17 21may 6657 t269 revised (edit ndit)IAESIJEECS
 
19 3 sep17 21may 6657 t269 revised (edit ndit)
19 3 sep17 21may 6657 t269 revised (edit ndit)19 3 sep17 21may 6657 t269 revised (edit ndit)
19 3 sep17 21may 6657 t269 revised (edit ndit)IAESIJEECS
 
Information Processing and Motor Skill Performance
Information Processing and Motor Skill PerformanceInformation Processing and Motor Skill Performance
Information Processing and Motor Skill PerformanceJohn John
 
Fundamentals of visual communication unit iii
Fundamentals of visual communication unit iiiFundamentals of visual communication unit iii
Fundamentals of visual communication unit iiiRangarajanN6
 
Growing evidence for separate neural mechanisms for attention and consciousne...
Growing evidence for separate neural mechanisms for attention and consciousne...Growing evidence for separate neural mechanisms for attention and consciousne...
Growing evidence for separate neural mechanisms for attention and consciousne...Nao (Naotsugu) Tsuchiya
 

Similar to Goutham2 (20)

Arai thesis
Arai thesisArai thesis
Arai thesis
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
Zhang Eye Movement As An Interaction Mechanism For Relevance Feedback In A Co...
 
Ch06 Lecture Notes
Ch06 Lecture NotesCh06 Lecture Notes
Ch06 Lecture Notes
 
Consciousness & neuroscience francis crick & christof koch
Consciousness & neuroscience francis crick & christof kochConsciousness & neuroscience francis crick & christof koch
Consciousness & neuroscience francis crick & christof koch
 
Visual perceptual assessment
Visual perceptual assessmentVisual perceptual assessment
Visual perceptual assessment
 
Behind Eyetracking: How the Brain Utilizies Its Eyes (Dixon Cleveland)
Behind Eyetracking: How the Brain Utilizies Its Eyes (Dixon Cleveland)Behind Eyetracking: How the Brain Utilizies Its Eyes (Dixon Cleveland)
Behind Eyetracking: How the Brain Utilizies Its Eyes (Dixon Cleveland)
 
UVC100_Fall16_Class3.1
UVC100_Fall16_Class3.1UVC100_Fall16_Class3.1
UVC100_Fall16_Class3.1
 
Contents lists available at ScienceDirectBrain and Cogniti.docx
Contents lists available at ScienceDirectBrain and Cogniti.docxContents lists available at ScienceDirectBrain and Cogniti.docx
Contents lists available at ScienceDirectBrain and Cogniti.docx
 
User Experience 2: Psychology Concepts
User Experience 2: Psychology ConceptsUser Experience 2: Psychology Concepts
User Experience 2: Psychology Concepts
 
Your Brain On Graphics: IA Summit 2011 (can download)
Your Brain On Graphics: IA Summit 2011 (can download)Your Brain On Graphics: IA Summit 2011 (can download)
Your Brain On Graphics: IA Summit 2011 (can download)
 
Research Experience
Research ExperienceResearch Experience
Research Experience
 
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks...
 
19 3 sep17 21may 6657 t269 revised (edit ndit)
19 3 sep17 21may 6657 t269 revised (edit ndit)19 3 sep17 21may 6657 t269 revised (edit ndit)
19 3 sep17 21may 6657 t269 revised (edit ndit)
 
19 3 sep17 21may 6657 t269 revised (edit ndit)
19 3 sep17 21may 6657 t269 revised (edit ndit)19 3 sep17 21may 6657 t269 revised (edit ndit)
19 3 sep17 21may 6657 t269 revised (edit ndit)
 
Thesis Carohorn
Thesis CarohornThesis Carohorn
Thesis Carohorn
 
Information Processing and Motor Skill Performance
Information Processing and Motor Skill PerformanceInformation Processing and Motor Skill Performance
Information Processing and Motor Skill Performance
 
Fundamentals of visual communication unit iii
Fundamentals of visual communication unit iiiFundamentals of visual communication unit iii
Fundamentals of visual communication unit iii
 
Growing evidence for separate neural mechanisms for attention and consciousne...
Growing evidence for separate neural mechanisms for attention and consciousne...Growing evidence for separate neural mechanisms for attention and consciousne...
Growing evidence for separate neural mechanisms for attention and consciousne...
 
SP16UVC2.1
SP16UVC2.1SP16UVC2.1
SP16UVC2.1
 

Goutham2

  • 1. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 1 Attention and Visual Memory in Visualization and Computer Graphics Christopher G. Healey, Senior Member, IEEE, and James. T. Enns Abstract—A fundamental goal of visualization is to produce images of data that support visual analysis, exploration, and discovery of novel insights. An important consideration during visualization design is the role of human visual perception. How we “see” details in an image can directly impact a viewer’s ef ciency and effectiveness. This article surveys research on attention and visual perception, with a speci c focus on results that have direct relevance to visualization and visual analytics. We discuss theories of low-level visual perception, then show how these ndings form a foundation for more recent work on visual memory and visual attention. We conclude with a brief overview of how knowledge of visual attention and visual memory is being applied in visualization and graphics. We also discuss how challenges in visualization are motivating research in psychophysics. Index Terms—Attention, color, motion, nonphotorealism, texture, visual memory, visual perception, visualization. ✦ 1 I NTRODUCTION attention is focused and what is already in our minds prior to viewing an image. Finally, we describe sev- H UMAN perception plays an important role in the area of visualization. An understanding of per- ception can signi cantly improve both the quality and eral studies in which human perception has in uenced the development of new methods in visualization and the quantity of information being displayed [1]. The graphics. importance of perception was cited by the NSF panel on graphics and image processing that proposed the term 2 P REATTENTIVE P ROCESSING “scienti c visualization” [2]. The need for perception was again emphasized during recent DOE–NSF and For many years vision researchers have been investigat- DHS panels on directions for future research in visu- ing how the human visual system analyzes images. One alization [3]. feature of human vision that has an impact on almost This document summarizes some of the recent de- all perceptual analyses is that, at any given moment de- velopments in research and theory regarding human tailed vision for shape and color is only possible within psychophysics, and discusses their relevance to scien- a small portion of the visual eld (i.e., an area about the ti c and information visualization. We begin with an size of your thumbnail when viewed at arm’s length). overview of the way human vision rapidly and au- In order to see detailed information from more than tomatically categorizes visual images into regions and one region, the eyes move rapidly, alternating between properties based on simple computations that can be brief stationary periods when detailed information is made in parallel across an image. This is often referred acquired—a xation—and then icking rapidly to a new to as preattentive processing. We describe ve theories of location during a brief period of blindness—a saccade. preattentive processing, and brie y discuss related work This xation–saccade cycle repeats 3–4 times each second on ensemble coding and feature hierarchies. We next of our waking lives, largely without any awareness examine several recent areas of research that focus on the on our part [4], [5], [6], [7]. The cycle makes seeing critical role that the viewer’s current state of mind plays highly dynamic. While bottom-up information from each in determining what is seen, speci cally, postattentive xation is in uencing our mental experience, our current amnesia, memory-guided attention, change blindness, mental states—including tasks and goals—are guiding inattentional blindness, and the attentional blink. These saccades in a top-down fashion to new image locations phenomena offer a perspective on early vision that is for more information. Visual attention is the umbrella quite different from the older view that early visual pro- term used to denote the various mechanisms that help cesses are re exive and in exible. Instead, they highlight determine which regions of an image are selected for the fact that what we see depends critically on where more detailed analysis. For many years, the study of visual attention in hu- • C. G. Healey is with the Department of Computer Science, North Carolina mans focused on the consequences of selecting an object or State University, Raleigh, NC, 27695. location for more detailed processing. This emphasis led E-mail: healey@csc.ncsu.edu • J. T. Enns is with the Department of Psychology, University of British to theories of attention based on a variety of metaphors Columbia, Vancouver, BC, Canada, V6T 1Z4. to account for the selective nature of perception, in- E-mail: jenns@psych.ubc.ca cluding ltering and bottlenecks [8], [9], [10], limited resources and limited capacity [11], [12], [13], and mental Digital Object Indentifier 10.1109/TVCG.2011.127 1077-2626/11/$26.00 © 2011 IEEE
  • 2. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2 effort or cognitive skill [14], [15]. An important discovery in these early studies was the identi cation of a limited set of visual features that are detected very rapidly by low-level, fast-acting visual processes. These properties were initially called preattentive, since their detection seemed to precede focused attention, occurring within the brief period of a single xation. We now know that attention plays a critical role in what we see, even at this early stage of vision. The term preattentive continues to be used, however, for its intuitive notion of the speed (a) (b) and ease with which these properties are identi ed. Typically, tasks that can be performed on large multi- element displays in less than 200–250 milliseconds (msec) are considered preattentive. Since a saccade takes at least 200 msec to initiate, viewers complete the task in a single glance. An example of a preattentive task is the detection of a red circle in a group of blue circles (Figs. 1a–b). The target object has a visual property “red” that the blue distractor objects do not. A viewer can easily determine whether the target is present or absent. Hue is not the only visual feature that is preattentive. (c) (d) In Figs. 1c–d the target is again a red circle, while the distractors are red squares. Here, the visual system identi es the target through a difference in curvature. A target de ned by a unique visual property—a red hue in Figs. 1a–b, or a curved form in Figs. 1c–d—allows it to “pop out” of a display. This implies that it can be easily detected, regardless of the number of distractors. In contrast to these effortless searches, when a target is de ned by the joint presence of two or more visual prop- erties it often cannot be found preattentively. Figs. 1e– (e) (f) f show an example of these more dif cult conjunction Fig. 1. Target detection: (a) hue target red circle absent; searches. The red circle target is made up of two features: (b) target present; (c) shape target red circle absent; (d) red and circular. One of these features is present in each target present; (e) conjunction target red circle present; (f) of the distractor objects—red squares and blue circles. A target absent search for red items always returns true because there are red squares in each display. Similarly, a search for circular items always sees blue circles. Numerous studies ary between two groups of elements, where all of have shown that most conjunction targets cannot be the elements in each group have a common visual detected preattentively. Viewers must perform a time- property (see Fig. 10), consuming serial search through the display to con rm • region tracking: viewers track one or more elements its presence or absence. with a unique visual feature as they move in time If low-level visual processes can be harnessed during and space, and visualization, they can draw attention to areas of poten- • counting and estimation: viewers count the number of tial interest in a display. This cannot be accomplished in elements with a unique visual feature. an ad-hoc fashion, however. The visual features assigned to different data attributes must take advantage of the strengths of our visual system, must be well-suited to the 3 T HEORIES OF P REATTENTIVE V ISION viewer’s analysis needs, and must not produce visual A number of theories attempt to explain how preat- interference effects (e.g., conjunction search) that mask tentive processing occurs within the visual system. We information. describe ve well-known models: feature integration, Fig. 2 lists some of the visual features that have been textons, similarity, guided search, and boolean maps. identi ed as preattentive. Experiments in psychology We next discuss ensemble coding, which shows that have used these features to perform the following tasks: viewers can generate summaries of the distribution of • target detection: viewers detect a target element with visual features in a scene, even when they are unable to a unique visual feature within a eld of distractor locate individual elements based on those same features. elements (Fig. 1), We conclude with feature hierarchies, which describe • boundary detection: viewers detect a texture bound- situations where the visual system favors certain visual
  • 3. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 3 orientation length closure size [16], [17], [18], [19] [20], [21] [16] [22], [23] curvature density number hue [21] [23] [20], [24], [25] [23], [26], [27], [28], [29] luminance intersections terminators 3D depth [21], [30], [31] [16] [16] [32], [33] icker direction of motion velocity of motion lighting direction [34], [35], [36], [37], [38] [33], [38], [39] [33], [38], [39], [40], [41] [42], [43] Fig. 2. Examples of preattentive visual features, with references to papers that investigated each feature’s capabilities features over others. Because we are interested equally focusing in particular on the image features that led to in where viewers attend in an image, as well as to what selective perception. She was inspired by a physiological they are attending, we will not review theories focusing nding that single neurons in the brains of monkeys exclusively on only one of these functions (e.g., the responded selectively to edges of a speci c orientation attention orienting theory of Posner and Petersen [44]). and wavelength. Her goal was to nd a behavioral consequence of these kinds of cells in humans. To do 3.1 Feature Integration Theory this, she focused on two interrelated problems. First, she Treisman was one of the rst attention researchers to sys- tried to determine which visual properties are detected tematically study the nature of preattentive processing, preattentively [21], [45], [46]. She called these properties
  • 4. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 4 individual feature maps red luminance green orientation yellow size blue contrast master map focus of attention Fig. 3. Treisman’s feature integration model of early vision—individual maps can be accessed in parallel to (a) (b) (c) detect feature activity, but focused attention is required to combine features at a common spatial location [22] Fig. 4. Textons: (a,b) two textons A and B that appear different in isolation, but have the same size, number of terminators, and join points; (c) a target group of B- “preattentive features” [47]. Second, she formulated a textons is dif cult to detect in a background of A-textons hypothesis about how the visual system performs preat- when random rotation is applied [49] tentive processing [22]. Treisman ran experiments using target and boundary detection to classify preattentive features (Figs. 1 and 10), has a unique feature, one can simply access the given measuring performance in two different ways: by re- feature map to see if any activity is occurring. Feature sponse time, and by accuracy. In the response time model maps are encoded in parallel, so feature detection is viewers are asked to complete the task as quickly as almost instantaneous. A conjunction target can only be possible while still maintaining a high level of accuracy. detected by accessing two or more feature maps. In The number of distractors in a scene is varied from few order to locate these targets, one must search serially to many. If task completion time is relatively constant through the master map of locations, looking for an and below some chosen threshold, independent of the object that satis es the conditions of having the correct number of distractors, the task is said to be preattentive combination of features. Within the model, this use of (i.e., viewers are not searching through the display to focused attention requires a relatively large amount of locate the target). time and effort. In the accuracy version of the same task, the display In later work, Treisman has expanded her strict di- is shown for a small, xed exposure duration, then chotomy of features being detected either in parallel or in removed. Again, the number of distractors in the scene serial [21], [45]. She now believes that parallel and serial varies across trials. If viewers can complete the task accu- represent two ends of a spectrum that include “more” rately, regardless of the number of distractors, the feature and “less,” not just “present” and “absent.” The amount used to de ne the target is assumed to be preattentive. of difference between the target and the distractors will Treisman and others have used their experiments to affect search time. For example, a long vertical line can compile a list of visual features that are detected preat- be detected immediately among a group of short vertical tentively (Fig. 2). It is important to note that some of lines, but a medium-length line may take longer to see. Treisman has also extended feature integration to ex- these features are asymmetric. For example, a sloped line plain situations where conjunction search involving mo- in a sea of vertical lines can be detected preattentively, tion, depth, color, and orientation have been shown to be but a vertical line in a sea of sloped lines cannot. preattentive [33], [39], [48]. Treisman hypothesizes that In order to explain preattentive processing, Treisman a signi cant target–nontarget difference would allow proposed a model of low-level human vision made up individual feature maps to ignore nontarget information. of a set of feature maps and a master map of locations Consider a conjunction search for a green horizontal bar (Fig. 3). Each feature map registers activity for a spe- within a set of red horizontal bars and green vertical ci c visual feature. When the visual system rst sees bars. If the red color map could inhibit information about an image, all the features are encoded in parallel into red horizontal bars, the search reduces to nding a green their respective maps. A viewer can access a particular horizontal bar in a sea of green vertical bars, which map to check for activity, and perhaps to determine the occurs preattentively. amount of activity. The individual feature maps give no information about location, spatial arrangement, or relationships to activity in other maps, however. 3.2 Texton Theory This framework provides a general hypothesis that Jul´ sz was also instrumental in expanding our under- e explains how preattentive processing occurs. If the target standing of what we “see” in a single xation. His start-
  • 5. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 5 ing point came from a dif cult computational problem in machine vision, namely, how to de ne a basis set for the perception of surface properties. Jul´ sz’s initial in- e vestigations focused on determining whether variations in order statistics were detected by the low-level visual system [37], [49], [50]. Examples included contrast—a rst-order statistic—orientation and regularity—second- (a) (b) order statistics—and curvature—a third-order statistic. Fig. 5. N-N similarity affecting search ef ciency for an Jul´ sz’s results were inconclusive. First-order variations e L-shaped target: (a) high N-N (nontarget-nontarget) sim- were detected preattentively. Some, but not all, second- ilarity allows easy detection of the target L; (b) low N-N order variations were also preattentive. as were an even similarity increases the dif culty of detecting the target L smaller set of third-order variations. [55] Based on these ndings, Jul´ sz modi ed his theory e to suggest that the early visual system detects three categories of features called textons [49], [51], [52]: • as N-N similarity decreases, search ef ciency de- 1) Elongated blobs—Lines, rectangles, or ellipses— creases and search time increases, and with speci c hues, orientations, widths, and so on. • T-N and N-N similarity are related; decreasing N-N 2) Terminators—ends of line segments. similarity has little effect if T-N similarity is low; 3) Crossings of line segments. increasing T-N similarity has little effect if N-N Jul´ sz believed that only a difference in textons or in e similarity is high. their density could be detected preattentively. No posi- Treisman’s feature integration theory has dif culty tional information about neighboring textons is available explaining Fig. 5. In both cases, the distractors seem without focused attention. Like Treisman, Jul´ sz sug- e to use exactly the same features as the target: oriented, gested that preattentive processing occurs in parallel and connected lines of a xed length. Yet experimental results focused attention occurs in serial. show displays similar to Fig. 5a produce an average Jul´ sz used texture segregation to demonstrate his the- e search time increase of 4.5 msec per distractor, versus ory. Fig. 4 shows an example of an image that supports 54.5 msec per distractor for displays similar to Fig. 5b. To the texton hypothesis. Although the two objects look explain this, Duncan and Humphreys proposed a three- very different in isolation, they are actually the same step theory of visual selection. texton. Both are blobs with the same height and width, 1) The visual eld is segmented in parallel into struc- made up of the same set of line segments with two termi- tural units that share some common property, for nators. When oriented randomly in an image, one cannot example, spatial proximity or hue. Structural units preattentively detect the texture boundary between the may again be segmented, producing a hierarchical target group and the background distractors. representation of the visual eld. 2) Access to visual short-term memory is a limited 3.3 Similarity Theory resource. During target search a template of the tar- Some researchers did not support the dichotomy of get’s properties is available.The closer a structural serial and parallel search modes. They noted that groups unit matches the template, the more resources it of neurons in the brain seemed to be competing over receives relative to other units with a poorer match. time to represent the same object. Work in this area 3) A poor match between a structural unit and the by Quinlan and Humphreys therefore began by inves- search template allows ef cient rejection of other tigating two separate factors in conjunction search [53]. units that are strongly grouped to the rejected unit. First, search time may depend on the number of items Structural units that most closely match the target of information required to identify the target. Second, template have the highest probability of access to visual search time may depend on how easily a target can short-term memory. Search speed is therefore a function be distinguished from its distractors, regardless of the of the speed of resource allocation and the amount presence of unique preattentive features. Follow-on work of competition for access to visual short-term memory. by Duncan and Humphreys hypothesized that search Given this, we can see how T-N and N-N similarity affect ability varies continuously, and depends on both the search ef ciency. Increased T-N similarity means more type of task and the display conditions [54], [55], [56]. structural units match the template, so competition for Search time is based on two criteria: T-N similarity and visual short-term memory access increases. Decreased N-N similarity. T-N similarity is the amount of similarity N-N similarity means we cannot ef ciently reject large between targets and nontargets. N-N similarity is the numbers of strongly grouped structural units, so re- amount of similarity within the nontargets themselves. source allocation time and search time increases. These two factors affect search time as follows: Interestingly, similarity theory is not the only attempt • as T-N similarity increases, search ef ciency de- to distinguish between preattentive and attentive results creases and search time increases, based on a single parallel process. Nakayama and his
  • 6. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 6 wn ou r -do col top green green green Y e B ag im G up R tio n m- tto nta bo orie steep steep steep right left steep shallow (a) (b) Fig. 6. Guided search for steep green targets, an image is ltered into categories for each feature map, bottom- up and top-down activation “mark” target regions, and an activation map combines the information to draw attention to the highest “hills” in the map [61] colleagues proposed the use of stereo vision and oc- clusion to segment a three-dimensional scene, where preattentive search could be performed independently (c) (d) within a segment [33], [57]. Others have presented sim- ilar theories that segment by object [58], or by signal Fig. 7. Boolean maps: (a) red and blue vertical and strength and noise [59]. The problem of distinguishing horizontal elements; (b) map for “red”, color label is red, serial from parallel processes in human cognition is one orientation label is unde ned; (c) map for “vertical”, orien- of the longest-standing puzzles in the eld, and one that tation label is vertical, color label is unde ned; (d) map for researchers often return to [60]. set intersection on “red” and “vertical” maps [64] 3.4 Guided Search Theory top-down in uence. A viewer’s attention is drawn from More recently, Wolfe et al. has proposed the theory hill to hill in order of decreasing activation. of “guided search” [48], [61], [62]. This was the rst In addition to traditional ”parallel” and ”serial” tar- attempt to actively incorporate the goals of the viewer get detection, guided search explains similarity theory’s into a model of human search. He hypothesized that results. Low N-N similarity causes more distractors to re- an activation map based on both bottom-up and top- port bottom-up activation, while high T-N similarity re- down information is constructed during visual search. duces the target element’s bottom-up activation. Guided Attention is drawn to peaks in the activation map that search also offers a possible explanation for cases where represent areas in the image with the largest combination conjunction search can be performed preattentively [33], of bottom-up and top-down in uence. [48], [63]: viewer-driven top-down activation may permit As with Treisman, Wolfe believes early vision divides ef cient search for conjunction targets. an image into individual feature maps (Fig. 6). Within each map a feature is ltered into multiple categories, for example, colors might be divided into red, green, 3.5 Boolean Map Theory blue, and yellow. Bottom-up activation follows feature A new model of low-level vision has been presented by categorization. It measures how different an element is Huang et al. to study why we often fail to notice features from its neighbors. Top-down activation is a user-driven of a display that are not relevant to the immediate task attempt to verify hypotheses or answer questions by [64], [65]. This theory carefully divides visual search “glancing” about an image, searching for the necessary into two stages: selection and access. Selection involves visual information. For example, visual search for a choosing a set of objects from a scene. Access determines “blue” element would generate a top-down request that which properties of the selected objects a viewer can is drawn to blue locations. Wolfe argued that viewers apprehend. must specify requests in terms of the categories provided Huang suggests that the visual system can divide a by each feature map [18], [31]. Thus, a viewer could scene into two parts: selected elements and excluded search for “steep” or “shallow” elements, but not for elements. This is the “boolean map” that underlies his elements rotated by a speci c angle. theory. The visual system can then access certain proper- The activation map is a combination of bottom-up and ties of the selected elements for more detailed analysis. top-down activity. The weights assigned to these two Boolean maps are created in two ways. First, a viewer values are task dependent. Hills in the map mark regions can specify a single value of an individual feature to that generate relatively large amounts of bottom-up or select all objects that contain the feature value. For
  • 7. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 7 (a) (b) (a) (b) Fig. 8. Conjunction search for a blue horizontal target with Fig. 9. Estimating average size: (a) average size of green boolean maps, select “blue” objects, then search within for elements is larger; (b) average size of blue elements is a horizontal target: (a) target present; (b) target absent larger [66] example, a viewer could look for red objects, or vertical Fig. 8 shows two examples of searching for a blue hor- objects. If a viewer selected red objects (Fig. 7b), the izontal target. Viewers can apply the following strategy color feature label for the resulting boolean map would to search for the target. First, search for blue objects, be “red”. Labels for other features (e.g., orientation, and once these are “held” in your memory, look for a size) would be unde ned, since they have not (yet) horizontal object within that group. For most observers participated in the creation of the map. A second method it is not dif cult to determine the target is present in of selection is for a viewer to choose a set of elements at Fig. 8a and absent in Fig. 8b. speci c spatial locations. Here the boolean map’s feature labels are left unde ned, since no speci c feature value 3.6 Ensemble Coding was used to identify the selected elements. Figs. 7a–c All the preceding characterizations of preattentive vision show an example of a simple scene, and the resulting have focused on how low level-visual processes can be boolean maps for selecting red objects or vertical objects. used to guide attention in a larger scene and how a An important distinction between feature integration viewer’s goals interact with these processes. An equally and boolean maps is that, in feature integration, presence important characteristic of low-level vision is its ability or absence of a feature is available preattentively, but to generate a quick summary of how simple visual fea- no information on location is provided. A boolean map tures are distributed across the eld of view. The ability encodes the speci c spatial locations of the elements that of humans to register a rapid and in-parallel summary are selected, as well as feature labels to de ne properties of a scene in terms of its simple features was rst of the selected objects. reported by Ariely [66]. He demonstrated that observers A boolean map can also be created by applying the could extract the average size of a large number of dots set operators union or intersection on two existing maps from a single glimpse of a display. Yet, when observers (Fig. 7d). For example, a viewer could create an initial were tested on the same displays and asked to indicate map by selecting red objects (Fig. 7b), then select vertical whether a speci c dot of a given size was present, they objects (Fig. 7c) and intersect the vertical map with were unable to do so. This suggests that there is a the red map currently held in memory. The result is a preattentive mechanism that records summary statistics boolean map identifying the locations of red, vertical of visual features without retaining information about objects (Fig. 7d). A viewer can only retain a single the constituent elements that generated the summary. boolean map. The result of the set operation immediately Other research has followed up on this remarkable replaces the viewer’s current map. ability, showing that rapid averages are also computed Boolean maps lead to some surprising and counter- for the orientation of simple edges seen only in pe- intuitive claims. For example, consider searching for a ripheral vision [67], for color [68] and for some higher- blue horizontal target in a sea of red horizontal and level qualities such as the emotions expressed—happy blue vertical objects. Unlike feature integration or guided versus sad—in a group of faces [69]. Exploration of the search, boolean map theory says this type of combined robustness of the ability indicates the precision of the feature search is more dif cult because it requires two extracted mean is not compromised by large changes in boolean map operations in series: creating a blue map, the shape of the distribution within the set [68], [70]. then creating a horizontal map and intersecting it against Fig. 9 shows examples of two average size estimation the blue map to hunt for the target. Importantly, how- trials. Viewers are asked to report which group has ever, the time required for such a search is constant and a larger average size: blue or green. In Fig. 9a each independent of the number of distractors. It is simply the group contains six large and six small elements, but the sum of the time required to complete the two boolean green elements are all larger than their blue counterparts, map operations. resulting in a larger average size for the green group. In
  • 8. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 8 Fig. 9b the large and small elements in each group are the same size, but there are more large blue elements than large green elements, producing a larger average size for the blue group. In both cases viewers responded with 75% accuracy or greater for diameter differences of only 8–12%. Ensemble encoding of visual properties may help to explain our experience of gist, the rich contextual information we are able to obtain from the briefest of glimpses at a scene. This ability may offer important advantages in certain (a) (b) visualization environments. For example, given a stream of real-time data, ensemble coding would allow viewers Fig. 10. Hue-on-form hierarchy: (a) horizontal form to observe the stream at a high frame rate, yet still iden- boundary is masked when hue varies randomly; (b) verti- tify individual frames with interesting relative distribu- cal hue boundary preattentively identi ed even when form tions of visual features (i.e., attribute values). Ensemble varies randomly [72] coding would also be critical for any situation where viewers want to estimate the amount of a particular data to a particular object in a scene?” An equally interesting attribute in a display. These capabilities were hinted at question is, “What do we remember about an object in a paper by Healey et al. [71], but without the bene t or a scene when we stop attending to it and look of ensemble coding as a possible explanation. at something else?” Many viewers assume that as we 3.7 Feature Hierarchy look around us we are constructing a high-resolution, fully detailed description of what we see. Researchers One of the most important considerations for a visu- in psychophysics have known for some time that this is alization designer is deciding how to present informa- not true [21], [47], [61], [79], [80]. In fact, in many cases tion in a display without producing visual confusion. our memory for detail between glances at a scene is very Consider, for example, the conjunction search shown in limited. Evidence suggests that a viewer’s current state Figs. 1e–f. Another important type of interference results of mind can play a critical role in determining what is from a feature hierarchy that appears to exist in the being seen at any given moment, what is not being seen, visual system. For certain tasks one visual feature may and what will be seen next. be “more salient” than another. For example, during boundary detection Callaghan showed that the visual system favors color over shape [72]. Background varia- 4.1 Eye Tracking tions in color slowed—but did not completely inhibit— Although the dynamic interplay between bottom-up and a viewer’s ability to preattentively identify the presence top-down processing was already evident in the early of spatial patterns formed by different shapes (Fig. 10a). eye tracking research of Yarbus [4], some modern theo- If color is held constant across the display, these same rists have tried to predict human eye movements during shape patterns are immediately visible. The interference scene viewing with a purely bottom-up approach. Most is asymmetric: random variations in shape have no effect notably, Itti and Koch [6] developed the saliency theory on a viewer’s ability to see color patterns (Fig. 10b). of eye movements based on Treisman’s feature integra- Luminance-on-hue and hue-on-texture preferences have tion theory. Their guiding assumption was that during also been found [23], [47], [73], [74], [75]. each xation of a scene, several basic feature contrasts— Feature hierarchies suggest the most important data luminance, color, orientation—are processed rapidly and attributes should be displayed with the most salient in parallel across the visual eld, over a range of spatial visual features, to avoid situations where secondary data scales varying from ne to coarse. These analyses are values mask the information the viewer wants to see. combined into a single feature-independent “conspicuity Various researchers have proposed theories for how map” that guides the deployment of attention and there- visual features compete for attention [76], [77], [78]. They fore the next saccade to a new location, similar to Wolfe’s point to a rough order of processing: (1) determine the activation map (Fig. 6). The model also includes an 3D layout of a scene; (2) determine surface structures inhibitory mechanism—inhibition of return—to prevent and volumes; (3) establish object movement; (4) interpret repeated attention and xation to previously viewed luminance gradients across surfaces; and (5) use color salient locations. to ne-tune these interpretations. If a con ict arises The surprising outcome of applying this model to between levels, it is usually resolved in favor of giving visual inspection tasks, however, has not been to suc- priority to an earlier process. cessfully predict eye movements of viewers. Rather, its bene t has come from making explicit the failure of a 4 V ISUAL E XPECTATION AND M EMORY purely bottom-up approach to determine the movement Preattentive processing asks in part, “What visual prop- of attention and the eyes. It has now become almost rou- erties draw our eyes, and therefore our focus of attention tine in the eye tracking literature to use the Itti and Koch
  • 9. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 9 model as a benchmark for bottom-up saliency, against which the top-down cognitive in uences on visual selec- tion and eye tracking can be assessed (e.g., [7], [81], [82], GREEN GREEN [83], [84]). For example, in an analysis of gaze during VERTICAL VERTICAL everyday activities, xations are made in the service of locating objects and performing manual actions on them, rather than on the basis of object distinctiveness [85]. A very readable history of the technology involved in eye tracking is given in Wade and Tatler [86]. (a) Other theorists have tried to use the pattern of eye movements during scene viewing as a direct index of the cognitive in uences on scene perception. For example, in the scanpath theory of Stark [5], [87], [88], [89], WHITE the saccades and xations made during initial viewing OBLIQUE become part of the lasting memory trace of a scene. Thus, according to this theory, the xation sequences during initial viewing and then later recognition of the same scene should be similar. Much research has con rmed (b) that there are correlations between the scanpaths of initial and subsequent viewings. Yet, at the same time, Fig. 11. Search for color-and-shape conjunction targets: there seem to be no negative effects on scene memory (a) text identifying the target is shown, followed by the when scanpaths differ between views [90]. scene, green vertical target is present; (b) a preview One of the most profound demonstrations that eye is shown, followed by text identifying the target, white gaze and perception were not one and the same was rst oblique target is absent [80] reported by Grimes [91]. He tracked the eyes of viewers examining natural photographs in preparation for a later memory test. On some occasions he would make large Wolfe believed that if multiple objects are recognized changes to the photos during the brief period—20–40 simultaneously in the low-level visual system, it must msec—in which a saccade was being made from one involve a search for links between the objects and their location to another in the photo. He was shocked to nd representation in long-term memory (LTM). LTM can be that when two people in a photo changed clothing, or queried nearly instantaneously, compared to the 40–50 even heads, during a saccade, viewers were often blind msec per item needed to search a scene or to access short- to these changes, even when they had recently xated term memory. Preattentive processing can rapidly draw the location of the changed features directly. the focus of attention to a target object, so little or no Clearly, the eyes are not a direct window to the searching is required. To remove this assistance, Wolfe soul. Research on eye tracking has shown repeatedly designed targets with two properties (Fig. 11): that merely tracking the eyes of a viewer during scene 1) Targets are formed from a conjunction of features— perception provides no privileged access to the cogni- they cannot be detected preattentively. tive processes undertaken by the viewer. Researchers 2) Targets are arbitrary combinations of colors and studying the top-down contributions to perception have shapes—they cannot be semantically recognized therefore established methodologies in which the role and remembered. of memory and expectation can be studied through Wolfe initially tested two search types: more indirect methods. In the sections that follow, we 1) Traditional search. Text on a blank screen described present ve laboratory procedures that have been devel- the target. This was followed by a display contain- oped speci cally for this purpose: postattentive amnesia, ing 4–8 potential target formed by combinations of memory-guided search, change blindness, inattentional colors and shapes in a 3 × 3 array (Fig. 11a). blindness, and attentional blink. Understanding what we 2) Postattentive search. The display was shown to the are thinking, remembering, and expecting as we look at viewer for up to 300 msec. Text describing the different parts of a visualization is critical to designing target was then inserted into the scene (Fig. 11b). visualizations that encourage locating and retaining the Results showed that the preview provided no advan- information that is most important to the viewer. tage. Postattentive search was as slow (or slower) than the traditional search, with approximately 25–40 msec 4.2 Postattentive Amnesia per object required for target present trials. This has a Wolfe conducted a study to determine whether showing signi cant potential impact for visualization design. In viewers a scene prior to searching it would improve most cases visualization displays are novel, and their their ability to locate targets [80]. Intuitively, one might contents cannot be committed to LTM. If studying a assume that seeing the scene in advance would help with display offers no assistance in searching for speci c data target detection. Wolfe’s results suggest this is not true. values, then preattentive methods that draw attention to
  • 10. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 10 Wait 950 msec areas of potential interest are critical for ef cient data exploration. Look 100 msec 4.3 Attention Guided by Memory and Prediction Although research on postattentive amnesia suggests that there are few, if any, advantages from repeated view- ing of a display, several more recent ndings suggest there are important bene ts of memory during search. Interestingly, all of these bene ts seem to occur outside of the conscious awareness of the viewer. In the area of contextual cuing [92], [93], viewers Wait 950 msec nd a target more rapidly for a subset of the displays Look 100 msec that are presented repeatedly—but in a random order— versus other displays that are presented for the rst Fig. 12. A rapid responses re-display trial, viewers are time. Moreover, when tested after the search task was asked to report the color of the T target, two separate completed, viewers showed no conscious recollection or displays must be searched [97] awareness that some of the displays were repeated or that their search speed bene ted from these repetitions. Contextual cuing appears to involve guiding attention one with red elements, the other with blue elements to a target by subtle regularities in the past experience (Fig. 12). Viewers were asked to identify the color of the of a viewer. This means that attention can be affected by target T—that is, to determine whether either of the two incidental knowledge about global context, in particular, displays contained a T. Here, viewers are forced to stop the spatial relations between the target and non-target one search and initiate another as the display changes. items in a given display. Visualization might be able to As before, extremely fast responses were observed for harness such incidental spatial knowledge of a scene by displays that were re-presented. tracking both the number of views and the time spent The interpretation that the rapid responses re ected viewing images that are later re-examined by the viewer. perceptual predictions—as opposed to easy access to A second line of research documents the unconscious memory of the scene—was based on two crucial ndings tendency of viewers to look for targets in novel loca- [98], [99]. The rst was the sheer speed at which a tions in the display, as opposed to looking at locations search resumed after an interruption. Previous studies that have already been examined. This phenomenon is on the bene ts of visual priming and short term memory referred to as inhibition of return [94] and has been shown show responses that begin at least 500 msec after the to be distinct from strategic in uences on search, such onset of a display. Correct responses in the 100–250 msec as choosing consciously to search from left-to-right or range call for an explanation that goes beyond mere moving out from the center in a clockwise direction [95]. memory. The second nding was that rapid responses A nal area of research concerns the bene ts of re- depended critically on a participant’s ability to form suming a visual search that has been interrupted by mo- implicit perceptual predictions about what they expected mentarily occluding the display [96], [97]. Results show to see at a particular location in the display after it that viewers can resume an interrupted search much returned to view. faster than they can start a new search. This suggests For visualization, rapid response suggests that a that viewers bene t from implicit (i.e., unconscious) viewer’s domain knowledge may produce expectations perceptual predictions they make about the target based based on the current display about where certain data on the partial information acquired during the initial might appear in future displays. This in turn could glimpse of a display. improve a viewer’s ability to locate important data. Rapid resumption was rst observed when viewers were asked to search for a T among L-shapes [97]. Viewers were given brief looks at the display separated 4.4 Change Blindness by longer waits where the screen was blank. They easily Both postattentive amnesia and memory-guided search found the target within a few glimpses of the display. agree that our visual system does not resemble the A surprising result was the presence of many extremely relatively faithful and largely passive process of modern fast responses after display re-presentation. Analysis re- photography. A much better metaphor for vision is that vealed two different types of responses. The rst, which of a dynamic and ongoing construction project, where occurred only during re-presentation, required 100–250 the products being built are short-lived models of the ex- msec. This was followed by a second, slower set of ternal world that are speci cally designed for the current responses that peaked at approximately 600 msec. visually guided tasks of the viewer [100], [101], [102], To test whether search was being fully interrupted, [103]. There does not appear to be any general purpose a second experiment showed two interleaved displays, vision. What we “see” when confronted with a new
  • 11. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 11 scene depends as much on our goals and expectations switched to a completely different actor. Nearly as it does on the light that enters our eyes. two-thirds of the subjects failed to report that the These new ndings differ from the initial ideas of main actor was replaced, instead describing him preattentive processing: that only certain features are rec- using details from the initial actor. ognized without the need for focused attention, and that 3) Nothing is Stored. No details are represented other features cannot be detected, even when viewers ac- internally after a scene is abstracted. When we tively search for them. More recent work in preattentive need speci c details, we simply re-examine the vision has shown that the visual differences between a scene. We are blind to change unless it affects our target and its neighbors, what a viewer is searching for, abstracted knowledge of the scene, or unless it and how the image is presented can all have an effect on occurs where we are looking. search performance. For example, Wolfe’s guided search 4) Everything is Stored, Nothing is Compared. De- theory assumes both bottom-up (i.e., preattentive) and tails about a scene are stored, but cannot be ac- top-down (i.e., attention-based) activation of features in cessed without an external stimulus. In one study, an image [48], [61], [62]. Other researchers like Treisman an experimenter asks a pedestrian for directions have also studied the dual effects of preattentive and [103]. During this interaction, a group of students attention-driven demands on what the visual system walks between the experimenter and the pedes- sees [45], [46]. Wolfe’s discussion of postattentive am- trian, surreptitiously taking a basketball the exper- nesia points out that details of an image cannot be re- imenter is holding. Only a very few pedestrians membered across separate scenes except in areas where reported that the basketball had gone missing, viewers have focused their attention [80]. but when asked speci cally about something the New research in psychophysics has shown that an experimenter was holding, more than half of the interruption in what is being seen—a blink, an eye sac- remaining subjects remembered the basketball, of- cade, or a blank screen—renders us “blind” to signi cant ten providing a detailed description. changes that occur in the scene during the interruption. 5) Feature Combination. Details from an initial view This change blindness phenomena can be illustrated us- and the current view are merged to form a com- ing a task similar to one shown in comic strips for many bined representation of the scene. Viewers are not years [101], [102], [103], [104]. Fig. 13 shows three pairs aware of which parts of their mental image come of images. A signi cant difference exists between each from which scene. image pair. Many viewers have a dif cult time seeing Interestingly, none of the explanations account for all any difference and often have to be coached to look of the change blindness effects that have been identi ed. carefully to nd it. Once they discover it, they realize that This suggests that some combination of these ideas— the difference was not a subtle one. Change blindness or some completely different hypothesis—is needed to is not a failure to see because of limited visual acuity; properly model the phenomena. rather, it is a failure based on inappropriate attentional Simons and Rensink recently revisited the area of guidance. Some parts of the eye and the brain are clearly change blindness [107]. They summarize much of the responding differently to the two pictures. Yet, this does work-to-date, and describe important research issues not become part of our visual experience until attention that are being studied using change blindness experi- is focused directly on the objects that vary. ments. For example, evidence shows that attention is re- The presence of change blindness has important im- quired to detect changes, although attention alone is not plications for visualization. The images we produce are necessarily suf cient [108]. Changes to attended objects normally novel for our viewers, so existing knowledge can also be missed, particularly when the changes are cannot be used to guide their analyses. Instead, we strive unexpected. Changes to semantically important objects to direct the eye, and therefore the mind, to areas of are detected faster than changes elsewhere [104]. Low- interest or importance within a visualization. This ability level object properties of the same kind (e.g., color or forms the rst step towards enabling a viewer to abstract size) appear to compete for recognition in visual short- details that will persist over subsequent images. term memory, but different properties seem to be en- Simons offers a wonderful overview of change blind- coded separately and in parallel [109]—similar in some ness, together with some possible explanations [103]. ways to Treisman’s original feature integration theory 1) Overwriting. The current image is overwritten by [21]. Finally, experiments suggest the locus of attention the next, so information that is not abstracted from is distributed symmetrically around a viewer’s xation the current image is lost. Detailed changes are only point [110]. detected at the focus of attention. Simons and Rensink also described hypotheses that 2) First Impression. Only the initial view of a scene they felt are not supported by existing research. For is abstracted, and if the scene is not perceived to example, many people have used change blindness to have changed, it is not re-encoded. One example suggest that our visual representation of a scene is of rst impression is an experiment by Levins and sparse, or altogether absent. Four hypothetical models of Simon where subjects viewed a short movie [105], vision were presented that include detailed representa- [106]. During a cut scene, the central character was tions of a scene, while still allowing for change blindness.
  • 12. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 12 (a) (b) (c) (d) (e) (f) Fig. 13. Change blindness, a major difference exists between each pair of images; (a–b) object added/removed; (c–d) color change; (e–f) luminance change A detailed representation could rapidly decay, making on this subject were conducted by Mack and Rock [101]. it unavailable for future comparisons; a representation Viewers were shown a cross at the center of xation and could exist in a pathway that is not accessible to the asked to report which arm was longer. After a very small comparison operation; a representation could exist and number of trials (two or three) a small “critical” object be accessible, but not be in a format that supports the was randomly presented in one of the quadrants formed comparison operation; or an appropriate representation by the cross. After answering which arm was longer, could exist, but the comparison operation is not applied viewers were then asked, “Did you see anything else even though it could be. on the screen besides the cross?” Approximately 25% of the viewers failed to report the presence of the critical object. This was surprising, since in target detection 4.5 Inattentional Blindness experiments (e.g., Figs. 1a–d) the same critical objects A related phenomena called inattentional blindness sug- are identi ed with close to 100% accuracy. gests that viewers can completely fail to perceive visually salient objects or activities. Some of the rst experiments These unexpected results led Mack and Rock to mod-
  • 13. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 13 tional blindness can be sustained over longer durations [111]. Neisser’s experiment superimposed video streams of two basketball games [112]. Players wore white shirts in one stream and black shirts in the other. Subjects at- tended to one team—either white or black—and ignored the other. Whenever the subject’s team made a pass, they were told to press a key. After about 30 seconds of video, a third stream was superimposed showing a woman walking through the scene with an open umbrella. The stream was visible for about 4 seconds, after which another 25 seconds of basketball video was shown. Following the trial, only a small number of observers reported seeing the woman. When subjects only watched the screen and did not count passes, 100% noticed the woman. Fig. 14. Images from Simons and Chabris’s inattentional Simons and Chabris controlled three conditions during blindness experiments, showing both superimposed and their experiment. Two video styles were shown: three single-stream video frames containing a woman with an superimposed video streams where the actors are semi- umbrella, and a woman in a gorilla suit [111] transparent, and a single stream where the actors are lmed together. This tests to see if increased realism af- fects awareness. Two unexpected actors were also used: ify their experiment. Following the critical trial another a woman with an umbrella, and a woman in a gorilla two or three noncritical trials were shown—again asking suit. This studies how actor similarity changes awareness viewers to identify the longer arm of the cross—followed (Fig. 14). Finally, two types of tasks were assigned to by a second critical trial and the same question, “Did subjects: maintain one count of the bounce passes your you see anything else on the screen besides the cross?” team makes, or maintain two separate counts of the Mack and Rock called these divided attention trials. The bounce passes and the aerial passes your team makes. expectation is that after the initial query viewers will This varies task dif culty to measure its impact on anticipate being asked this question again. In addition awareness. to completing the primary task, they will also search for After the video, subjects wrote down their counts, a critical object. In the nal set of displays viewers were and were then asked a series of increasingly speci c told to ignore the cross and focus entirely on identifying questions about the unexpected actor, starting with “Did whether a critical object appears in the scene. Mack and you notice anything unusual?” to “Did you see a go- Rock called these full attention trials, since a viewer’s rilla/woman carrying an umbrella?” About half of the entire attention is directed at nding critical objects. subjects tested failed to notice the unexpected actor, Results showed that viewers were signi cantly better demonstrating sustained inattentional blindness in a dy- at identifying critical objects in the divided attention tri- namic scene. A single stream video, a single count task, als, and were nearly 100% accurate during full attention and a woman actor all made the task easier, but in every trials. This con rmed that the critical objects were salient case at least one-third of the observers were blind to the and detectable under the proper conditions. unexpected event. Mack and Rock also tried placing the cross in the pe- riphery and the critical object at the xation point. They assumed this would improve identifying critical trials, 4.6 Attentional Blink but in fact it produced the opposite effect. Identi cation In each of the previous methods for studying visual rates dropped to as low as 15%. This emphasizes that attention, the primary emphasis is on how human at- subjects can fail to see something, even when it is directly tention is limited in its ability to represent the details of in their eld of vision. a scene, and in its ability to represent multiple objects Mack and Rock hypothesized that “there is no per- at the same time. But attention is also severely limited ception without attention.” If you do not attend to an in its ability to process information that arrives in quick object in some way, you may not perceive it at all. succession, even when that information is presented at This suggestion contradicts the belief that objects are a single location in space. organized into elementary units automatically and prior Attentional blink is currently the most widely used to attention being activated (e.g., Gestalt theory). If atten- method to study the availability of attention across time. tion is intentional, without objects rst being perceived Its name—“blink”—derives from the nding that when there is nothing to focus attention on. Mack and Rock’s two targets are presented in rapid succession, the second experiments suggest this may not be true. of the two targets cannot be detected or identi ed when More recent work by Simons and Chabris recreated a it appears within approximately 100–500 msec following classic study by Neisser to determine whether inatten- the rst target [113], [114].
  • 14. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 14 In a typical experiment, visual items such as words visualization and in psychophysics. For example, exper- or pictures are shown in a rapid serial presentation at a iments conducted by the authors on perceiving orien- single location. Raymond et al. [114] asked participants tation led to a visualization technique for multivalued to identify the only white letter ( rst target) in a 10-item scalar elds [19], and to a new theory on how targets per second stream of black letters (distractors), then to are detected and localized in cognitive vision [117]. report whether the letter “X” (second target) occurred in the subsequent letter stream. The second target was 5.1 Visual Attention present in 50% of trials and, when shown, appeared at random intervals after the rst target ranging from Understanding visual attention is important, both in 100–800 msec. Reports of both targets were required visualization and in graphics. The proper choice of visual after the stimulus stream ended. The attentional blink features will draw the focus of attention to areas in a is de ned as having occurred when the rst target is visualization that contain important data, and correctly reported correctly, but the second target is not. This weight the perceptual strength of a data element based usually happens for temporal lags between targets of on the attribute values it encodes. Tracking attention can 100–500 msec. Accuracy recovers to a normal baseline be used to predict where a viewer will look, allowing level at longer intervals. different parts of an image to be managed based on the Curiously, when the second target is presented im- amount of attention they are expected to receive. mediately following the rst target (i.e., with no delay Perceptual Salience. Building a visualization often between the two targets), reports of the second target are begins with a series of basic questions, “How should I quite accurate [115]. This suggests that attention operates represent the data? How can I highlight important data over time like a window or gate, opening in response to values when they appear? How can I ensure that viewers nding a visual item that matches its current criterion perceive differences in the data accurately?” Results from and then closing shortly thereafter to consolidate that research on visual attention can be used to assign visual item as a distinct object. The attentional blink is therefore features to data values in ways that satisfy these needs. an index of the “dwell time” needed to consolidate A well known example of this approach is the design a rapidly presented visual item into visual short term of colormaps to visualize continuous scalar values. The memory, making it available for conscious report [116]. vision models agree that properties of color are preat- Change blindness, inattentional blindness, and atten- tentive. They do not, however, identify the amount of tional blink have important consequences for visualiza- color difference needed to produce distinguishable col- tion. Signi cant changes in the data may be missed ors. Follow-on studies have been conducted by visualiza- if attention is fully deployed or focused on a speci c tion researchers to measure this difference. For example, location in a visualization. Attending to data elements Ware ran experiments that asked a viewer to distinguish in one frame of an animation may render us temporarily individual colors and shapes formed by colors. He used blind to what follows at that location. These issues must his results to build a colormap that spirals up the lumi- be considered during visualization design. nance axis, providing perceptual balance and controlling simultaneous contrast error [118]. Healey conducted a visual search experiment to determine the number of col- 5 V ISUALIZATION AND G RAPHICS ors a viewer can distinguish simultaneously. His results How should researchers in visualization and graphics showed that viewers can rapidly choose between up to choose between the different vision models? In psy- seven isoluminant colors [119]. Kuhn et al. used results chophysics, the models do not compete with one another. from color perception experiments to recolor images in Rather, they build on top of one another to address ways that allow colorblind viewers to properly perceive common problems and new insights over time. The color differences [120]. Other visual features have been models differ in terms of why they were developed, and studied in a similar fashion, producing guidelines on in how they explain our eye’s response to visual stimulae. the use of texture—size, orientation, and regularity [121], Yet, despite this diversity, the models usually agree on [122], [123]—and motion— icker, direction, and velocity which visual features we can attend to. Given this, we [38], [124]—for visualizing data. recommend considering the most recent models, since An alternative method for measuring image salience these are the most comprehensive. is Daly’s visible differences predictor, a more physically- A related question asks how well a model ts our based approach that uses light level, spatial frequency, needs. For example, the models identify numerous visual and signal content to de ne a viewer’s sensitivity at features as preattentive, but they may not de ne the each image pixel [125]. Although Daly used his metric difference needed to produce distinguishable instances to compare images, it could also be applied to de ne of a feature. Follow-on experiments are necessary to perceptual salience within an visualization. extend the ndings for visualization design. Another important issue, particularly for multivari- Finally, although vision models have proven to be ate visualization, is feature interference. One common surprisingly robust, their predictions can fail. Identifying approach visualizes each data attribute with a separate these situations often leads to new research, both in visual feature. This raises the question, “Will the visual