This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS

effort or cognitive skill. An important discovery in these early studies was the identification of a limited set of visual features that are detected very rapidly by low-level, fast-acting visual processes. These properties were initially called preattentive, since their detection seemed to precede focused attention, occurring within the brief period of a single fixation. We now know that attention plays a critical role in what we see, even at this early stage of vision. The term preattentive continues to be used, however, for its intuitive notion of the speed and ease with which these properties are identified.

Typically, tasks that can be performed on large multi-element displays in less than 200–250 milliseconds (msec) are considered preattentive. Since a saccade takes at least 200 msec to initiate, viewers complete the task in a single glance. An example of a preattentive task is the detection of a red circle in a group of blue circles (Figs. 1a–b). The target object has a visual property, “red”, that the blue distractor objects do not. A viewer can easily determine whether the target is present or absent.

Hue is not the only visual feature that is preattentive. In Figs. 1c–d the target is again a red circle, while the distractors are red squares. Here, the visual system identifies the target through a difference in curvature.

A unique visual property (a red hue in Figs. 1a–b, or a curved form in Figs. 1c–d) allows a target to “pop out” of a display. This implies that it can be easily detected, regardless of the number of distractors. In contrast to these effortless searches, when a target is defined by the joint presence of two or more visual properties it often cannot be found preattentively. Figs. 1e–f show an example of these more difficult conjunction searches. The red circle target is made up of two features: red and circular. One of these features is present in each of the distractor objects (red squares and blue circles). A search for red items always returns true because there are red squares in each display. Similarly, a search for circular items always sees blue circles. Numerous studies have shown that most conjunction targets cannot be detected preattentively. Viewers must perform a time-consuming serial search through the display to confirm its presence or absence.

Fig. 1. Target detection: (a) hue target red circle absent; (b) target present; (c) shape target red circle absent; (d) target present; (e) conjunction target red circle present; (f) target absent

If low-level visual processes can be harnessed during visualization, they can draw attention to areas of potential interest in a display. This cannot be accomplished in an ad-hoc fashion, however. The visual features assigned to different data attributes must take advantage of the strengths of our visual system, must be well-suited to the viewer’s analysis needs, and must not produce visual interference effects (e.g., conjunction search) that mask information.

Fig. 2 lists some of the visual features that have been identified as preattentive. Experiments in psychology have used these features to perform the following tasks:

• target detection: viewers detect a target element with a unique visual feature within a field of distractor elements (Fig. 1),
• boundary detection: viewers detect a texture boundary between two groups of elements, where all of the elements in each group have a common visual property (see Fig. 10),
• region tracking: viewers track one or more elements with a unique visual feature as they move in time and space, and
• counting and estimation: viewers count the number of elements with a unique visual feature.

3 THEORIES OF PREATTENTIVE VISION
A number of theories attempt to explain how preattentive processing occurs within the visual system. We describe five well-known models: feature integration, textons, similarity, guided search, and boolean maps. We next discuss ensemble coding, which shows that viewers can generate summaries of the distribution of visual features in a scene, even when they are unable to locate individual elements based on those same features. We conclude with feature hierarchies, which describe situations where the visual system favors certain visual features over others. Because we are interested equally in where viewers attend in an image and in what they are attending to, we will not review theories focusing exclusively on only one of these functions (e.g., the attention orienting theory of Posner and Petersen).
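The difference between the feature searches of Figs. 1a–d and the conjunction search of Figs. 1e–f can be stated as a simple check over the features of the display elements. The sketch below is a toy illustration built on one assumption: each element is described by a dictionary of discrete feature values. The helper names `unique_features` and `classify_search` are hypothetical. The sketch captures only which feature, if any, separates the target from every distractor. It says nothing about how the visual system actually performs the search.

```python
from typing import Dict, List

Item = Dict[str, str]  # hypothetical element description, e.g. {"hue": "red", "shape": "circle"}


def unique_features(target: Item, distractors: List[Item]) -> List[str]:
    """Return the feature names whose target value appears in no distractor.

    A non-empty result means one feature separates the target from every
    distractor (a feature search, as in Figs. 1a-d). An empty result means the
    target is defined only by a combination of features (a conjunction search).
    """
    return [
        name
        for name, value in target.items()
        if all(d.get(name) != value for d in distractors)
    ]


def classify_search(target: Item, distractors: List[Item]) -> str:
    return "feature" if unique_features(target, distractors) else "conjunction"


red_circle = {"hue": "red", "shape": "circle"}
blue_circle = {"hue": "blue", "shape": "circle"}
red_square = {"hue": "red", "shape": "square"}

print(unique_features(red_circle, [blue_circle] * 5))               # ['hue']   (Figs. 1a-b)
print(unique_features(red_circle, [red_square] * 5))                # ['shape'] (Figs. 1c-d)
print(unique_features(red_circle, [red_square, blue_circle] * 3))   # []        (Figs. 1e-f)
```

The empty result for the Fig. 1e–f display matches the text. Every feature value of the red circle also occurs in some distractor, so no single feature can signal the target's presence.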
Fig. 2. Examples of preattentive visual features, with references to papers that investigated each feature’s capabilities (panels: orientation, length, closure, size, curvature, density, number, hue, luminance, intersections, terminators, 3D depth, flicker, direction of motion, velocity of motion, lighting direction)

3.1 Feature Integration Theory
Treisman was one of the first attention researchers to systematically study the nature of preattentive processing, focusing in particular on the image features that led to selective perception. She was inspired by a physiological finding that single neurons in the brains of monkeys responded selectively to edges of a specific orientation and wavelength. Her goal was to find a behavioral consequence of these kinds of cells in humans. To do this, she focused on two interrelated problems. First, she tried to determine which visual properties are detected preattentively. She called these properties
“preattentive features”. Second, she formulated a hypothesis about how the visual system performs preattentive processing.

Treisman ran experiments using target and boundary detection to classify preattentive features (Figs. 1 and 10), measuring performance in two different ways: by response time, and by accuracy. In the response time model, viewers are asked to complete the task as quickly as possible while still maintaining a high level of accuracy. The number of distractors in a scene is varied from few to many. If task completion time is relatively constant and below some chosen threshold, independent of the number of distractors, the task is said to be preattentive (i.e., viewers are not searching through the display to locate the target).

In the accuracy version of the same task, the display is shown for a small, fixed exposure duration, then removed. Again, the number of distractors in the scene varies across trials. If viewers can complete the task accurately, regardless of the number of distractors, the feature used to define the target is assumed to be preattentive.

Treisman and others have used their experiments to compile a list of visual features that are detected preattentively (Fig. 2). It is important to note that some of these features are asymmetric. For example, a sloped line in a sea of vertical lines can be detected preattentively, but a vertical line in a sea of sloped lines cannot.

In order to explain preattentive processing, Treisman proposed a model of low-level human vision made up of a set of feature maps and a master map of locations (Fig. 3). Each feature map registers activity for a specific visual feature. When the visual system first sees an image, all the features are encoded in parallel into their respective maps. A viewer can access a particular map to check for activity, and perhaps to determine the amount of activity. The individual feature maps give no information about location, spatial arrangement, or relationships to activity in other maps, however.

Fig. 3. Treisman’s feature integration model of early vision: individual maps can be accessed in parallel to detect feature activity, but focused attention is required to combine features at a common spatial location (diagram labels: individual feature maps for red, green, yellow, blue, luminance, orientation, size, and contrast; master map; focus of attention)

This framework provides a general hypothesis that explains how preattentive processing occurs. If the target has a unique feature, one can simply access the given feature map to see if any activity is occurring. Feature maps are encoded in parallel, so feature detection is almost instantaneous. A conjunction target can only be detected by accessing two or more feature maps. In order to locate these targets, one must search serially through the master map of locations, looking for an object that satisfies the conditions of having the correct combination of features. Within the model, this use of focused attention requires a relatively large amount of time and effort.

In later work, Treisman has expanded her strict dichotomy of features being detected either in parallel or in serial. She now believes that parallel and serial represent two ends of a spectrum that includes “more” and “less,” not just “present” and “absent.” The amount of difference between the target and the distractors will affect search time. For example, a long vertical line can be detected immediately among a group of short vertical lines, but a medium-length line may take longer to see.

Treisman has also extended feature integration to explain situations where conjunction search involving motion, depth, color, and orientation has been shown to be preattentive. Treisman hypothesizes that a significant target–nontarget difference would allow individual feature maps to ignore nontarget information. Consider a conjunction search for a green horizontal bar within a set of red horizontal bars and green vertical bars. If the red color map could inhibit information about red horizontal bars, the search reduces to finding a green horizontal bar in a sea of green vertical bars, which occurs preattentively.

3.2 Texton Theory
Julész was also instrumental in expanding our understanding of what we “see” in a single fixation. His starting point came from a difficult computational problem in machine vision, namely, how to define a basis set for the perception of surface properties. Julész’s initial investigations focused on determining whether variations in order statistics were detected by the low-level visual system. Examples included contrast (a first-order statistic), orientation and regularity (second-order statistics), and curvature (a third-order statistic). Julész’s results were inconclusive. First-order variations were detected preattentively. Some, but not all, second-order variations were also preattentive, as was an even smaller set of third-order variations.

Based on these findings, Julész modified his theory to suggest that the early visual system detects three categories of features called textons:
1) Elongated blobs: lines, rectangles, or ellipses with specific hues, orientations, widths, and so on.
2) Terminators: ends of line segments.
3) Crossings of line segments.
Julész believed that only a difference in textons or in their density could be detected preattentively. No positional information about neighboring textons is available without focused attention. Like Treisman, Julész suggested that preattentive processing occurs in parallel and focused attention occurs in serial.

Julész used texture segregation to demonstrate his theory. Fig. 4 shows an example of an image that supports the texton hypothesis. Although the two objects look very different in isolation, they are actually the same texton. Both are blobs with the same height and width, made up of the same set of line segments with two terminators. When the textons are oriented randomly in an image, one cannot preattentively detect the texture boundary between the target group and the background distractors.

Fig. 4. Textons: (a,b) two textons A and B that appear different in isolation, but have the same size, number of terminators, and join points; (c) a target group of B-textons is difficult to detect in a background of A-textons when random rotation is applied

3.3 Similarity Theory
Some researchers did not support the dichotomy of serial and parallel search modes. They noted that groups of neurons in the brain seemed to be competing over time to represent the same object. Work in this area by Quinlan and Humphreys therefore began by investigating two separate factors in conjunction search. First, search time may depend on the number of items of information required to identify the target. Second, search time may depend on how easily a target can be distinguished from its distractors, regardless of the presence of unique preattentive features. Follow-on work by Duncan and Humphreys hypothesized that search ability varies continuously, and depends on both the type of task and the display conditions. Search time is based on two criteria: T-N similarity and N-N similarity. T-N similarity is the amount of similarity between targets and nontargets. N-N similarity is the amount of similarity within the nontargets themselves. These two factors affect search time as follows:
• as T-N similarity increases, search efficiency decreases and search time increases,
• as N-N similarity decreases, search efficiency decreases and search time increases, and
• T-N and N-N similarity are related; decreasing N-N similarity has little effect if T-N similarity is low; increasing T-N similarity has little effect if N-N similarity is high.

Fig. 5. N-N similarity affecting search efficiency for an L-shaped target: (a) high N-N (nontarget-nontarget) similarity allows easy detection of the target L; (b) low N-N similarity increases the difficulty of detecting the target L

Treisman’s feature integration theory has difficulty explaining Fig. 5. In both cases, the distractors seem to use exactly the same features as the target: oriented, connected lines of a fixed length. Yet experimental results show that displays similar to Fig. 5a produce an average search time increase of 4.5 msec per distractor, versus 54.5 msec per distractor for displays similar to Fig. 5b. To explain this, Duncan and Humphreys proposed a three-step theory of visual selection.
1) The visual field is segmented in parallel into structural units that share some common property, for example, spatial proximity or hue. Structural units may again be segmented, producing a hierarchical representation of the visual field.
2) Access to visual short-term memory is a limited resource. During target search a template of the target’s properties is available. The closer a structural unit matches the template, the more resources it receives relative to other units with a poorer match.
3) A poor match between a structural unit and the search template allows efficient rejection of other units that are strongly grouped to the rejected unit.

Structural units that most closely match the target template have the highest probability of access to visual short-term memory. Search speed is therefore a function of the speed of resource allocation and the amount of competition for access to visual short-term memory. Given this, we can see how T-N and N-N similarity affect search efficiency. Increased T-N similarity means more structural units match the template, so competition for visual short-term memory access increases. Decreased N-N similarity means we cannot efficiently reject large numbers of strongly grouped structural units, so resource allocation time and search time increase.

Interestingly, similarity theory is not the only attempt to distinguish between preattentive and attentive results based on a single parallel process. Nakayama and his colleagues proposed the use of stereo vision and occlusion to segment a three-dimensional scene, where preattentive search could be performed independently within a segment. Others have presented similar theories that segment by object, or by signal strength and noise. The problem of distinguishing serial from parallel processes in human cognition is one of the longest-standing puzzles in the field, and one that researchers often return to.
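Two ideas reduce to the slope of response time plotted against the number of distractors:

- the response time model of Section 3.1;
- the per-distractor costs quoted for Fig. 5.

A minimal sketch of that analysis follows. It assumes a least-squares line fit and a hypothetical cutoff of 10 msec per distractor for calling a search “flat”. The 450 msec baseline and the set sizes are invented for illustration; only the 4.5 and 54.5 msec slopes come from the text. The names `search_slope` and `is_flat_search` are our own.

```python
from statistics import fmean
from typing import Sequence, Tuple


def search_slope(set_sizes: Sequence[float], response_times: Sequence[float]) -> float:
    """Least-squares slope of response time (msec) against number of distractors,
    i.e. the extra time each additional distractor costs."""
    mx, my = fmean(set_sizes), fmean(response_times)
    sxx = sum((x - mx) ** 2 for x in set_sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(set_sizes, response_times))
    return sxy / sxx


def is_flat_search(set_sizes: Sequence[float], response_times: Sequence[float],
                   max_slope_ms: float = 10.0) -> Tuple[float, bool]:
    """Return (slope, flat?). "Flat" means response time is nearly independent
    of the number of distractors; the 10 msec/item cutoff is hypothetical."""
    slope = search_slope(set_sizes, response_times)
    return slope, slope <= max_slope_ms


# Synthetic, noise-free trials: a made-up 450 msec baseline plus the
# per-distractor costs reported for displays like Fig. 5a and Fig. 5b.
sizes = [4, 8, 16, 32]
fig5a = [450 + 4.5 * n for n in sizes]
fig5b = [450 + 54.5 * n for n in sizes]
print(is_flat_search(sizes, fig5a))  # (4.5, True)
print(is_flat_search(sizes, fig5b))  # (54.5, False)
```

A real analysis would fit noisy per-trial data, and the cutoff is a convention the experimenter must choose. The accuracy version of the task described in Section 3.1 avoids response times altogether.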
Fig. 6. Guided search for steep green targets: an image is filtered into categories for each feature map, bottom-up and top-down activation “mark” target regions, and an activation map combines the information to draw attention to the highest “hills” in the map

3.4 Guided Search Theory
More recently, Wolfe et al. have proposed the theory of “guided search”. This was the first attempt to actively incorporate the goals of the viewer into a model of human search. Wolfe hypothesized that an activation map based on both bottom-up and top-down information is constructed during visual search. Attention is drawn to peaks in the activation map that represent areas in the image with the largest combination of bottom-up and top-down influence.

As with Treisman, Wolfe believes early vision divides an image into individual feature maps (Fig. 6). Within each map a feature is filtered into multiple categories; for example, colors might be divided into red, green, blue, and yellow. Bottom-up activation follows feature categorization. It measures how different an element is from its neighbors. Top-down activation is a user-driven attempt to verify hypotheses or answer questions by “glancing” about an image, searching for the necessary visual information. For example, visual search for a “blue” element would generate a top-down request that is drawn to blue locations. Wolfe argued that viewers must specify requests in terms of the categories provided by each feature map. Thus, a viewer could search for “steep” or “shallow” elements, but not for elements rotated by a specific angle.

The activation map is a combination of bottom-up and top-down activity. The weights assigned to these two values are task dependent. Hills in the map mark regions that generate relatively large amounts of bottom-up or top-down influence. A viewer’s attention is drawn from hill to hill in order of decreasing activation.

In addition to traditional “parallel” and “serial” target detection, guided search explains similarity theory’s results. Low N-N similarity causes more distractors to report bottom-up activation, while high T-N similarity reduces the target element’s bottom-up activation. Guided search also offers a possible explanation for cases where conjunction search can be performed preattentively: viewer-driven top-down activation may permit efficient search for conjunction targets.
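The combination of bottom-up and top-down activation described above can be sketched for a small display. This is not Wolfe’s published model. It rests on the following assumptions:

- Each element already carries one category per feature map.
- Bottom-up activation is the fraction of neighbors within a fixed radius whose category differs, summed over maps.
- Top-down activation is the number of requested categories an element matches.

The class `Element`, the functions, the weights `w_bu` and `w_td`, and the radius are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Element:
    x: float
    y: float
    features: Dict[str, str]  # one category per feature map, e.g. {"color": "green"}


def bottom_up(elements: List[Element], radius: float = 1.5) -> List[float]:
    """Per element: sum over feature maps of the fraction of neighbors
    (within `radius`) whose category differs from the element's own."""
    scores = []
    for e in elements:
        neighbors = [o for o in elements
                     if o is not e and math.dist((e.x, e.y), (o.x, o.y)) <= radius]
        score = 0.0
        for name, value in e.features.items():
            if neighbors:
                differing = sum(o.features.get(name) != value for o in neighbors)
                score += differing / len(neighbors)
        scores.append(score)
    return scores


def top_down(elements: List[Element], request: Dict[str, str]) -> List[float]:
    """Number of requested categories (e.g. "green", "steep") each element matches."""
    return [float(sum(e.features.get(k) == v for k, v in request.items()))
            for e in elements]


def attention_order(elements: List[Element], request: Dict[str, str],
                    w_bu: float = 1.0, w_td: float = 1.0,
                    radius: float = 1.5) -> Tuple[List[int], List[float]]:
    """Weighted sum of both activations; element indices are returned in order
    of decreasing activation (the "hills" attention visits in turn)."""
    bu = bottom_up(elements, radius)
    td = top_down(elements, request)
    activation = [w_bu * b + w_td * t for b, t in zip(bu, td)]
    order = sorted(range(len(elements)), key=lambda i: activation[i], reverse=True)
    return order, activation


# Hypothetical 3x3 conjunction display: a green steep target at (1, 1) among
# red steep (corners) and green shallow (edges) distractors.
display = []
for y in range(3):
    for x in range(3):
        if (x, y) == (1, 1):
            feats = {"color": "green", "orientation": "steep"}
        elif (x + y) % 2 == 0:
            feats = {"color": "red", "orientation": "steep"}
        else:
            feats = {"color": "green", "orientation": "shallow"}
        display.append(Element(x, y, feats))

request = {"color": "green", "orientation": "steep"}
order, activation = attention_order(display, request)
print(order[0])  # 4, the index of the target at (1, 1)
```

In this display the target has no unique feature. With `w_td=0.0`, a red steep corner element wins instead, because it differs from more of its neighbors. Once the top-down request for “green” and “steep” is added, the target becomes the highest hill. This is the mechanism guided search offers for efficient conjunction search.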
3.5 Boolean Map Theory
A new model of low-level vision has been presented by Huang et al. to study why we often fail to notice features of a display that are not relevant to the immediate task. This theory carefully divides visual search into two stages: selection and access. Selection involves choosing a set of objects from a scene. Access determines which properties of the selected objects a viewer can apprehend.

Huang suggests that the visual system can divide a scene into two parts: selected elements and excluded elements. This is the “boolean map” that underlies his theory. The visual system can then access certain properties of the selected elements for more detailed analysis.

Fig. 7. Boolean maps: (a) red and blue vertical and horizontal elements; (b) map for “red”, color label is red, orientation label is undefined; (c) map for “vertical”, orientation label is vertical, color label is undefined; (d) map for set intersection on “red” and “vertical” maps

Boolean maps are created in two ways. First, a viewer can specify a single value of an individual feature to select all objects that contain the feature value. For example, a viewer could look for red objects, or vertical objects. If a viewer selected red objects (Fig. 7b), the color feature label for the resulting boolean map would be “red”. Labels for other features (e.g., orientation, size) would be undefined, since they have not (yet) participated in the creation of the map. A second method of selection is for a viewer to choose a set of elements at specific spatial locations. Here the boolean map’s feature labels are left undefined, since no specific feature value was used to identify the selected elements. Figs. 7a–c show an example of a simple scene, and the resulting boolean maps for selecting red objects or vertical objects.

An important distinction between feature integration and boolean maps is that, in feature integration, presence or absence of a feature is available preattentively, but no information on location is provided. A boolean map encodes the specific spatial locations of the elements that are selected, as well as feature labels to define properties of the selected objects.

A boolean map can also be created by applying the set operators union or intersection on two existing maps (Fig. 7d). For example, a viewer could create an initial map by selecting red objects (Fig. 7b), then select vertical objects (Fig. 7c) and intersect the vertical map with the red map currently held in memory. The result is a boolean map identifying the locations of red, vertical objects (Fig. 7d). A viewer can only retain a single boolean map. The result of the set operation immediately replaces the viewer’s current map.

Boolean maps lead to some surprising and counterintuitive claims. For example, consider searching for a blue horizontal target in a sea of red horizontal and blue vertical objects. Unlike feature integration or guided search, boolean map theory says this type of combined feature search is more difficult because it requires two boolean map operations in series: creating a blue map, then creating a horizontal map and intersecting it against the blue map to hunt for the target. Importantly, however, the time required for such a search is constant and independent of the number of distractors. It is simply the sum of the time required to complete the two boolean map operations.

Fig. 8 shows two examples of searching for a blue horizontal target. Viewers can apply the following strategy to search for the target. First, search for blue objects, and once these are “held” in your memory, look for a horizontal object within that group. For most observers it is not difficult to determine that the target is present in Fig. 8a and absent in Fig. 8b.

Fig. 8. Conjunction search for a blue horizontal target with boolean maps: select “blue” objects, then search within them for a horizontal target: (a) target present; (b) target absent

3.6 Ensemble Coding
All the preceding characterizations of preattentive vision have focused on how low-level visual processes can be used to guide attention in a larger scene and how a viewer’s goals interact with these processes. An equally important characteristic of low-level vision is its ability to generate a quick summary of how simple visual features are distributed across the field of view. The ability of humans to register a rapid and in-parallel summary of a scene in terms of its simple features was first reported by Ariely. He demonstrated that observers could extract the average size of a large number of dots from a single glimpse of a display. Yet, when observers were tested on the same displays and asked to indicate whether a specific dot of a given size was present, they were unable to do so. This suggests that there is a preattentive mechanism that records summary statistics of visual features without retaining information about the constituent elements that generated the summary.

Other research has followed up on this remarkable ability, showing that rapid averages are also computed for the orientation of simple edges seen only in peripheral vision, for color, and for some higher-level qualities such as the emotions expressed (happy versus sad) in a group of faces. Exploration of the robustness of the ability indicates that the precision of the extracted mean is not compromised by large changes in the shape of the distribution within the set.

Fig. 9. Estimating average size: (a) average size of green elements is larger; (b) average size of blue elements is larger

Fig. 9 shows examples of two average size estimation trials. Viewers are asked to report which group has a larger average size: blue or green. In Fig. 9a each group contains six large and six small elements, but the green elements are all larger than their blue counterparts, resulting in a larger average size for the green group. In Fig. 9b the large and small elements in each group are the same size, but there are more large blue elements than large green elements, producing a larger average size for the blue group. In both cases viewers responded with 75% accuracy or greater for diameter differences of only 8–12%. Ensemble encoding of visual properties may help to explain our experience of gist, the rich contextual information we are able to obtain from the briefest of glimpses at a scene.
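The judgment viewers make in Fig. 9 depends only on a summary statistic of each group, never on any single element. The sketch below makes that explicit with hypothetical diameters chosen to mimic the two trial types. The helper names `larger_average` and `mean_difference_percent` are our own. The sketch describes the comparison viewers are asked to make, not the mechanism by which the visual system computes it.

```python
from statistics import fmean
from typing import Dict, List


def larger_average(groups: Dict[str, List[float]]) -> str:
    """Name of the group whose mean element size is largest."""
    return max(groups, key=lambda name: fmean(groups[name]))


def mean_difference_percent(groups: Dict[str, List[float]]) -> float:
    """Gap between the largest and smallest group means, as a percentage
    of the smallest mean."""
    means = sorted(fmean(sizes) for sizes in groups.values())
    return (means[-1] - means[0]) / means[0] * 100.0


# Hypothetical diameters mimicking the two trial types in Fig. 9.
fig9a = {"blue": [10.0] * 6 + [20.0] * 6,    # six small, six large
         "green": [11.0] * 6 + [22.0] * 6}   # each green element 10% larger
fig9b = {"blue": [20.0] * 8 + [10.0] * 4,    # same two sizes in both groups,
         "green": [20.0] * 4 + [10.0] * 8}   # but more large blue elements

print(larger_average(fig9a), round(mean_difference_percent(fig9a), 1))  # green 10.0
print(larger_average(fig9b), round(mean_difference_percent(fig9b), 1))  # blue 25.0
```

The Fig. 9a-style pair differs by 10% in mean diameter, which falls inside the 8–12% range at which viewers still responded with at least 75% accuracy.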
This ability may offer important advantages in certain visualization environments. For example, given a stream of real-time data, ensemble coding would allow viewers to observe the stream at a high frame rate, yet still identify individual frames with interesting relative distributions of visual features (i.e., attribute values). Ensemble coding would also be critical for any situation where viewers want to estimate the amount of a particular data attribute in a display. These capabilities were hinted at in a paper by Healey et al., but without the benefit of ensemble coding as a possible explanation.

3.7 Feature Hierarchy
One of the most important considerations for a visualization designer is deciding how to present information in a display without producing visual confusion. Consider, for example, the conjunction search shown in Figs. 1e–f. Another important type of interference results from a feature hierarchy that appears to exist in the visual system. For certain tasks one visual feature may be “more salient” than another. For example, during boundary detection Callaghan showed that the visual system favors color over shape. Background variations in color slowed, but did not completely inhibit, a viewer’s ability to preattentively identify the presence of spatial patterns formed by different shapes (Fig. 10a). If color is held constant across the display, these same shape patterns are immediately visible. The interference is asymmetric: random variations in shape have no effect on a viewer’s ability to see color patterns (Fig. 10b). Luminance-on-hue and hue-on-texture preferences have also been found.

Fig. 10. Hue-on-form hierarchy: (a) horizontal form boundary is masked when hue varies randomly; (b) vertical hue boundary preattentively identified even when form varies randomly

Feature hierarchies suggest the most important data attributes should be displayed with the most salient visual features, to avoid situations where secondary data values mask the information the viewer wants to see.

Various researchers have proposed theories for how visual features compete for attention. They point to a rough order of processing: (1) determine the 3D layout of a scene; (2) determine surface structures and volumes; (3) establish object movement; (4) interpret luminance gradients across surfaces; and (5) use color to fine-tune these interpretations. If a conflict arises between levels, it is usually resolved in favor of giving priority to an earlier process.

4 VISUAL EXPECTATION AND MEMORY
Preattentive processing asks in part, “What visual properties draw our eyes, and therefore our focus of attention, to a particular object in a scene?” An equally interesting question is, “What do we remember about an object or a scene when we stop attending to it and look at something else?” Many viewers assume that as we look around us we are constructing a high-resolution, fully detailed description of what we see. Researchers in psychophysics have known for some time that this is not true. In fact, in many cases our memory for detail between glances at a scene is very limited. Evidence suggests that a viewer’s current state of mind can play a critical role in determining what is being seen at any given moment, what is not being seen, and what will be seen next.

4.1 Eye Tracking
Although the dynamic interplay between bottom-up and top-down processing was already evident in the early eye tracking research of Yarbus, some modern theorists have tried to predict human eye movements during scene viewing with a purely bottom-up approach. Most notably, Itti and Koch developed the saliency theory of eye movements based on Treisman’s feature integration theory. Their guiding assumption was that during each fixation of a scene, several basic feature contrasts (luminance, color, orientation) are processed rapidly and in parallel across the visual field, over a range of spatial scales varying from fine to coarse. These analyses are combined into a single feature-independent “conspicuity map” that guides the deployment of attention and therefore the next saccade to a new location, similar to Wolfe’s activation map (Fig. 6). The model also includes an inhibitory mechanism, inhibition of return, to prevent repeated attention and fixation to previously viewed salient locations.

The surprising outcome of applying this model to visual inspection tasks, however, has not been to successfully predict eye movements of viewers. Rather, its benefit has come from making explicit the failure of a purely bottom-up approach to determine the movement of attention and the eyes. It has now become almost routine in the eye tracking literature to use the Itti and Koch model as a benchmark for bottom-up saliency, against which the top-down cognitive influences on visual selection and eye tracking can be assessed. For example, in an analysis of gaze during everyday activities, fixations are made in the service of locating objects and performing manual actions on them, rather than on the basis of object distinctiveness. A very readable history of the technology involved in eye tracking is given in Wade and Tatler.
Other theorists have tried to use the pattern of eye movements during scene viewing as a direct index of the cognitive influences on scene perception. For example, in the scanpath theory of Stark, the saccades and fixations made during initial viewing become part of the lasting memory trace of a scene. Thus, according to this theory, the fixation sequences during initial viewing and then later recognition of the same scene should be similar. Much research has confirmed that there are correlations between the scanpaths of initial and subsequent viewings. Yet, at the same time, there seem to be no negative effects on scene memory when scanpaths differ between views.

Fig. 11. Search for color-and-shape conjunction targets: (a) text identifying the target is shown, followed by the scene, green vertical target is present; (b) a preview is shown, followed by text identifying the target, white oblique target is absent

One of the most profound demonstrations that eye gaze and perception were not one and the same was first reported by Grimes. He tracked the eyes of viewers examining natural photographs in preparation for a later memory test.
On some occasions he would make large Wolfe believed that if multiple objects are recognizedchanges to the photos during the brief period—20–40 simultaneously in the low-level visual system, it mustmsec—in which a saccade was being made from one involve a search for links between the objects and theirlocation to another in the photo. He was shocked to nd representation in long-term memory (LTM). LTM can bethat when two people in a photo changed clothing, or queried nearly instantaneously, compared to the 40–50even heads, during a saccade, viewers were often blind msec per item needed to search a scene or to access short-to these changes, even when they had recently xated term memory. Preattentive processing can rapidly drawthe location of the changed features directly. the focus of attention to a target object, so little or no Clearly, the eyes are not a direct window to the searching is required. To remove this assistance, Wolfesoul. Research on eye tracking has shown repeatedly designed targets with two properties (Fig. 11):that merely tracking the eyes of a viewer during scene 1) Targets are formed from a conjunction of features—perception provides no privileged access to the cogni- they cannot be detected preattentively.tive processes undertaken by the viewer. Researchers 2) Targets are arbitrary combinations of colors andstudying the top-down contributions to perception have shapes—they cannot be semantically recognizedtherefore established methodologies in which the role and remembered.of memory and expectation can be studied through Wolfe initially tested two search types:more indirect methods. In the sections that follow, we 1) Traditional search. Text on a blank screen describedpresent ve laboratory procedures that have been devel- the target. 
This was followed by a display contain-oped speci cally for this purpose: postattentive amnesia, ing 4–8 potential target formed by combinations ofmemory-guided search, change blindness, inattentional colors and shapes in a 3 × 3 array (Fig. 11a).blindness, and attentional blink. Understanding what we 2) Postattentive search. The display was shown to theare thinking, remembering, and expecting as we look at viewer for up to 300 msec. Text describing thedifferent parts of a visualization is critical to designing target was then inserted into the scene (Fig. 11b).visualizations that encourage locating and retaining the Results showed that the preview provided no advan-information that is most important to the viewer. tage. Postattentive search was as slow (or slower) than the traditional search, with approximately 25–40 msec4.2 Postattentive Amnesia per object required for target present trials. This has aWolfe conducted a study to determine whether showing signi cant potential impact for visualization design. Inviewers a scene prior to searching it would improve most cases visualization displays are novel, and theirtheir ability to locate targets . Intuitively, one might contents cannot be committed to LTM. If studying aassume that seeing the scene in advance would help with display offers no assistance in searching for speci c datatarget detection. Wolfe’s results suggest this is not true. values, then preattentive methods that draw attention to
areas of potential interest are critical for efficient data exploration.

4.3 Attention Guided by Memory and Prediction

Although research on postattentive amnesia suggests that there are few, if any, advantages from repeated viewing of a display, several more recent findings suggest there are important benefits of memory during search. Interestingly, all of these benefits seem to occur outside of the conscious awareness of the viewer.

In the area of contextual cuing , , viewers find a target more rapidly for a subset of the displays that are presented repeatedly—but in a random order—versus other displays that are presented for the first time. Moreover, when tested after the search task was completed, viewers showed no conscious recollection or awareness that some of the displays were repeated or that their search speed benefited from these repetitions. Contextual cuing appears to involve guiding attention to a target by subtle regularities in the past experience of a viewer. This means that attention can be affected by incidental knowledge about global context, in particular, the spatial relations between the target and non-target items in a given display. Visualization might be able to harness such incidental spatial knowledge of a scene by tracking both the number of views and the time spent viewing images that are later re-examined by the viewer.

A second line of research documents the unconscious tendency of viewers to look for targets in novel locations in the display, as opposed to looking at locations that have already been examined. This phenomenon is referred to as inhibition of return  and has been shown to be distinct from strategic influences on search, such as choosing consciously to search from left-to-right or moving out from the center in a clockwise direction .

A final area of research concerns the benefits of resuming a visual search that has been interrupted by momentarily occluding the display , . Results show that viewers can resume an interrupted search much faster than they can start a new search. This suggests that viewers benefit from implicit (i.e., unconscious) perceptual predictions they make about the target based on the partial information acquired during the initial glimpse of a display.

Rapid resumption was first observed when viewers were asked to search for a T among L-shapes . Viewers were given brief looks at the display separated by longer waits where the screen was blank. They easily found the target within a few glimpses of the display. A surprising result was the presence of many extremely fast responses after display re-presentation. Analysis revealed two different types of responses. The first, which occurred only during re-presentation, required 100–250 msec. This was followed by a second, slower set of responses that peaked at approximately 600 msec.

To test whether search was being fully interrupted, a second experiment showed two interleaved displays, one with red elements, the other with blue elements (Fig. 12). Viewers were asked to identify the color of the target T—that is, to determine whether either of the two displays contained a T. Here, viewers are forced to stop one search and initiate another as the display changes. As before, extremely fast responses were observed for displays that were re-presented.

Fig. 12. A rapid-response re-display trial (panels labeled “Look 100 msec” and “Wait 950 msec” for each display): viewers are asked to report the color of the T target, so two separate displays must be searched

The interpretation that the rapid responses reflected perceptual predictions—as opposed to easy access to memory of the scene—was based on two crucial findings , . The first was the sheer speed at which a search resumed after an interruption. Previous studies on the benefits of visual priming and short-term memory show responses that begin at least 500 msec after the onset of a display. Correct responses in the 100–250 msec range call for an explanation that goes beyond mere memory. The second finding was that rapid responses depended critically on a participant’s ability to form implicit perceptual predictions about what they expected to see at a particular location in the display after it returned to view.

For visualization, rapid response suggests that a viewer’s domain knowledge may produce expectations based on the current display about where certain data might appear in future displays. This in turn could improve a viewer’s ability to locate important data.

4.4 Change Blindness

Both postattentive amnesia and memory-guided search agree that our visual system does not resemble the relatively faithful and largely passive process of modern photography. A much better metaphor for vision is that of a dynamic and ongoing construction project, where the products being built are short-lived models of the external world that are specifically designed for the current visually guided tasks of the viewer , , , . There does not appear to be any general purpose vision. What we “see” when confronted with a new
scene depends as much on our goals and expectations as it does on the light that enters our eyes.

These new findings differ from the initial ideas of preattentive processing: that only certain features are recognized without the need for focused attention, and that other features cannot be detected, even when viewers actively search for them. More recent work in preattentive vision has shown that the visual differences between a target and its neighbors, what a viewer is searching for, and how the image is presented can all have an effect on search performance. For example, Wolfe’s guided search theory assumes both bottom-up (i.e., preattentive) and top-down (i.e., attention-based) activation of features in an image , , . Other researchers like Treisman have also studied the dual effects of preattentive and attention-driven demands on what the visual system sees , . Wolfe’s discussion of postattentive amnesia points out that details of an image cannot be remembered across separate scenes except in areas where viewers have focused their attention .

New research in psychophysics has shown that an interruption in what is being seen—a blink, an eye saccade, or a blank screen—renders us “blind” to significant changes that occur in the scene during the interruption. This change blindness phenomenon can be illustrated using a task similar to one shown in comic strips for many years , , , . Fig. 13 shows three pairs of images. A significant difference exists between each image pair. Many viewers have a difficult time seeing any difference and often have to be coached to look carefully to find it. Once they discover it, they realize that the difference was not a subtle one. Change blindness is not a failure to see because of limited visual acuity; rather, it is a failure based on inappropriate attentional guidance. Some parts of the eye and the brain are clearly responding differently to the two pictures. Yet, this does not become part of our visual experience until attention is focused directly on the objects that vary.

The presence of change blindness has important implications for visualization. The images we produce are normally novel for our viewers, so existing knowledge cannot be used to guide their analyses. Instead, we strive to direct the eye, and therefore the mind, to areas of interest or importance within a visualization. This ability forms the first step towards enabling a viewer to abstract details that will persist over subsequent images.

Simons offers a wonderful overview of change blindness, together with some possible explanations .
1) Overwriting. The current image is overwritten by the next, so information that is not abstracted from the current image is lost. Detailed changes are only detected at the focus of attention.
2) First Impression. Only the initial view of a scene is abstracted, and if the scene is not perceived to have changed, it is not re-encoded. One example of first impression is an experiment by Levin and Simons where subjects viewed a short movie , . During a cut scene, the central character was switched to a completely different actor. Nearly two-thirds of the subjects failed to report that the main actor was replaced, instead describing him using details from the initial actor.
3) Nothing is Stored. No details are represented internally after a scene is abstracted. When we need specific details, we simply re-examine the scene. We are blind to change unless it affects our abstracted knowledge of the scene, or unless it occurs where we are looking.
4) Everything is Stored, Nothing is Compared. Details about a scene are stored, but cannot be accessed without an external stimulus. In one study, an experimenter asks a pedestrian for directions . During this interaction, a group of students walks between the experimenter and the pedestrian, surreptitiously taking a basketball the experimenter is holding. Only a very few pedestrians reported that the basketball had gone missing, but when asked specifically about something the experimenter was holding, more than half of the remaining subjects remembered the basketball, often providing a detailed description.
5) Feature Combination. Details from an initial view and the current view are merged to form a combined representation of the scene. Viewers are not aware of which parts of their mental image come from which scene.

Interestingly, none of the explanations account for all of the change blindness effects that have been identified. This suggests that some combination of these ideas—or some completely different hypothesis—is needed to properly model the phenomena.

Simons and Rensink recently revisited the area of change blindness . They summarize much of the work-to-date, and describe important research issues that are being studied using change blindness experiments. For example, evidence shows that attention is required to detect changes, although attention alone is not necessarily sufficient . Changes to attended objects can also be missed, particularly when the changes are unexpected. Changes to semantically important objects are detected faster than changes elsewhere . Low-level object properties of the same kind (e.g., color or size) appear to compete for recognition in visual short-term memory, but different properties seem to be encoded separately and in parallel —similar in some ways to Treisman’s original feature integration theory . Finally, experiments suggest the locus of attention is distributed symmetrically around a viewer’s fixation point .

Simons and Rensink also described hypotheses that they felt are not supported by existing research. For example, many people have used change blindness to suggest that our visual representation of a scene is sparse, or altogether absent. Four hypothetical models of vision were presented that include detailed representations of a scene, while still allowing for change blindness.
A detailed representation could rapidly decay, making it unavailable for future comparisons; a representation could exist in a pathway that is not accessible to the comparison operation; a representation could exist and be accessible, but not be in a format that supports the comparison operation; or an appropriate representation could exist, but the comparison operation is not applied even though it could be.

Fig. 13. Change blindness: a major difference exists between each pair of images; (a–b) object added/removed; (c–d) color change; (e–f) luminance change

4.5 Inattentional Blindness

A related phenomenon called inattentional blindness suggests that viewers can completely fail to perceive visually salient objects or activities. Some of the first experiments on this subject were conducted by Mack and Rock . Viewers were shown a cross at the center of fixation and asked to report which arm was longer. After a very small number of trials (two or three) a small “critical” object was randomly presented in one of the quadrants formed by the cross. After answering which arm was longer, viewers were then asked, “Did you see anything else on the screen besides the cross?” Approximately 25% of the viewers failed to report the presence of the critical object. This was surprising, since in target detection experiments (e.g., Figs. 1a–d) the same critical objects are identified with close to 100% accuracy.

These unexpected results led Mack and Rock to modify their experiment. Following the critical trial another two or three noncritical trials were shown—again asking viewers to identify the longer arm of the cross—followed by a second critical trial and the same question, “Did you see anything else on the screen besides the cross?” Mack and Rock called these divided attention trials. The expectation is that after the initial query viewers will anticipate being asked this question again. In addition to completing the primary task, they will also search for a critical object. In the final set of displays viewers were told to ignore the cross and focus entirely on identifying whether a critical object appears in the scene. Mack and Rock called these full attention trials, since a viewer’s entire attention is directed at finding critical objects.

Results showed that viewers were significantly better at identifying critical objects in the divided attention trials, and were nearly 100% accurate during full attention trials. This confirmed that the critical objects were salient and detectable under the proper conditions.

Mack and Rock also tried placing the cross in the periphery and the critical object at the fixation point. They assumed this would improve identification of the critical objects, but in fact it produced the opposite effect. Identification rates dropped to as low as 15%. This emphasizes that subjects can fail to see something, even when it is directly in their field of vision.

Mack and Rock hypothesized that “there is no perception without attention.” If you do not attend to an object in some way, you may not perceive it at all. This suggestion contradicts the belief that objects are organized into elementary units automatically and prior to attention being activated (e.g., Gestalt theory). If attention is intentional, without objects first being perceived there is nothing to focus attention on. Mack and Rock’s experiments suggest this may not be true.

More recent work by Simons and Chabris recreated a classic study by Neisser to determine whether inattentional blindness can be sustained over longer durations . Neisser’s experiment superimposed video streams of two basketball games . Players wore white shirts in one stream and black shirts in the other. Subjects attended to one team—either white or black—and ignored the other. Whenever the subject’s team made a pass, they were told to press a key. After about 30 seconds of video, a third stream was superimposed showing a woman walking through the scene with an open umbrella. The stream was visible for about 4 seconds, after which another 25 seconds of basketball video was shown. Following the trial, only a small number of observers reported seeing the woman. When subjects only watched the screen and did not count passes, 100% noticed the woman.

Fig. 14. Images from Simons and Chabris’s inattentional blindness experiments, showing both superimposed and single-stream video frames containing a woman with an umbrella, and a woman in a gorilla suit 

Simons and Chabris controlled three conditions during their experiment. Two video styles were shown: three superimposed video streams where the actors are semi-transparent, and a single stream where the actors are filmed together. This tests whether increased realism affects awareness. Two unexpected actors were also used: a woman with an umbrella, and a woman in a gorilla suit. This studies how actor similarity changes awareness (Fig. 14). Finally, two types of tasks were assigned to subjects: maintain one count of the bounce passes your team makes, or maintain two separate counts of the bounce passes and the aerial passes your team makes. This varies task difficulty to measure its impact on awareness.

After the video, subjects wrote down their counts, and were then asked a series of increasingly specific questions about the unexpected actor, ranging from “Did you notice anything unusual?” to “Did you see a gorilla/woman carrying an umbrella?” About half of the subjects tested failed to notice the unexpected actor, demonstrating sustained inattentional blindness in a dynamic scene. A single stream video, a single count task, and a woman actor all made the task easier, but in every case at least one-third of the observers were blind to the unexpected event.

4.6 Attentional Blink

In each of the previous methods for studying visual attention, the primary emphasis is on how human attention is limited in its ability to represent the details of a scene, and in its ability to represent multiple objects at the same time. But attention is also severely limited in its ability to process information that arrives in quick succession, even when that information is presented at a single location in space.

Attentional blink is currently the most widely used method to study the availability of attention across time. Its name—“blink”—derives from the finding that when two targets are presented in rapid succession, the second of the two targets cannot be detected or identified when it appears within approximately 100–500 msec following the first target , .
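The 100–500 msec window above is, at bottom, a timing relationship between two items in a single-location stream. As a minimal sketch, assuming items appear at a fixed rate and treating the approximate window bounds as hard cut-offs (which real blink data do not show), the hypothetical helpers below convert a serial-position lag into milliseconds and list which lags would fall inside that window. The names `lag_ms`, `in_blink_window`, and `blinked_lags` are illustrative only and are not taken from any study or toolkit.

```python
# Minimal sketch: timing arithmetic for a rapid serial presentation stream.
# The window bounds are the approximate values quoted in the text, treated
# here as hard cut-offs purely for illustration (hypothetical, not fitted).
BLINK_WINDOW_MS = (100.0, 500.0)


def lag_ms(t1_pos: int, t2_pos: int, rate_hz: float) -> float:
    """Onset delay (msec) of the second target relative to the first.

    Items are shown one at a time at a single location, rate_hz per second,
    so consecutive items are 1000 / rate_hz msec apart.
    """
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    if t2_pos <= t1_pos:
        raise ValueError("the second target must follow the first")
    return (t2_pos - t1_pos) * 1000.0 / rate_hz


def in_blink_window(lag: float, window: tuple = BLINK_WINDOW_MS) -> bool:
    """True if a lag falls inside the approximate attentional blink window."""
    lo, hi = window
    return lo <= lag <= hi


def blinked_lags(rate_hz: float, max_lag_items: int) -> list:
    """Serial-position lags (1..max_lag_items) whose delay lands in the window."""
    return [k for k in range(1, max_lag_items + 1)
            if in_blink_window(lag_ms(0, k, rate_hz))]


if __name__ == "__main__":
    # 10 items per second: each item occupies 100 msec of the stream.
    print(blinked_lags(10, 8))   # [1, 2, 3, 4, 5]
    print(lag_ms(3, 5, 10))      # 200.0
```

At 10 items per second consecutive items are 100 msec apart, so lags 1 through 5 land inside the window; at 20 items per second the same window spans lags 2 through 10. The fixed window ignores the exception that a second target arriving immediately after the first is often reported accurately, so it should be read as an arithmetic aid rather than a model of the effect.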
In a typical experiment, visual items such as words or pictures are shown in a rapid serial presentation at a single location. Raymond et al.  asked participants to identify the only white letter (first target) in a 10-item-per-second stream of black letters (distractors), then to report whether the letter “X” (second target) occurred in the subsequent letter stream. The second target was present in 50% of trials and, when shown, appeared at random intervals after the first target ranging from 100 to 800 msec. Reports of both targets were required after the stimulus stream ended. The attentional blink is defined as having occurred when the first target is reported correctly, but the second target is not. This usually happens for temporal lags between targets of 100–500 msec. Accuracy recovers to a normal baseline level at longer intervals.

Curiously, when the second target is presented immediately following the first target (i.e., with no delay between the two targets), reports of the second target are quite accurate . This suggests that attention operates over time like a window or gate, opening in response to finding a visual item that matches its current criterion and then closing shortly thereafter to consolidate that item as a distinct object. The attentional blink is therefore an index of the “dwell time” needed to consolidate a rapidly presented visual item into visual short-term memory, making it available for conscious report .

Change blindness, inattentional blindness, and attentional blink have important consequences for visualization. Significant changes in the data may be missed if attention is fully deployed or focused on a specific location in a visualization. Attending to data elements in one frame of an animation may render us temporarily blind to what follows at that location. These issues must be considered during visualization design.

5 VISUALIZATION AND GRAPHICS

How should researchers in visualization and graphics choose between the different vision models? In psychophysics, the models do not compete with one another. Rather, they build on top of one another to address common problems and new insights over time. The models differ in terms of why they were developed, and in how they explain our eye’s response to visual stimuli. Yet, despite this diversity, the models usually agree on which visual features we can attend to. Given this, we recommend considering the most recent models, since these are the most comprehensive.

A related question asks how well a model fits our needs. For example, the models identify numerous visual features as preattentive, but they may not define the difference needed to produce distinguishable instances of a feature. Follow-on experiments are necessary to extend the findings for visualization design.

Finally, although vision models have proven to be surprisingly robust, their predictions can fail. Identifying these situations often leads to new research, both in visualization and in psychophysics. For example, experiments conducted by the authors on perceiving orientation led to a visualization technique for multivalued scalar fields , and to a new theory on how targets are detected and localized in cognitive vision .

5.1 Visual Attention

Understanding visual attention is important, both in visualization and in graphics. The proper choice of visual features will draw the focus of attention to areas in a visualization that contain important data, and correctly weight the perceptual strength of a data element based on the attribute values it encodes. Tracking attention can be used to predict where a viewer will look, allowing different parts of an image to be managed based on the amount of attention they are expected to receive.

Perceptual Salience. Building a visualization often begins with a series of basic questions, “How should I represent the data? How can I highlight important data values when they appear? How can I ensure that viewers perceive differences in the data accurately?” Results from research on visual attention can be used to assign visual features to data values in ways that satisfy these needs.

A well known example of this approach is the design of colormaps to visualize continuous scalar values. The vision models agree that properties of color are preattentive. They do not, however, identify the amount of color difference needed to produce distinguishable colors. Follow-on studies have been conducted by visualization researchers to measure this difference. For example, Ware ran experiments that asked a viewer to distinguish individual colors and shapes formed by colors. He used his results to build a colormap that spirals up the luminance axis, providing perceptual balance and controlling simultaneous contrast error . Healey conducted a visual search experiment to determine the number of colors a viewer can distinguish simultaneously. His results showed that viewers can rapidly choose between up to seven isoluminant colors . Kuhn et al. used results from color perception experiments to recolor images in ways that allow colorblind viewers to properly perceive color differences . Other visual features have been studied in a similar fashion, producing guidelines on the use of texture—size, orientation, and regularity , , —and motion—flicker, direction, and velocity , —for visualizing data.

An alternative method for measuring image salience is Daly’s visible differences predictor, a more physically-based approach that uses light level, spatial frequency, and signal content to define a viewer’s sensitivity at each image pixel . Although Daly used his metric to compare images, it could also be applied to define perceptual salience within a visualization.

Another important issue, particularly for multivariate visualization, is feature interference. One common approach visualizes each data attribute with a separate visual feature. This raises the question, “Will the visual