Journal of Experimental Psychology: General Copyright 1998 by the American Psychological Association, Inc.1998. IfcL 127. No. 4, 398-415 0096-3445/9№ 00 Does Consistent Scene Context Facilitate Object Perception? Andrew Hollingworth and John M. Henderson Michigan State University The conclusion that scene knowledge interacts with object perception depends on evidence that object detection is facilitated by consistent scene context. Experiment 1 replicated the I. Biederman, R. J. Mezzanotte, and J. C. Rabinowitz (1982) object-detection paradigm. Detection performance was higher for semantically consistent versus inconsistent objects. However, when the paradigm was modified to control for response bias (Experiments 2 and 3) or when response bias was eliminated by means of a forced-choice procedure (Experiment 4), no such advantage obtained. When an additional source of biasing information was eliminated by presenting the object label after the scene (Experiments 3 and 4), there was either no effect of consistency (Experiment 4) or an inconsistent object advantage (Experiment 3). These results suggest that object perception is not facilitated by consistent scene context. To what degree is perception affected by our knowledge directly addresses the influence of our knowledge andof the world? This question has historically been central in beliefs about meaningful relationships in the world on ourtheories of perception and cognition. For example, the perception of the visual environment.apparent role of semantic constraint in visual recognition led For the purposes of this study, we define object identifica-to the emergence of so-called New Look psychology tion from the perspective of current computational theories(Bruner, 1957,1973). In cognitive psychology, the effects of (Biederman, 1987; Bulthoff, Edelman, & Tarr, 1995; Marr,contextual expectations on perception have been couched in 1982; Marr & Nishihara, 1978; Ullman, 1996). At a generalterms of debates between bottom-up versus top-down pat- level of description, these theories assume two processingtern recognition (Neisser, 1967) and modular versus interac- stages in object identification. First, the retinal image istive perception (e.g., Fodor, 1983; Pylyshyn, 1980; Rumel- transformed into a perceptual description that is compatiblehart, McClelland, & the PDF Research Group, 1986). More with a set of memory descriptions. This first stage can berecently, the effect of higher-level knowledge on perception further broken down into two sub-stages, an early stage ofhas become critical in discussions of the role of re-entrant visual analysis that translates the current pattern of retinalneural pathways in the early cortical processing of visual stimulation into perceptual primitives, and an additionalstimulation (Barlow, 1994; Churchland, Ramachandran, & stage that uses these primitives to produce descriptions ofSejnowski, 1994; Kosslyn, 1994; Mumford, 1994). In the the object tokens in the scene. Second, object descriptionspresent study, we explored the following version of the are matched to stored long-term memory descriptions ofcontext question: How is the identification of a visual object object types, leading to entry-level recognition. When aaffected by the meaning of the real-world scene in which that match is found, identification has occurred, and informationobject appears? This question is important because it stored hi memory about that object type, such as its identity, whether it is good to eat, and so on becomes available. In the Andrew Hollingworth submitted this work as part of the following experiments, we will consider the activation of anrequirements for his master of arts degree in psychology at entry-level label for an object stimulus as evidence of theMichigan State University. This research was supported by a successful completion of the matching stage of objectNational Science Foundation Graduate Fellowship to Andrew identification.Hollingworth, by U.S. Army Research Office Grant DAAH04-94- The hypothesis we set out to test was that the identifica-G-0404, and by National Science Foundation Grant SBR 96- tion of a real-world object is facilitated when that object is17274. We are solely responsible for the contents of this article, semantically consistent rather than inconsistent with thewhich should not be construed as an official U.S. Department of theArmy position, policy, or decision. scene in which it appears (Biederman, 1981; Biederman, We would like to thank the masters committee of Tom Carr, Mezzanotte, & Rabinowitz, 1982; Boyce & Pollatsek, 1992;Fernanda Ferreira, and Rose Zacks for their helpful discussions of Boyce, Pollatsek, & Rayner, 1989; Friedman, 1979; Koss-the research and for their comments on a draft of this article. We lyn, 1994; Metzger & Antes, 1983; Palmer, 1975a; Ullman,would also like to thank Peter De Graef for his discussions of the 1996; see Henderson & Hollingworth, in press, for aresearch, Sandy Pollatsek and Johan Lauwereyns for their com- review). The strongest evidence supporting the view thatments on a draft of the article, and Gary Schrock for his technical consistent scene context facilitates object identificationassistance. comes from the object-detection paradigm introduced by Correspondence concerning this article should be addressed to Biederman and colleagues (Biederman, 1981; Biederman etAndrew Hollingworth or John M. Henderson, Department ofPsychology, 129 Psychology Research Building, Michigan State al., 1982). In this paradigm, participants were asked toUniversity, East Lansing, Michigan 48824—1117. Electronic mail determine whether & pre-specified object appeared within amay be sent to firstname.lastname@example.org email@example.com. briefly presented scene at a particular location. During each 398
OBJECT PERCEPTION IN SCENES 399trial, a label naming an object was presented until the on the finding that detection sensitivity (d1) was higher hiparticipant was ready to continue, followed by a line- base conditions than in violation conditions. This sensitivitydrawing of a natural scene presented for 150 ms, followed result has been replicated by Boyce et al. (1989) using aby a pattern mask with an embedded location cue. The similar object-detection paradigm.pattern mask remained on the screen until the participant The results of the Biederman et al. (1982) study have ledpressed one of two buttons to indicate whether the object to two general conclusions about the perception of objects innamed by the label had or had not appeared in the scene at natural scenes. First, scene meaning, including informationthe cued location.1 In target-present trials, the target label about the semantic relationship between scene and objectnamed the cued object. In catch trials, the target label named types, can be accessed very quickly. Such early activation isan object that did not appear in the scene. We will refer to a necessary condition for context effects, as contextualthis paradigm as the original object-detection paradigm. constraints must be active early enough to influence the The primary contextual manipulation in the original perception of object stimuli. This conclusion is supported byobject-detection paradigm was the relationship between the a number of studies demonstrating that the informationobject presented at the cued location (the cued object) and necessary to identify natural scenes can be obtained in lessthe scene in which that object appeared. Base scenes than 150 ms (Antes, Penland, & Metzger, 1981; Biederman, 1972; Biederman, Glass, & Stacy, 1973; Loftus, Nelson, &contained a cued object that was consistent with the scene, ICallman, 1983; Potter, 1976; Schyns & Oliva, 1994).and the scene contained no other objects that violated scene Second, stored knowledge about scenes and the objectscontext. Violation scenes contained a cued object that was likely to appear in them can be used to facilitate theinconsistent with the scene along one or more dimensions, construction of perceptual descriptions of consistent objects.including episodic probability, position, size, support, and Taken together, these two hypotheses combine to form theinterposition (whether the object occluded objects behind it perceptual schema model of scene context effects (Bieder-or was transparent). For the purposes of this article, we will man, 1981; Biederman et al., 1982; Palmer, 1975b; seefocus on cases of probability violation (i.e., the semantic Henderson, 1992, for further discussion). The perceptualconsistency between object and scene). The most conserva- schema model proposes that the stored representation of ative hypothesis regarding the information contained in a scene type contains information about the objects that formscene concept is that it specifies the object types typically that type. The early activation of this information can befound in the scene (Mandler & Johnson, 1976; Mandler & used to facilitate the perceptual analysis (e.g., the encodingParker, 1976). Therefore, manipulations of object probabil- of features or generation of a perceptual description) ofity provide the most direct means to investigate the influence objects that are consistent with the semantic constraintsof scene meaning on object perception. imposed by the scene. Biederman et al. (1982) found that detection performance In addition to the perceptual schema model, two otherwas best when the cued object did not violate any of the models of scene and object processing can account for the constraints imposed by scene meaning. They reported poorer facilitated detection of consistent objects in scenes. First, aperformance across all violation dimensions, with com- priming model of scene context effects (Friedman, 1979;pound violations (e.g., probability and support) producing Kosslyn, 1994; Palmer, 1975a) places the locus of contex-even greater decrements. Importantly, violations of semantic tual influence at the matching stage of object identification,relationships were found to be as disruptive as violations of when the perceptual description of an object token is structural relationships (e.g., support or interposition), sug- compared to long-term memory representations of object gesting that semantic relationships can be accessed very types. According to the priming model, the recognition of a rapidly and can then interact with the initial perceptual scene serves to prime the stored representations of object analysis of an object. The poorer performance in violation types consistent with that scene (i.e., the activation levels of conditions held across percentage correct performance, stored, consistent object representations are raised closer to sensitivity (d1), and reaction time. These measures, however, a threshold value). As a result, relatively less perceptual are not equally valid for investigating the influence of scene information needs to be encoded to bring a stored, consistent context on object perception. Reaction time in that study cannot be taken as strong evidence for the facilitated detection of consistent objects, as there were a significant 1 The following terminology has been used. The object named bynumber of errors, causing more than 30% of the trials to be the label appearing before the scene has been referred to as theexcluded from the analysis. In general, it is not clear that target object, the label itself as the target label, and the objectreaction time is a good measure of object perception presented at the cued location as the cued object. 2processes, as it may be influenced by post-identification Boyce and PoIIatsek (1992) used a paradigm in which, after a factors, such as response generation (Henderson, 1992).2 In scene had appeared on the screen, a single object wiggled, and participants were required to make an eye movement to the object addition, percentage correct performance in target-present and name it as quickly as possible. They found shorter naming trials does not provide a reliable measure of object-detection latencies for consistent versus inconsistent objects, which theyperformance, as participants demonstrated significant re- interpreted as support for the view that consistent scene context sponse biases that varied with the consistency of the target facilitates object perception. As with the Biederman et al. (1982) label with the scene. Thus, the conclusion that consistent experiment, however, it is not clear that naming latency is an scene context facilitates object perception rests most firmly appropriate measure of ease of object identification.
400 HOLLINGWORTH AND HENDERSONobject representation to the threshold value indicating that a Table 1match has been found.3 Summary of the Target-Present and Catch Trial Design in Second, facilitated detection of consistent objects in Biederman et al. (1982, Experiment 1) and in Experimentsscenes can be accounted for by an interactive activation 1—3 for a Sample Trial Presenting a Farmyard Scenemodel similar to that proposed by McClelland and Rumel- Semantic consistency manipulationhart (1981) for word and letter recognition (Boyce &Pollatsek, 1992; Boyce etal., 1989; Metzger& Antes, 1983). Inconsistent Consistent (probabilityIn this model, scenes correspond to the word level in the violation) Trial (base)network and objects to the letter level. These two levelsmutually constrain each other, facilitating the perception of Biederman et al. (1982)objects consistent with scene meaning and inhibiting the Target-presentperception of inconsistent objects. In addition, partial activa- Cued object chicken mixertion at the object level could act to constrain the encoding of Target label "chicken" "mixer" Catchperceptual features consistent with that object type, though Cued object Pig mixerno interactive activation model of scene context effects has, Target label 70% consistent 70% consistentas yet, specifically included that level of interaction. ("horse"), 30% ("horse"), 30% Each of these models of the influence of scene knowledge inconsistent inconsistenton object perception predicts that perception of an object ("television") ("television")should be facilitated when that object is consistent with the Experiment 1scene in which it appears. We will therefore refer to them as Target-presentcontextual facilitation models of object perception in scenes. Cued object chicken mixer Target label "chicken" "mixer" Concerns With the Object-Detection Paradigm Catch Cued object chicken mixer Support for contextual facilitation models rests almost Target label 50% consistent 50% consistent ("horse"), 50% ("horse"), 50%entirely on results from object-detection experiments that inconsistent inconsistentshow detection benefits for consistent objects versus incon- ("television") ("television")sistent objects under brief presentation conditions (Bieder- Experiments 2 and 3man et al., 1982; Boyce et al., 1989). However, a number ofgeneral concerns have been raised regarding the original Target-presentobject-detection paradigm (De Graef, Christiaens, & Cued object chicken mixerdYdewalle, 1990; De Graef & dYdewalle, 1995; Hender- Target label "chicken" "mixer" Catchson, 1992). These concerns revolve around two central Cued object mixer chickenissues. First, the original object-detection paradigm may not "mixer" Target label "chicken"have adequately controlled participant response bias. Sec-ond, the presentation of the target label prior to the scene andthe location cue following the scene may have providedadditional sources of information that influenced detection should lead to higher false-alarm rates in consistent targetperformance. label catch trials and lower false-alarm rates in inconsistent target label catch trials. Biederman et al. found precisely thisCatch Trial Design and the Calculation of Sensitivity effect: False-alarm rates were higher when the target label In the original object-detection paradigm (Biederman etal., 1982), detection sensitivity (d) was calculated using the 3 Friedman (1979) recorded eye movements while participantspercentage correct rate in target-present trials (the hit rate) viewed scenes in preparation for a difficult memory test. Theand the error rate in catch trials (the false-alarm rate). Table duration of the first fixation (i.e., the total duration the eyes were1 summarizes the design of the target-present and catch trials fixated on the object the first time it was entered, now referred to asfor Biederman et al. (1982, Experiment 1). One concern with first pass gaze duration) was shorter for consistent versus inconsis-this design is that the method of calculating detection tent objects, a result that Friedman (1979) interpreted as support forsensitivity did not adequately control for participants bias to a priming model of scene context effects. The difference in fixationrespond "yes" more often when the catch trial label was duration was more than 300 ms, however, which is unlikely to besemantically consistent versus semantically inconsistent explained by perceptual factors alone (Biederman et al., 1982; Henderson, 1992). For example, participants may have lookedwith the subsequent scene. If participants attempt to detect longer at a semantically inconsistent object to integrate the alreadyan absent horse in a farmyard, for example, they should be identified object into a conceptual representation in which it wasbiased to respond "yes," as a horse is generally likely to incongruous. In addition, the instructions to prepare for a difficultappear there. If participants attempt to detect an absent memory test may have caused participants to consciously create antelevision in the farmyard, however, they should show less association between the inconsistent object and the scene, leadingbias to respond "yes," as there is contextual information to longer fixation durations (Biederman et al., 1982; Henderson,arguing against a televisions presence. This pattern of bias 1992).
OBJECT PERCEPTION IN SCENES 401was semantically consistent versus inconsistent with the 1989) have not met this criterion (see De Graef & dYdewalle,subsequent scene. 1995), and thus the sensitivity measures in those experi- To eliminate this sort of bias from measures of object- ments must be interpreted with caution.detection performance, detection sensitivity in the basecondition should be calculated using the hit rate in trials Target Label Previewwhen the target label was consistent with the scene and thefalse-alarm rate in trials when the target label was also The second concern with the original object-detectionconsistent with the scene. Similarly, detection sensitivity for paradigm involves the presentation of the target label beforetarget objects in the probability violation condition should scene viewing. There are two potential problems with such abe calculated using the hit rate in trials when the target label design. First, participants may have used the identity of thewas inconsistent with the scene and the false-alarm rate in target object to guide their search in the subsequent scenetrials when the target label was also inconsistent with the (Henderson, 1992). This strategy could have lead to ascene. The Biederman et al. (1982) study, however, did not consistent object-detection advantage if the spatial positionscalculate sensitivity in this manner: The false-alarm rates in of consistent objects were more predictable than the posi-both base and violation conditions averaged across catch tions of inconsistent objects. For example, participants maytrials on which the target label was consistent and inconsis- have known where to find a horse in a farmyard but wouldtent with the subsequent scene (see Table 1). Thus, differ- not necessarily have known where to find a television in aences in response bias as a function of target label consis- farmyard. Supporting this intuition, we have recently demon-tency were not necessarily controlled in the d measure. strated that viewers can more quickly locate semantically It is important to note that this method of computing d consistent versus inconsistent objects in a free-viewing,may have overestimated the sensitivity rate in base condi- visual search task (Henderson, Weeks, & Hollingworth, intions and underestimated the sensitivity rate in violation press). The information provided by the target label previewconditions. As discussed previously, the false-alarm rate was may have been particularly helpful in the Boyce et al. (1989) experiments. The scenes used in that study contained onlyhigher when the catch trial label was consistent versus five discrete objects presented against a very simple back-inconsistent with the scene. By averaging across these two ground. Thus, participants had to search through only acatch trial conditions for the purpose of calculating d, small number of objects during scene presentation. If thesensitivity in the base condition may have been artificially positions of consistent objects were more predictable thanraised because the averaged false-alarm rate was lower than the positions of inconsistent objects, an advantage for thethe false-alarm rate for consistent target label catch trials detection of consistent objects would be expected.alone. Similarly, sensitivity in the probability violation Second, the preview of the target label (e.g., "mixer")condition may have been artificially lowered because the may have led participants to expect the subsequent presenta-averaged false-alarm rate was higher than the false-alarm tion of a certain scene type (in this case, a kitchen). Therate for inconsistent target label catch trials alone. Thus, theBiederman et al. (1982) method of calculating d very likely generation of such expectations would have been particu- larly likely in the Biederman et al. (1982) study, becauseproduced an exaggerated sensitivity advantage for thedetection of consistent objects.4 70% of the target labels were consistent with the subsequent scene. When the target label specified an object inconsistent A second concern with the design of the original object-detection paradigm is that participants did not attempt to with the subsequent scene, however, the discontinuity between the expected and presented scene may have inter-detect the same object in the catch trials as in the correspond-ing target-present trials. Thus, detection sensitivity for a fered with perceptual processing, to the detriment of incon- sistent object detection.particular object was based on the correct detection of thatobject in the scene and the false detection of an entirelydifferent object that was not in the scene. Signal detection Location Cuetheory, however, requires that sensitivity measures be calcu- The final concern regarding the original object-detectionlated using the correct detection of a particular signal when it paradigm is that the location cue may have providedis present and the false detection of the same signal when it information useful to post-perceptual guessing. The positionis not present (Green & Swets, 1966). For example, suppose of the cue relative to the scene could have provided evidencethat the consistent cued object appearing in a kitchen scene concerning the types of objects likely to be found at thatwere a stove, and the consistent target label in the catch trial position (Henderson, 1992). A cue marking a position highwere "bread box." The hit rate would reflect the correct in a living room scene, for example, would constrain thedetection of a stove. The false-alarm rate, however, would objects that could have appeared there to clocks, pictures,reflect the false detection of a bread box. Because a bread curtains, etc. Such information would be useful whenbox is less likely to appear in a kitchen scene than a stove, attempting to detect a consistent object but not as usefulthe false-alarm rate would be artificially low, and theresulting sensitivity estimate would be artificially high. Thisexample demonstrates the importance of requiring partici- 4 It is important to note that the Boyce et al. (1989) replication ofpants to detect the same object on corresponding target- this paradigm is not subject to this criticism, as the consistency ofpresent and catch trials. The object-detection experiments the catch trial labels was controlled. However, the Boyce et al.conducted to date (Biederman et al., 1982; Boyce et al., study is subject to the remaining criticisms.
402 HOLLINGWORTH AND HENDERSONwhen attempting to detect an inconsistent object, be-cause the spatial position of an inconsistent object is lesspredictable. response The Present Study Results from object-detection experiments are the pri-mary support for the widely held view that consistent scenecontext facilitates object identification. It is therefore impor-tant to assess the previous concerns experimentally. To dothis, we conducted four experiments. Experiment 1 at-tempted to replicate the original Biederman et al. (1982)paradigm. Experiment 2 tested whether the consistent objectadvantage in the original object-detection paradigm was dueto the inadequate control of response bias. The originalparadigm was modified so that participants attempted todetect the same object on corresponding target-present andcatch trials. Experiment 3 investigated the influence ofpresenting the target label before the scene by manipulatingwhether the label appeared before or after the scene. Inaddition, the location cue was eliminated from the thirdexperiment to investigate whether its presence may haveaffected performance. Experiment 4 employed a forced-choice procedure to investigate the influence of scenecontext on object perception independently of response bias. Figure I. Schematic illustration of a trial in Experiment 1.In all four experiments, contextual facilitation modelspredict better detection of objects that are consistent with ascene versus objects that are inconsistent, because they cued object. In the catch trials, the target label named anpropose that stored scene knowledge of the types of objects object that did not appear in the scene. As in the originallikely to be found in a scene facilitates the identification of object-detection paradigm, there was no relationship be-those objects. tween the semantic consistency of the target label on 5 We made a number of modifications to the original Biederman Experiment 1 et al. (1982, Experiment 1) paradigm. First, we limited the cued object consistency manipulations to base (semantically consistent) The purpose of Experiment 1 was to replicate the original and probability violation (semantically inconsistent). Second, weBiederman et al. (1982) paradigm. We felt that replication did not include a bystander condition. Third, we used a paired-was necessary as a baseline against which to compare the scene design to control for scene-specific factors such as cuedresults from subsequent experiments in which we modified object distance from fixation and lateral masking. Fourth, wethe original paradigm. Figure 1 illustrates the main aspects presented the target label for 1500 ms rather than for a participant-of the paradigm. The basic design was the same as that of determined duration. Fifth, we used a scene presentation durationBiederman et al. (1982, Experiment I).5 A target label was of 200 ms rather than 150 ms. The 200 ms presentation durationpresented for 1500 ms, followed by a line drawing of a real- prevents eye movements during scene presentation, limiting view- ing to a single glance of the scene, but reduces the possibility ofworld scene for 200 ms, followed by a pattern mask with an floor effects. Sixth, in the Biederman et al. experiment, 30% of theembedded location cue. The participants task was to catch trial target labels were inconsistent with the subsequentdetermine whether the object named by the target label had scene, because 30% of the labels in the target-present trials wereor had not appeared in the scene at the cued location. inconsistent. In Experiment 1, 50% of the catch trial target labels The key manipulation in Experiment 1 was the semantic were inconsistent with the subsequent scene, because 50% of theconsistency between the cued object and me scene in which labels in the target-present trials were inconsistent. Finally, theit appeared. Semantically consistent cued objects were likely probability violation catch trials in the Biederman et al. paradigmto appear in the scene (e.g., a chicken in a farmyard); did not have the same design as the catch trials in the basesemantically inconsistent cued objects were unlikely to condition. For each scene in the probability violation condition, theappear in the scene (e.g., a mixer in a farmyard). In half of same object was cued in the catch trials as in the target-present trials. For each scene in the base condition, however, a differentthe trials, the target label named a semantically consistent object was cued in the catch trials than in the target-present trialsobject, and in the other half the target label named a (see Table 1). Thus, we chose to make the two consistencysemantically inconsistent object. Half of the trials presented conditions in Experiment 1 equivalent by always cueing the samea scene that contained the target object (target-present trials), object in target-present and catch trials for each scene in eachand half of the trials presented a scene that did not (catch consistency condition. This cueing manipulation is the same as thattrials). In the target-present trials, the target label named the employed by Boyce et al. (1989).
OBJECT PERCEPTION IN SCENES 403corresponding target-present and catch trials: For each cuedobject consistency condition, half of the catch trials pre-sented a consistent target label and half an inconsistent targetlabel. Note that, as in the original object-detection paradigm,the catch trial semantic consistency manipulation is dennedby the cued object appearing in the scene and not by theconsistency of the object the participant is attempting todetect. Table 1 summarizes the design of the target-presentand catch trials for Experiment 1.Method Participants. Twenty-four Michigan State University under-graduate students participated in the experiment for course credit.All participants had normal or corrected-to-normal vision. Theparticipants were naive with respect to the hypotheses underinvestigation. Stimuli. Twenty scenes and 20 cued objects were used asstimuli. The stimuli were generated from photographs of naturalscenes. Fourteen scenes were generated from those used by vanDiepen and De Graef (1994), and the other 6 scenes were generatedfrom photographs taken in the East Lansing, Michigan, area. Inboth cases, the main contours of the scenes were traced usingcommercial software to create gray-scale line drawings. Theimages generated from the two sources were not distinguishable.Semantically consistent cued objects for each scene were alsocreated by digitally tracing scanned images. These objects werecreated separately from the scenes. The 20 scenes were paired, andthe semantically inconsistent conditions were created by swappingobjects across scenes. For example, a mixer was the semanticallyconsistent cued object in a kitchen scene, and a live chicken was theconsistent cued object in a farmyard scene. These objects wereswapped across scenes so that the mixer was the semantically Figure 2. An example of the type of scene used and the cuedinconsistent object in the farmyard scene, and the chicken was the object semantic consistency manipulation. The top scene contains ainconsistent object in the kitchen scene. Figure 2 shows an example semantically consistent cued object (chicken), and the bottomof a stimulus scene and the semantic consistency manipulation. scene contains a semantically inconsistent cued object (mixer). Because each scene was to be presented a total of eight times This farmyard scene was paired with a kitchen scene in which theduring the experiment, two cued object positions were chosen mixer was consistent and the chicken inconsistent.within each scene to minimize participants ability to predict theobjects location. Each position was chosen as a place within thescene where the consistent cued object might reasonably appear.Both the consistent and inconsistent cued objects appeared in the The pattern mask presented after the scene consisted of overlap-same two positions. In a number of scenes, the two object positions ping line segments, curves, and angles, and was slightly larger thanrequired using two different sizes of each object (e.g., a fire hydrant the scene stimuli. The scenes were completely obliterated whenplaced in two positions on a receding sidewalk). In such a case, the presented simultaneously with the pattern mask. The location cuepaired scene also employed two different sizes of each object. The appearing within the pattern mask was a thick circle containing apercentage change in the size of each object was equated and was dot, subtending 1.4 degrees.the same in both of the paired scenes. As a consequence of this Apparatus. The stimuli were displayed on a flat-screen SVGApaired-scene design, each scene served as a control for its partner, monitor with a 100-Hz refresh rate. Responses were collected withreducing the influence of such factors as object size, eccentricity, a button box connected to a dedicated input-output (I-O) board.and lateral masking. Depression of a button stopped a millisecond clock on the I-O All scene and object manipulations were conducted using board. The display and I-O systems were interfaced with acommercially available software. The scenes subtended a visual 486-based microcomputer that controlled the experiment.angle of 23 degrees (width) by 15 degrees (height) at a viewing Procedure. Participants were tested individually. The experi-distance of 64 cm. Cued objects subtended about 2.75 degrees on menter first explained that the task would be to determine whetheraverage (range = 1.25 to 4.92 degrees). All images were displayed the object named by a label was present at a marked location in aas gray-scale contours on a white background at a resolution of briefly displayed scene. The participant was then seated in front of800 X 600 pixels X 16 levels of gray. Gray-scale was used for a computer monitor, with one hand resting on a button labeledanti-aliasing so that the contours appeared smooth and sharp. "yes" and the other on a button labeled "no." Viewing distance Target labels were created using lower-case, 24-point, anti- was maintained by a forehead rest.aliased Arial font. Labels for target-present trials named the cued During each experimental trial, participants saw a fixation crossobject. For catch trials, one consistent and one inconsistent label and a prompt instructing them to press a pacing button to begin thewere chosen for each scene. Each of these labels named an object trial. Once the participant pressed the button, the fixation crossthat never appeared in the scene. remained on the screen for an additional 500 ms, followed by a
404 HOLLINGWORTH AND HENDERSONblank screen containing a target label for 1500 ms, followed by the target object presence, F(l, 23) = 19.42, MSE = .0149,p <presentation of the scene for 200 ms, followed by a pattern mask .001. Simple effects tests indicated that the hit rate for scenescontaining an embedded location cue. There was no delay (i.e., the containing a consistent cued object (76.6%) was higher thaninter-stimulus interval was zero) between each display. The pattern that for scenes containing an inconsistent cued objectmask remained in view until the participant pressed the left (yes) (59.2%), F(l, 23) = 31.71, MSE = .0229, p < .001. Thisbutton to indicate that the object named by the target label had pattern was not present in the catch trials; the correctappeared in the scene at the cued location or the right (no) button toindicate that the target object had not appeared at the cued location. rejection rate was the same when the cued object wasAfter the response, there was a 4-s delay while the stimuli for the consistent (80.4%) versus inconsistent (78.5%), F(l, 23) =next trial were loaded into video memory, and then the prompt for 1.38, MSE = .0061,p > .25. To summarize, although the hitthe next trial appeared. rate was higher for consistent versus inconsistent cued Participants took part in a practice block of 16 trials (2 cued objects, the false-alarm rates were not different, 19.6% andobject consistency conditions X 2 target object presence conditions 21.5% respectively.x 2 cued object positions x 2 scenes). The two scenes used in the A analysis. Mean A for each cued object consistencypractice block were not used in the experimental trials. After the condition is shown in Table 2. Detection sensitivity waspractice trials, the experimenter answered any questions the reliably higher in scenes containing a consistent cued objectparticipant had about the procedure, and the participant proceeded (.861) than in scenes containing an inconsistent cued objectto the experimental trials. (.775), F(l, 23) = 20.50, MSE = .0043,p < .001. Each participant saw 160 experimental trials that were producedby a within-participant factorial combination of 2 cued objectconsistency conditions x 2 target object presence conditions X 2 Discussioncued object positions X 20 scenes. The position of the cued objectand the consistency of the catch trial label were counterbalanced Experiment 1 replicated the principal result of the originalbetween a pair of scenes and completely counterbalanced as a object-detection paradigm. We found that object-detectionbetween-participants factor. Because the cued object position performance was higher for scenes containing a consistentfactor was not of theoretical interest, the two levels of that factor cued object than for scenes containing an inconsistent cuedwere combined in the statistical analyses. Each participant saw all object. This experiment demonstrates that our scenes and160 trials in a different random order. The entire session lasted semantic consistency manipulation are sufficient to replicateapproximately 45 min. the basic consistent object contextual facilitation effect. The results of this experiment provide a baseline that can be used to assess the results of Experiments 2-4, in which theResults original paradigm will be modified to address the concerns specified in the Introduction. In the following analyses, two measures were used to In Biederman et al. (1982), false-alarm rates were higherassess object-detection performance. First, we report percent- when the catch trial label was consistent versus inconsistentage correct hits for the target-present trials and percentage with the scene. To investigate whether such biases werecorrect rejections for the catch trials. Second, because present in the current experiment, we conducted a fullreliable response biases were present, we report A, a analysis of the Experiment 1 catch trial data. The twononparametric measure of sensitivity (Grier, 1971). A can within-participants variables of cued object semantic consis-be interpreted as equivalent to percentage correct in a tency and target label semantic consistency were enteredforced-choice procedure. A was computed using the rate of into an ANOVA. There was a reliable effect of target labelcorrect responses in target-present trials (the hit rate) and the semantic consistency, F(l, 23) = 4.97, MSE = .0303, p <rate of errors in catch trials (the false-alarm rate). For the .05, with a higher correct rejection rate for inconsistentpurpose of replicating the Biederraan et al. (1982) paradigm, target labels (83.4%) than consistent target labels (75.5%).we did not separate the catch trial data as a function of the As found in the Biederman et al. study, participants weresemantic consistency of the target label. Thus, the A results more likely to falsely respond that an absent consistent targetreported in this section are based on the mean false-alarm object was present than to respond that an absent inconsis-rate in each cued object consistency condition, averaging tent target object was present. This response bias was likelyacross catch trials in which the target label was consistent caused by participants adopting a higher standard of evi-and inconsistent with the subsequent scene. dence to accept that an inconsistent object was present in the Percentage correct analysis. Mean percentage correct scene than that a consistent object was present.as a function of cued object consistency and target objectpresence is presented in Table 2. First, there was a reliablemain effect of target object presence, F(l, 23) = 12.79,MSB = .0506, p < .005. Participants responded correctly Table 267.9% of the time when the target object was present and Mean Percentage Hits, Percentage Correct Rejections79.5% of the time when the target object was absent. There (Percentage False Alarms), and A for Experiment 1was also a main effect of cued object consistency, F(l, 23) = Cued object % correct rejections31.57, MSE = .0141, p < .001, with better performance consistency %hits (% false alarms) Awhen the cued object was consistent (78.5%) than when it Consistent 76.6 80.4 (19.6) .861was inconsistent (68.9%) with the scene. Finally, there was a Inconsistent 59.2 78.5(21.5) .775reliable interaction between cued object consistency and
OBJECT PERCEPTION IN SCENES 405 As reported previously, the effect of cued object semantic design controls for the potential bias for participants toconsistency on catch trial performance did not approach respond "yes" more often when the catch trial target label isreliability, F(l, 23) = 1.38, MSE = .0061, p > .25, nor was consistent versus inconsistent with the scene.the interaction between cued object consistency and target To control for the general complexity of the scene inlabel consistency reliable, F < 1. The absence of a cued target-present and catch trials, the catch trial scenes con-object consistency effect on catch trial performance is tained the cued object from the paired scene at the cuedintriguing. According to contextual facilitation models, location. For example, if a participant viewed the labelconsistent cued objects should be easier to identify than "chicken" in a catch trial, the subsequent scene wouldinconsistent cued objects. Thus, contextual facilitation mod- contain the paired cued object (a mixer). Thus, in the catchels predict lower false-alarm rates when the cued object is trials, the semantic consistency manipulation was basedconsistent with the scene, because participants will be better on the relationship between the target label and the scene rather than the relationship between the cued object and theable to determine that the cued object does not match thetarget label. No such effect of cued object consistency wasfound. It is important to note that a similar lack of an effectof cued object consistency on catch trial performance was Methodreported by Biederman et al. (1982, Experiment 1). In thatexperiment, false-alarm rates were 16% in the base condi- Participants. Twenty-four Michigan State University under-tion and 15% in the probability violation condition. graduate students participated in the experiment for course credit. In summary, analysis of the catch trial data revealed that All participants had normal or corrected-to-normal vision. The participants were naive with respect to the hypotheses underthe semantic consistency of the target label reliably influ- investigation. None had participated in Experiment 1.enced performance, but the semantic consistency of the cued Stimuli. The stimuli were the same as in Experiment 1 with theobject had little influence on performance. Because the following modifications. First, for each scene, target labels in catchoriginal object-detection paradigm (and our replication of trials named the same object as in corresponding target-presentthat paradigm in Experiment 1) averaged across consistent trials. Second, the catch trial scenes contained the paired cuedand inconsistent target label catch trials when calculating object (e.g., a mixer when the target object was a chicken).false-alarm rates, it is likely that these experiments underes- Apparatus and procedure. The apparatus and procedure weretimated the false-alarm rate for base conditions and overesti- the same as in Experiment 1. Each participant saw 160 experimen- tal trials that were produced by a within-paiticipam factorialmated the false-alarm rate for violation conditions. This combination of 2 target label semantic consistency conditions X 2would result in artificially high sensitivity estimates in base target object presence conditions x 2 cued object positions x 20conditions and artificially low sensitivity estimates in scenes. Because the cued object position factor was not ofviolation conditions. Overall, these data suggest that theoretical interest, the two levels of that factor were combined inthe consistent object facilitation effect observed in prior the statistical analyses. Each participant saw all 160 trials in aobject-detection experiments may have been due, at least in different random order. The entire session lasted approximatelypart, to the fact that sensitivity measures did not control for 45min.response biases induced by the consistency of the targetlabel. Results Percentage correct analysis. Mean percentage correct Experiment 2 performance as a function of target label semantic consis- tency and target object presence is shown in Table 3. First, In Experiment 2 we modified the original object-detection there was a reliable main effect of target object presence,paradigm to address our concerns about catch trial design F(l, 23) = 8.61, MSE = .0739, p < .01. Participantsand the method of calculating sensitivity. In this experiment, responded correctly 65.6% of the time when the target objectthe target label in a catch trial named the same object as in a was present and 77.1% of the time when the target objectcorresponding target-present trial (see Table 1). This design was absent. There was no main effect of target labelhas two advantages over the original paradigm. First, semantic consistency, F < , with 71.9% correct perfor-measures of sensitivity in this experiment were based on the mance when the target label was consistent with the scenecorrect detection of a particular signal when it was present and 70.9% when the target label was inconsistent. Thereand the false detection of the same signal when it was not was, however, a reliable interaction between target labelpresent. Second, the semantic consistency of the target label semantic consistency and target object presence, F(l, 23) =with the scene on corresponding target-present and catch 65.85, MSE = .0269, p < .001. The hit rate in the consistenttrials was equivalent, because both labels named the same target label condition (75.7%) was higher than that in theobject. As a result, false-alarm rates in the semantically inconsistent target label condition (55.5%), but the reverseconsistent condition were based entirely on the false detec- pattern obtained in the catch trials, with a higher correcttion of consistent target objects, and false-alarm rates in the rejection rate in the inconsistent target label conditionsemantically inconsistent condition were based entirely on (86.3%) than in the consistent target label condition (68.0%).the false detection of inconsistent target objects. In contrast In other words, both the hit and false-alarm rates were higherto the original object-detection paradigm, this catch trial for consistent versus inconsistent target object-detection,
406 HOLLINGWORTH AND HENDERSONsuggesting that participants were more biased to respond suggest that the semantic consistency of the cued object has"yes" when attempting to detect a consistent target object. little if any effect on performance in catch trials, becauseGiven this pattern, sensitivity measures provide a better false-alarm rates did not differ as a function of cued objectindication of detection accuracy than percentage correct consistency.performance. A second explanation for the difference in results between A analysis. Mean A for each target object consistency Experiments 1 and 2 is that the facilitated detection ofcondition is presented in Table 3. No effect of target label consistent objects in Experiment 1 and in the originalsemantic consistency was obtained, F < 1. Participants were object-detection paradigm was an artifact produced by theequally accurate at detecting semantically consistent target method of calculating sensitivity. In these experiments,objects (.803) as semantically inconsistent target objects measures of sensitivity did not control response biases(.810). caused by the semantic consistency of the target label, as those paradigms averaged across catch trials on which the target object was consistent and inconsistent with the scene.Discussion In Experiment 2, when the catch trials were modified so that In Experiment 2, the catch trial design of the original response biases were controlled in A, no such consistentobject-detection paradigm was modified so that participants object advantage was obtained. This suggests that resultsattempted to detect the same target object on corresponding from the original object-detection paradigm reflected re-target-present and catch trials. The main result was that no sponse bias rather than the influence of scene context onadvantage was found for the detection of semantically object perception, and thus cannot be taken as strongconsistent versus inconsistent target objects. In addition, evidence for contextual facilitation of consistent objectthere were reliable response biases caused by target label perception.semantic consistency. As in Experiment 1, participants were A final explanation for the absence of a consistent objectbiased to respond "yes" more often when the catch trial facilitation effect in Experiment 2 hinges on the use of alabel was consistent versus inconsistent with the scene. This target label preview. The priming model of scene contextbias suggests that information sufficient to access scene effects proposes that the source of contextual facilitationmeaning was available within the 200 ms scene presentation effects is the spread of activation from the activated represen-duration. tation of a scene to stored descriptions of object types likely Why might the consistent object contextual facilitation to be found in the scene (Friedman, 1979; Kosslyn, 1994;effect, present in Experiment 1, be absent in Experiment 2? Palmer, 1975a). In Experiment 2, the presentation of theOne potential explanation is that, in the catch trials, the target label prior to scene viewing may have served to primepresence of the paired cued object at the cued location may the stored description of the target object, regardless of itshave biased performance. For example, in an inconsistent semantic consistency. Such priming could mask potentialcondition catch trial, the target label was "chicken"; this influences of scene context. In Experiment 3, the target labellabel was followed by a kitchen scene containing the paired was presented either before or after the scene. If the primingcued object (a mixer). If consistent objects are easier to model is correct, contextual facilitation of consistent objectdetect than inconsistent objects, as suggested by contextual detection should be observed when the target label isfacilitation models, false-alarm rates may have been artifi- presented after the scene, but not necessarily when the targetcially low when the target label was inconsistent with the label is presented before the scene.scene, because on those trials, the cued object was alwaysconsistent with the scene. This lower false-alarm rate in theinconsistent condition might have masked consistent object Experiment 3facilitation in our sensitivity measure. We have no direct Experiment 3 sought to provide further evidence concern-way to determine whether cued object semantic consistency ing the influence of scene context on object perception. Ininfluenced performance in this experiment, as that factor this experiment, the target label was presented either beforewas confounded with the semantic consistency of the target or after scene presentation. In addition, the location cue waslabel. The results from Experiment 1, as well as those eliminated. Otherwise, Experiment 3 was identical to Experi-reported by Biederman et al. (1982), however, provide no ment 2. The presentation of the target label after the scenesupport for this hypothesis. Both experiments strongly further refines the object-detection paradigm by eliminating two potential problems. First, participants may have used the target label preview to constrain the spatial extent of then-Table 3 search in the subsequent scene. This strategy would benefitMean Percentage Hits, Percentage Correct Rejections the detection of consistent objects, because the position of a(Percentage False Alarms), and A for Experiment 2 consistent object in a scene is easier to predict than the position of an inconsistent object. Second, the presentation Target label % correct rejections consistency %hits (% false alarms) A of a semantically inconsistent target label before the scene may have interfered with perceptual processing when the Consistent 75.7 68.0 (32.0) .803 scene implied by the label was not presented. In addition, Inconsistent 55.5 86.3 (13.7) .810 elimination of the location cue improves the object-
OBJECT PERCEPTION IN SCENES 407 Preview Postview Figure 3. Schematic illustration of a trial in Experiment 3.detection paradigm by undermining the potential strategy of location cue changed the participants task. In Experiments 1using cue position to assist post-presentation guessing. and 2, the task was to determine whether the target object Figure 3 illustrates the main aspects of the paradigm. In appeared at the cued location. In Experiment 3, the task wasthe target label preview condition, participants saw a target to determine whether the target object appeared anywhere inlabel for 1500 ms, followed by presentation of the scene the scene. As in Experiment 2, the semantic consistencyfor 200 ms, followed by a pattern mask containing a series manipulation in the catch trials was based on the relationshipof lowercase Xs, which remained on the screen until between the target label and the scene rather than theresponse. In the target label postview condition, participants relationship between the presented object and the scene.6saw a series of Xs in a blank field for 1500 ms, followed by Table 1 presents a summary of the target-present and catchpresentation of the scene for 200 ms, followed by a pattern trial design in Experiment 3.mask containing the target label, which remained on thescreen until response. The series of X& was employed toroughly equate stimulus presentation in the two conditions. 6 In Experiments 3 and 4, the object presented in the scene wasIn addition to the manipulation of target label (preview or not cued by a location dot. Therefore, we will refer to this object aspostview), we omitted the location cue. The absence of a the presented object rather than the cued object.
408 HOLLINGWORTH AND HENDERSONMethod obtained: Correct rejections for consistent target objects dropped significantly between preview and postview trials Participants. Twenty-four members of the Michigan State (11.0%), but correct rejections for inconsistent target objectsUniversity community were paid $5 each for their participation. All showed almost no decline (1.2%). As in Experiments 1 andparticipants had normal or corrected-to-normal vision. The partici- 2, both the hit rate and false-alarm rate were higher forpants were naive with respect to the hypotheses under investiga-tion. None had participated in previous experiments. consistent versus inconsistent target objects, suggesting that Stimuli. The stimuli were the same as in Experiment 2, except participants were biased to respond "yes" more often whenthat a blank space, subtending 6.3 X 1.5 degrees, was created in the attempting to detect a consistent versus an inconsistentcenter of the pattern mask to accommodate the object label or series object. Given this pattern, sensitivity measures again pro-ofATs. vide a better indication of detection accuracy than percent- Apparatus. The apparatus was the same as that used for age correct performance.Experiment 1, except that the stimuli were displayed on a A analysis. Mean A as a function of target labelflat-screen monitor with a 72-Hz refresh rate. presentation and target label semantic consistency is re- Procedure. The procedure was the same as that in Experiment ported in Table 4. There was a reliable main effect of target 1, except that the participant was informed that the target label label presentation, F(l, 23) = 23.52, MSE = .0067, p <could appear either before or after the scene. In either case, theparticipant was to press the button marked "yes" if the object .001, with better detection in the preview conditions (.836)named by the label had appeared anywhere in the scene or the than in the postview conditions (.755). In addition, there wasbutton marked "no" if the object had not appeared in the scene. a reliable main effect of semantic consistency, F(l, 23) = Each participant saw 160 experimental trials that were produced 5.87, MSE = .0034, p < .05, with better detection ofby a within-participant factorial combination of 2 target label semantically inconsistent target objects (.810) than consis-presentation conditions (preview, postview) X 2 target label tent target objects (.782). These effects were mediated by aconsistency conditions X 2 target object presence conditions X 20 marginally reliable interaction between label presentationscenes. Object position was manipulated between participants and and target object semantic consistency, F(l, 23) = 3.55,was tied to the preview-postview manipulation. In one group, if MSE = .0036, p = .07. Planned simple effects tests showedposition A was employed for the preview trials of a particular that there was no difference in performance for consistentscene, position B was used for all postview trials employing that versus inconsistent target objects in the preview condition,scene. These assignments were reversed in the second group.Because the object position factor was not of theoretical interest, F < 1, but there was a reliable advantage for inconsistentthe two levels of that factor were combined in the statistical target object detection in the postview condition, F(l, 23) =analyses. Each participant saw all 160 trials in a different random 5.35, MSE= .0060, p < . 05.order. The entire session lasted approximately 45 min. DiscussionResults The results from the preview condition of Experiment 3 replicated those of Experiment 2 and provided no support Percentage correct analysis. Mean percentage correct for contextual facilitation models of object perception inperformance as a function of target label presentation, target scenes. The pattern of performance as a function of targetlabel semantic consistency, and target object presence is label consistency was the same across the two experiments:shown in Table 4. First, there was a reliable main effect of There was no advantage for the detection of semanticallytarget label presentation (preview, postview), F(l, 23) = consistent versus inconsistent objects, and participants dem-36.00, MSB = .0082, p < .001. Participants responded onstrated a bias to respond that consistent target objects werecorrectly 73.8% of the time in the preview condition and present in the scene compared to inconsistent objects. In65.9% of the time in the postview condition. There was also addition, overall performance in the preview condition ofa main effect of target object presence, F(l, 23) = 10.89, Experiment 3 (A1 = .836) was higher than that in Experi-MSE = .0765, p < .005, with better performance in the ment 2 (A = .807). These experiments differed only in thecatch trials (76.4%) than in the target-present trials (63.2%). presence of the location cue, suggesting that the location cueThere was no main effect of target label semantic consis- provided little if any information to aid detection perfor-tency, F < 1, but there was a reliable interaction betweentarget label semantic consistency and target object presence,F(l, 23) = 107.67, MSE = .0313, p < .001. The hit rate for Table 4consistent target objects (76.4%) was higher than that for Mean Percentage Hits, Percentage Correct Rejectionsinconsistent target objects (50.1%), but the reverse pattern (Percentage False Alarms), and A for Experiment 3obtained in the catch trials, with a higher correct rejectionrate for inconsistent target objects (89.8%) than for consis- Target label % correct rejectionstent target objects (63.0%). Finally, there was a reliable consistency %hits (% false alarms) A3-way interaction between all factors considered, F(l, 23) = Preview10.63, MSE = .0080, p < .005. In the target-present trials, Consistent 79.4 68.5(31.5) .834performance for consistent target objects dropped slightly Inconsistent 56.7 90.4 (9.6) .839between preview and postview conditions (6.1%), but Postview Consistent 73.3 57.5 (42.5) .729performance for inconsistent target objects dropped more Inconsistent 43.5 89.2(10.8) .781significantly (13.2%). In the catch trials, the reverse pattern
OBJECT PERCEPTION IN SCENES 409mance. The results of the postview condition of Experiment consistency between the scene and the presented object. For3 also failed to support contextual facilitation models of this experiment we chose one additional consistent and oneobject perception in scenes. Contrary to the prediction of additional inconsistent object to be presented in each scene.those models, an advantage for the detection of inconsistent Thus, each scene could contain one of four presentedtarget objects was obtained. This result is in particular objects, two of which were consistent and two of which werecontrast to the prediction derived from the priming model inconsistent. In the forced-choice response screen, onethat consistent object facilitation may only be observed object label named the object presented in the scene, and thewhen the target label is presented after the scene. second label named the other object of the equivalent In addition, the results from Experiment 3 provide further semantic consistency. For example, when a consistent objectevidence that the semantic consistency between the cued (a chicken or a pig) was presented Ln the farmyard scene, theobject and the scene has little or no influence on catch trial forced choice screen presented the labels "chicken" andperformance. In Experiment 3, the location cue was elimi- "pig." As in previous experiments, contextual facilitationnated, and the task was to determine whether the target models predict that percentage correct discrimination perfor-object appeared anywhere in the scene. Participants were mance should be better when the presented object istherefore unaware, at least initially, that the identity of the consistent versus inconsistent with the scene in which itpresented object had any bearing on whether the target appears.object was present or absent. If the lower false-alarm rate forinconsistent versus consistent target objects in Experiment 2 Methodwas caused by the semantic consistency of the cued object,that difference should have disappeared, or at least have Participants. Twenty-four Michigan State University under-been attenuated, in Experiment 3. Contrary to this predic- graduate students participated in the experiment for course credit.tion, the false-alarm rates for inconsistent target label catch One of these original participants had to be replaced because he hadtrials in both the preview and postview conditions of difficulty understanding the instructions. All participants hadExperiment 3 were lower than that in Experiment 2. In normal or corrected-to-normal vision. The participants were naiveaddition, the difference in the false-alarm rate as a function with respect to the hypotheses under investigation. None had participated in previous experiments.of target label semantic consistency was actually larger in Stimuli. The stimuli were the same as in Experiments 1—3 withboth the preview and postview conditions of Experiment 3 the following modifications. For each scene, a second consistentthan in Experiment 2, suggesting that participants were more and a second inconsistent object were chosen. One scene from thebiased to respond that consistent objects were present in original set was replaced because of the difficulty of finding aExperiment 3 than in Experiment 2. Thus, the absence of a second object that would be consistent in the scene at the sameconsistent object advantage in Experiment 2 and the preview location as the original consistent object. Finally, only one objectcondition of Experiment 3, and the presence of the inconsis- location was used for each scene. The object labels appearing aftertent object advantage in the postview condition of Experi- scene presentation were centered vertically and positioned to thement 3, do not appear to have been caused by the semantic left and right of fixation. The labels were created using lower-case,consistency of the object presented in the scene during catch 24-point, anti-aliased Arial font. Apparatus and procedure. The apparatus was the same as intrials. Experiments 1 and 2. Participants were presented a scene for 250 ms, followed by a pattern mask for 30 ms, followed by a Experiment 4 forced-choice screen containing two object labels. There was no delay (i.e., the inter-stimulus interval was zero) between each The purpose of Experiment 4 was to provide converging display. The forced-choice screen remained in view until theevidence bearing on the general hypothesis that consistent participant pressed the left button to indicate that the object namedscene context facilitates object perception. In Experiments by the left-hand label had appeared or the right button to indicate1-3, A was employed to control for participant response that the object named by the right-hand label had appeared in the scene. Each participant saw 160 experimental trials that werebiases. In Experiment 4, we introduced a forced-choice produced by a within-participant factorial combination of 2 pre-procedure, similar to that developed by Reicher in the word sented object consistency conditions X 2 presented objects X 2recognition literature (Reicher, 1969), to eliminate response label positions in the forced-choice display X 20 scenes. Becausebias entirely. This paradigm eliminates response biases the presented object factor and the label position factor were not ofcaused by the semantic consistency of the object label theoretical interest, the two levels of each factor were combined inbecause participants must discriminate between two object the statistical analyses. Each participant saw all 160 trials in alabels, both of which are either semantically consistent or different random order. The entire session lasted approximately 45inconsistent with the scene. Figure 4 illustrates the mainaspects of the paradigm. A scene was presented for 250 ms, followed by a pattern Resultsmask for 30 ms, followed by a screen displaying two objectlabels. One label named an object presented in the scene, and The influence of object consistency on percentage correctthe other label named an object that had not appeared in the discrimination performance was analyzed via a simplescene. The participants task was to indicate which of the effects test. There was no effect of the consistency of thetwo labels named an object that had been presented in the presented object, F < 1. Participants responded correctlyscene. The main contextual manipulation was the semantic 70.7% of the time when the presented object was consistent
410 HOLLINGWORTH AND HENDERSON response Figure 4. Schematic illustration of a trial in Experiment 4.with the scene and 71.6% of the time when the presented In Experiment 4, only one object position was used forobject was inconsistent with the scene. The 95% confidence each scene. It is possible that the absence of a consistentinterval around these means was ± 1.69%. Thus, the experi- object advantage was due to participants learning thement had enough power to detect a 2.39% effect (see Loftus position at which the presented object appeared in each& Masson, 1994). scene. Such knowledge could allow participants to direct their attention quickly to the object position regardless ofDiscussion whether the presented object was semantically consistent or inconsistent with the scene. To investigate this possibility, In Experiment 4, we introduced a forced-choice proce-dure to investigate the influence of scene context on object we calculated percentage correct discrimination perfor-perception independently of response bias. Contrary to the mance as a function of object consistency and first halfprediction of contextual facilitation models, no advantage versus second half of the trials. If the learning of objectfor the detection of consistent objects was found. In fact, the positions masked a consistent object advantage, that advan-non-reliable trend was in the direction of better inconsistent tage would more likely be found in the first half of theobject detection. Thus, using the same general set of scene experiment than in the second. However, there was nostimuli, the original object-detection paradigm (Experiment evidence of a consistent object advantage in the first half of1) produced a consistent object advantage, but Experiment the experiment. Percentage correct discrimination was 67.3%4, in which response bias was eliminated, showed no such when the presented object was consistent and 68.8% whenadvantage. This suggests, again, that the consistent object the presented object was inconsistent. A second potentialadvantage in the original object-detection paradigm resulted, concern with Experiment 4 is that the scene was presentedat least in part, from the inadequate control of response bias for 250 ms, 100 ms longer than studies finding contextualrather than from the influence of scene context on object facilitation of object detection (Biederman et al., 1982;perception. Boyce et al., 1989). However, we have replicated Experi-
OBJECT PERCEPTION IN SCENES 411ment 4 with a scene presentation duration of 150 ms and versus inconsistent with the scene. Because of the design ofobtained no advantage for the discrimination of semantically the catch trials in this paradigm, sensitivity measures wereconsistent objects (Hollingworth & Henderson, in press-a). unable to adequately control for this response bias.In fact, discrimination performance was reliably higher In Experiment 2, we modified the original object-when the target object was inconsistent versus consistent detection paradigm so that participants attempted to detectwith the scene. the same object on trials in which it was and was not present in the scene. This modification assured that the detection performance measure (A1) was based on the correct detec- General Discussion tion of a particular signal when it was present in the scene The purpose of this study was to investigate whether and the false detection of the same signal when it was notobject identification is influenced by the semantic relation- present. In addition, response biases caused by the semanticship between an object and the scene in which it appears. consistency of the target label were controlled in A,One view of the influence of scene context on object providing a valid measure of object-detection performance.identification proposes that consistent scene context facili- Contrary to the prediction of contextual facilitation models,tates the perception of objects (Biederman, 1972; Biederman no advantage was found for the detection of consistentet al., 1973; Kosslyn, 1994; Ullman, 1996). This view has versus inconsistent objects.been instantiated in contextual facilitation models of object In Experiment 3, we manipulated whether the target labelperception in scenes, which propose that knowledge about appeared before or after the scene, and we eliminated thethe objects found in a given scene type facilitates the location cue. The purpose of the label manipulation was toperception of object stimuli consistent with that scene investigate whether presenting the label before the scene(Biederman, 1981; Biederman et al., 1982; Boyce & Pollat- may have interfered with scene processing when it wassek, 1992; Boyce et al., 1989; Friedman, 1979; Palmer, inconsistent with the subsequent scene, and may have1975a). The primary evidence supporting contextual facilita- allowed participants to constrain their subsequent search,tion models comes from experiments demonstrating that the biasing detection performance. The location cue was elimi-detection of an object in a scene is facilitated when the nated to see if it provided information useful to post-object is semantically consistent compared with when it is presentation guessing strategies. The results from the targetinconsistent with the scene (Biederman et al., 1982; Boyce label preview condition replicated those of Experiment 2et al., 1989). However, as discussed in the Introduction, and indicated that the location cue provided little or nothere are several potential problems with the original information to aid performance. The main result from theobject-detection paradigm that make interpretation of these target label postview condition was superior detection ofdata difficult (De Graef et al., 1990; De Graef & dYdewalle, inconsistent versus consistent objects.1995; Henderson, 1992). Specifically, the original paradigm In Experiment 4, we employed a forced-choice proceduremay not have adequately controlled for participant response to assess object-detection performance. This procedurebiases, and may have provided additional sources of informa- improved the object-detection paradigm by eliminating thetion that influenced detection performance. In this study, we possibility of response bias caused by the semantic consis-modified the original object-detection paradigm to address tency of the target label. Participants were asked to discrimi-these concerns. nate between two consistent object labels (when the pre- In Experiment 1, we replicated the original object- sented object was consistent with the scene) or between twodetection paradigm contextual facilitation effect (Biederman inconsistent object labels (when the presented object waset al., 1982; Boyce et al., 1989). Participants saw a target inconsistent with the scene). Contrary to the prediction oflabel naming an object, followed by a briefly presented contextual facilitation models, no advantage was found forscene, followed by a pattern mask with an embedded the detection of consistent objects.location cue. The object marked by the location cue could be In order to conclude from these data that consistent sceneeither semantically consistent or inconsistent with the scene context does not facilitate object identification, it is neces-in which it appeared. Detection performance was based on sary that these experiments meet two criteria. First, scenepercentage correct for trials in which the target label named meaning must have been available early enough to influencethe cued object and percentage of errors for trials in which object identification, if such influences exist. Second, thethe target label named a different object that did not appear contextual manipulation must have been strong enough toin the scene. In these latter trials, the semantic consistency of interact with object identification, if such interaction isthe target label with the subsequent scene was not related to possible. In this study, scene meaning was available earlythe consistency of the cued object with the scene. Using this enough and was strong enough to produce reliable responseparadigm, we replicated the basic consistent object contex- biases based on the consistency of the target label. Thesetual facilitation effect: Detection performance (A1) was response biases suggest that scene meaning and its relation-better when the cued object was semantically consistent ship to the identity of the target object was available fromversus inconsistent with the scene. However, a more detailed information obtained within the brief presentation durationanalysis of the catch trial data indicated that reliable of the scene stimulus. In addition, the contextual manipula-response biases were caused by the semantic consistency of tion was strong enough to replicate the Biederman et al.the target label: Participants responded "yes" more often (1982) results. Given that we found no evidence forwhen the catch trial target label was semantically consistent facilitated detection of semantically consistent objects in the