Canosa Saliency Based Decision Support

A model of visual saliency is often used to highlight interesting or perceptually significant features in an image. If a specific task is imposed upon the viewer, then the image features that disambiguate task-related objects from non-task-related locations should be incorporated into the saliency determination as top-down information. For this study, viewers were given the task of locating potentially cancerous lesions in synthetically-generated medical images. An ensemble of saliency maps was created to model the target versus error features that attract attention. For MRI images, lesions are most reliably modeled by luminance features and errors are mostly modeled by color features, depending upon the type of error (search, recognition, or decision). Other imaging modalities showed similar differences between the target and error features that contribute to top-down saliency. This study provides evidence that image-derived saliency is task-dependent and may be used to predict target or error locations in complex images.


Transcript

  • 1. Saliency-Based Decision Support

    Roxanne L. Canosa∗
    Rochester Institute of Technology

    Figure 1: Examples of types of information collected from eye-tracking data. A large open circle indicates a lesion location, an ’X’ indicates a mouse click, and a small (open or filled) circle indicates a fixation. A large circle without an ’X’ indicates a false negative (search error); a small, unfilled circle indicates a fixation less than 350 msec (recognition error); a small filled circle indicates a fixation greater than 350 msec (decision error). Left, participant correctly located all three lesions. Right, four search errors and one decision error.

    Abstract

    A model of visual saliency is often used to highlight interesting or perceptually significant features in an image. If a specific task is imposed upon the viewer, then the image features that disambiguate task-related objects from non-task-related locations should be incorporated into the saliency determination as top-down information. For this study, viewers were given the task of locating potentially cancerous lesions in synthetically-generated medical images. An ensemble of saliency maps was created to model the target versus error features that attract attention. For MRI images, lesions are most reliably modeled by luminance features and errors are mostly modeled by color features, depending upon the type of error (search, recognition, or decision). Other imaging modalities showed similar differences between the target and error features that contribute to top-down saliency. This study provides evidence that image-derived saliency is task-dependent and may be used to predict target or error locations in complex images.

    1 Introduction

    A saliency map is a computational model of human visual perception that defines a relationship between the components of a scene and the relative importance of those components to the viewer [Koch and Ullman 1985]. A saliency map includes a priority rating of each of the components and a gating mechanism whereby selected regions are processed and non-selected regions are inhibited. According to the theory, the visual system performs an initial low-frequency parsing of the environment to identify potential regions of interest, and assigns to each region a weight according to the computed saliency. For example, bright colors, high luminance areas, edges, and corners may rate highly in visual saliency and thus would be assigned a higher probability of fixation. If image features of a particular target are known in advance, these can be used to modulate the relative saliency for enhanced target discernability.

    The saliency map used for this study is an adaptation of a well-known saliency model [Itti et al. 1998]. Map generation took approximately 90 seconds using MATLAB on a 1.8 GHz Intel dual core processor. The map consists of three essential features - color, luminance, and oriented edges. The color map is further separated into two color-opponent process features - the red/green component and the blue/yellow component. The final map is constructed as the summation, at each pixel, of the contribution from each feature at that pixel location. Equation 1 shows how the feature maps are combined to produce the saliency map used for target location. ’C’ indicates the color map, ’I’ indicates the luminance map, ’E’ indicates the oriented-edge map, and ’P’ is a high-level proto-object map that locates potential objects from highly textured regions in the image. Essentially, the only features that are used for the saliency map are color, luminance, and textured edges; it is the relative weight of each feature according to the target type that determines the final contribution of each individual feature to the final saliency map. The relative weights of the features are determined using a technique described below.

    saliency map = (C ∗ w1 + I ∗ w2 + E ∗ w3 + P ∗ w4) / 4    (1)

    Target saliency was derived from the image features at the known lesion locations. Error saliency was derived from mouse clicks and fixation locations recorded during a target search task. Errors were classified as false positives (mouse click on a non-target location) and three categories of false negatives: search error (target never fixated), recognition error (target fixated less than 350 milliseconds), and decision error (target fixated greater than 350 milliseconds) [Krupinski 2000].

    ∗ e-mail: rlc@cs.rit.edu

    Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org.
    ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00
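Two steps described on this slide lend themselves to a short illustration: the weighted combination of Equation 1 and the three-way classification of missed targets. The following is a minimal Python sketch, not the author's MATLAB implementation. The function names, the list-of-lists map representation, and the use of the longest single fixation on a target are assumptions made for illustration. The source also does not say how a fixation of exactly 350 ms is classified; the sketch treats it as a decision error.

```python
from typing import List, Optional, Sequence

Map = List[List[float]]  # a 2-D feature or saliency map, indexed [row][col]

FIXATION_THRESHOLD_MS = 350.0  # boundary between recognition and decision errors


def combine_feature_maps(maps: Sequence[Map], weights: Sequence[float]) -> Map:
    """Pixelwise weighted sum of the feature maps, divided by the number of maps.

    With maps = [C, I, E, P] and weights = [w1, w2, w3, w4] this is Equation 1.
    """
    if not maps or len(maps) != len(weights):
        raise ValueError("need one weight per feature map")
    rows, cols = len(maps[0]), len(maps[0][0])
    n = len(maps)
    return [
        [sum(w * m[r][c] for m, w in zip(maps, weights)) / n for c in range(cols)]
        for r in range(rows)
    ]


def classify_target(clicked: bool, longest_fixation_ms: Optional[float]) -> str:
    """Label the outcome for one lesion.

    clicked: whether the participant clicked on the lesion.
    longest_fixation_ms: longest fixation on the lesion, or None if it was
    never fixated (hypothetical input format, not from the source).
    """
    if clicked:
        return "hit"
    if longest_fixation_ms is None:
        return "search error"
    if longest_fixation_ms < FIXATION_THRESHOLD_MS:
        return "recognition error"
    # Assumption: exactly 350 ms is grouped with the longer fixations.
    return "decision error"
```

Setting all four weights to 1 reduces `combine_feature_maps` to a plain average of the feature maps, which corresponds to the unweighted map discussed later in the paper. False positives (clicks on non-target locations) are not handled by `classify_target`, since they are not tied to a lesion.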
  • 2. 2 Method

    An ASL Model 504 remote eye-tracker was used for this experiment, along with the ASL Eye-trac 6 User Interface Software and Control Unit. Nineteen participants from the campus community (nine males and ten females between the ages of 18 and 58) were recruited, all naïve with respect to the purpose of the experiment, and with no prior experience locating lesions in radiological images. All participants were screened for normal color vision and normal or corrected-to-normal vision and were allowed an unlimited amount of time to detect as many targets in each image as possible. The eye-tracking session lasted approximately 30 minutes per participant, including calibration time. Prior to the start of the experiment, each participant was given an instruction sheet with information about how the experiment would proceed. The instructions stated, in general, that a feature would appear as a circular spot in the image and could be located anywhere within the anatomical portion of the image (i.e., a feature would never be located on the image border). Since the participants used in this study were not radiologists, the results are not directly applicable to a clinical setting; however, untrained observers might still provide useful information about the target and error features that attract attention during search in complex imagery.

    The experiment consisted of monitoring and recording participants’ fixation locations, fixation durations, and mouse clicks as they viewed eleven sets of six simulated brain images (66 images total). Simulated lesions with known size, shape, contrast, and location were inserted into the images at random locations. Each image had between zero and five lesions. The images were generated from single-mode PET and MRI phantoms and multi-mode fused PET/MRI images. Three sets of fused images were used, each set using a different color look-up table for displaying the mixed modes. The fused images were sub-divided into three categories depending upon whether the lesions were embedded in the PET image, the MRI image, or both. Figure 1 shows examples of the fused PET/MRI images and the types of information that were collected during the experiment.

    Figure 2: Unweighted saliency map using only low-level features of color, luminance, and oriented edges (left) and after thresholding at 0.45 (right). Lesions are shown surrounded by a white square.

    3 Determining Feature Weights

    A naïve saliency map weights each of the low-level feature maps (color, luminance, and orientation) equally in the final summation step. An optimally weighted map would take into account the relative importance of any feature for the target type. To determine the optimal feature weights, a metric was developed to “score” a map, given a specific weight vector. A map score is defined simply as the ratio of the mean target saliency, St, at some pre-defined locations to the mean saliency of the entire map, Sm.

    Score = St / Sm

    Mean target saliency St is found by first generating a saliency map using a random set of weights for a particular input image. Next, the x,y-coordinates of a set of target locations are determined from the eye-tracking data, ground-truth data, or from a record of observer responses (mouse clicks). For each target location, the x,y-coordinate is used as an index into the saliency map, and the saliency value at that location is extracted. A 7x7 pixel window (corresponding to 1/4° visual angle at the viewing distance of 52 cm) is centered on the location, and all saliency values falling within the window are averaged together. This procedure is repeated for every target location in the map and the average of those values is used as the mean target saliency, St. The mean map saliency, Sm, is the average saliency over all locations in the map (target and non-target). The score of a map is then simply the ratio between the mean target saliency and the mean map saliency.

    The map score is used to determine how well a saliency map models attention. If the score is close to one, then the map is not a good model of attention - since St is nearly equal to Sm, any random location would do just as well at predicting the response. If, on the other hand, the score is greater than one, then the map is a good (better than random) model of attention because the target locations tend to be on regions of the image that the model has computed as being highly salient.

    The scoring procedure is repeated with a different set of weights to produce another candidate map, and stops when the highest possible score is produced. Since an exhaustive search across the entire weight space is computationally prohibitive, a genetic algorithm was developed to find approximately optimal weights, using the scoring metric described above as the fitness criterion. The genetic algorithm was initialized with random weights for each feature map, and then over each generation (300 total) the two highest scores were selected to randomly exchange their weights, with crossovers and mutations allowed according to established parameters. A total of 2,400 trials were run before a solution converged.

    Figure 2 shows an example saliency map generated using only low-level features (color, luminance, and oriented edges) without any task- or target-related information (as in the standard model [Itti et al. 1998]). Figure 3 shows the same image with the saliency map generated using weights learned from the genetic algorithm and applied to the low-level feature maps. For this example the targets are lesions, with locations indicated on the image by a white square. Figure 4 shows the weighted saliency map found for an MRI image with five lesions applied to an MRI image with three lesions. The weighted map is able to correctly predict lesion locations in this different test image.

    4 Results

    An ensemble of (approximately) optimally-weighted saliency maps was created, one for each of the different target types - lesion locations, false positives, search errors, recognition errors, and decision errors. The map feature weights for lesion locations are frequently different from the map feature weights for errors. For example, Figure 5 shows that the highest weighted feature for lesions in the MRI images is luminance; however, for all of the MRI errors, the highest weighted feature is the blue-yellow color-opponent feature. Other imaging modalities also showed significant differences between feature weights for target and error locations. This may be an indication that visual search, recognition, and decision errors arise from specific attentional characteristics that differ from those for correct detection in a search task.

  • 3. This information might be useful in a decision-support or computer-aided detection (CAD) system, to highlight or otherwise flag locations in the image that have a high probability of incorrect classification.

    Figure 3: Weighted saliency map with weights determined using a genetic algorithm optimized for target type (left) and after thresholding at 0.45 (right). Lesions are shown surrounded by a white square.

    Figure 4: Weighted saliency map on different image, thresholded at 0.45. Lesion locations are correctly predicted by the saliency map.

    Figure 5: Relative weights of the low-level feature maps that are combined (summed) together to create the saliency map for the MRI images. Note that low-level features of the search target (lesions) are dominated by luminance information, whereas the low-level features that attract attention for each of the four error types are dominated by the blue-yellow color feature.

    5 Conclusion

    Low-level features such as luminance, color, and edges can attract the attention of the human visual system during a search task, and those features are specific to certain types of targets. More research into the nature of decision-making at the level just below that of conscious awareness, such as is enabled by eye-tracking experiments, will help to uncover the pre-conscious biases and strategies that contribute to image interpretation, as well as image misinterpretation.

    Acknowledgements

    Thanks to Karl Baum for generation of the MRI images.

    References

    ITTI, L., KOCH, C., AND NIEBUR, E. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 11, 1254–1259.

    KOCH, C., AND ULLMAN, S. 1985. Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology 4, 219–227.

    KRUPINSKI, E. A. 2000. The importance of perception research in medical imaging. Radiation Medicine 18, 6, 329–334.
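The map score from Section 3 (Score = St / Sm, with saliency averaged over a 7x7 window at each target location) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the author's code. The function names are hypothetical, and the saliency map is assumed to be a list of rows indexed as `sal[y][x]`. Clipping the window at the image edge is also an assumption; the source notes only that lesions never lie on the image border.

```python
from typing import List, Sequence, Tuple

Map = List[List[float]]  # saliency map, indexed [y][x]


def window_mean(sal: Map, x: int, y: int, size: int = 7) -> float:
    """Mean saliency in a size-by-size window centered on (x, y).

    Assumption: the window is clipped where it would extend past the map.
    """
    half = size // 2
    rows, cols = len(sal), len(sal[0])
    values = [
        sal[r][c]
        for r in range(max(0, y - half), min(rows, y + half + 1))
        for c in range(max(0, x - half), min(cols, x + half + 1))
    ]
    return sum(values) / len(values)


def map_score(sal: Map, targets: Sequence[Tuple[int, int]], size: int = 7) -> float:
    """Score = St / Sm for a saliency map and a list of (x, y) target locations."""
    if not targets:
        raise ValueError("at least one target location is required")
    # St: average of the windowed means over all target locations.
    s_t = sum(window_mean(sal, x, y, size) for x, y in targets) / len(targets)
    # Sm: average saliency over every location in the map.
    all_values = [v for row in sal for v in row]
    s_m = sum(all_values) / len(all_values)
    if s_m == 0:
        raise ValueError("mean map saliency is zero; score is undefined")
    return s_t / s_m
```

A perfectly uniform map gives a score of exactly one for any target list, matching the paper's reading that a score near one is no better than a random location. A map whose salient regions coincide with the targets scores above one.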
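The weight search in Section 3 is described only in outline: random initial weights, 300 generations, the two highest-scoring candidates exchanging weights, and crossovers and mutations "according to established parameters". The sketch below is one possible reading of that outline in Python, not a reconstruction of the author's algorithm. The function name, the uniform crossover, the Gaussian mutation, the mutation rate and scale, and the clamping of weights to [0, 1] are all assumptions. The population size of 8 is also an assumption, chosen only because 300 generations of 8 candidates would equal the 2,400 trials mentioned; the source does not state it.

```python
import random
from typing import Callable, List, Tuple

Weights = List[float]


def evolve_weights(
    fitness: Callable[[Weights], float],
    n_weights: int = 4,            # one weight per feature map (C, I, E, P)
    generations: int = 300,        # from the source
    population_size: int = 8,      # assumption, see lead-in
    mutation_rate: float = 0.1,    # assumption
    mutation_scale: float = 0.1,   # assumption
    seed: int = 0,
) -> Tuple[Weights, float]:
    """Search for a high-fitness weight vector; returns (weights, fitness)."""
    if population_size < 2:
        raise ValueError("population_size must be at least 2")
    rng = random.Random(seed)
    # Initialize every candidate with random weights in [0, 1).
    population = [
        [rng.random() for _ in range(n_weights)] for _ in range(population_size)
    ]
    for _ in range(generations):
        # Select the two highest-scoring candidates as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parent_a, parent_b = ranked[0], ranked[1]
        children = []
        while len(children) < population_size - 2:
            # Uniform crossover: each weight is taken from one parent at random.
            child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
            # Mutation: occasionally nudge a weight, clamped to [0, 1].
            child = [
                min(1.0, max(0.0, w + rng.gauss(0.0, mutation_scale)))
                if rng.random() < mutation_rate
                else w
                for w in child
            ]
            children.append(child)
        # Keep both parents so the best score never decreases.
        population = [parent_a, parent_b] + children
    best = max(population, key=fitness)
    return best, fitness(best)
```

In the setting of the paper, `fitness` would build a saliency map from the candidate weights via Equation 1 and return its map score for one target type (lesions, false positives, or one of the three false-negative categories). Any function from a weight list to a number can be substituted when experimenting, for example `lambda w: w[1] - w[0]`.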