2. Overview
• Attention
• Visual saliency
• Bottom-up attention
• Koch-Ulman framework
• Visual Attention in brain
• Coarse to Fine theory
• Top-Down Facilitation
• Comparing Attentional Neural Network with human behavior
3. Attention
• Attention implements an information-processing bottleneck that allows only a small part of the incoming sensory information to reach short-term memory and visual awareness.
• A key challenge is to select which impressions are relevant and which inputs should be ignored.
• This process of selecting a subset of the input, and ignoring the rest, is referred to as attention.
• Two broad forms are distinguished: bottom-up (stimulus-driven) and top-down (goal-oriented) attention.
4. Visual saliency
• At a pre-attentive stage, some parts of the scene may pop out.
• Visual saliency refers to the idea that certain parts of a scene are pre-attentively distinctive and create some form of immediate, significant visual arousal.
• How can a machine vision system extract the salient regions from an unknown background?
5. Flow diagram of a typical model for the control of attention:
1. Low-level feature extraction
2. Saliency map creation
3. Winner-Take-All (WTA)
4. Inhibition of Return (IoR)
5. Top-down attentional bias
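The WTA/IoR loop in the diagram above can be sketched in a few lines. Everything concrete here (the array, the square suppression window, the fixed number of fixations) is illustrative, not taken from the slides:

```python
import numpy as np

def attend(saliency, n_fixations=3, ior_radius=1):
    """Winner-Take-All / Inhibition-of-Return loop: repeatedly pick the
    most salient location, then suppress a neighborhood around it so
    attention moves on to the next-most-salient region."""
    s = saliency.astype(float).copy()
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)  # WTA: global max
        fixations.append((int(y), int(x)))
        # IoR: zero out a square window around the winner
        s[max(0, y - ior_radius):y + ior_radius + 1,
          max(0, x - ior_radius):x + ior_radius + 1] = 0.0
        # a hypothetical top-down bias map could be added to `s` here
    return fixations

sal = np.array([[0.1, 0.9, 0.2],
                [0.3, 0.4, 0.8],
                [0.7, 0.2, 0.1]])
# First fixation is the global max; note the 0.8 near it is skipped
# because IoR has already suppressed that neighborhood.
print(attend(sal))
```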
10. Saliency map construction
1- Cross-scale sum on all created feature channels (conspicuity maps):
   \bar{I} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(I(c,s))
   \bar{C} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} \left[ N(RG(c,s)) + N(BY(c,s)) \right]
   \bar{O} = \sum_{\theta} N\left( \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(O(c,s,\theta)) \right)
2- Integrated saliency map (weighted sum of the normalized channel maps, with equal weights w_I = w_C = w_O = 1/3):
   S = \frac{1}{3} \left( N(\bar{I}) + N(\bar{C}) + N(\bar{O}) \right)
3- Saliency maps are then smoothed with a Gaussian filter.
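A minimal numpy sketch of the construction above, assuming a crude min-max rescaling as a stand-in for the normalization operator N(.) (the real operator also promotes maps with few strong peaks) and equal channel weights:

```python
import numpy as np

def normalize(m):
    """Crude stand-in for the N(.) operator: rescale to [0, 1]."""
    m = m - m.min()
    return m / m.max() if m.max() > 0 else m

def integrate(intensity_map, color_map, orientation_map):
    """Step 2: integrated saliency map as the equal-weight (1/3) sum
    of the normalized conspicuity maps."""
    return (normalize(intensity_map)
            + normalize(color_map)
            + normalize(orientation_map)) / 3.0

def gaussian_smooth(s, sigma=1.0):
    """Step 3: smooth the saliency map with a small separable Gaussian."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(s, r, mode='edge')
    # separable convolution: filter rows, then columns
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, 'same'), 1, pad)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, 'same'), 0, rows)
    return out[r:-r, r:-r]
```

Here the conspicuity maps would come from the cross-scale sums of step 1; this sketch only covers the integration and smoothing stages.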
11. Segmentation
• Threshold segmentation (the saliency map sa is converted into a binary image bm using a threshold):
   bm(x) = \begin{cases} 1 & \text{if } sa(x) \geq threshold \\ 0 & \text{if } sa(x) < threshold \end{cases}
• The binary map is then refined with the morphological operations dilation and erosion:
   A \oplus B = \{ z \mid (\hat{B})_z \cap A \neq \emptyset \} \quad \text{(dilation)}
   A \ominus B = \{ z \mid (B)_z \subseteq A \} \quad \text{(erosion)}
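The threshold rule and the morphological clean-up can be sketched directly in numpy. The square structuring element and its radius are illustrative choices, not from the slides:

```python
import numpy as np

def threshold_segment(sa, threshold):
    """bm(x) = 1 where sa(x) >= threshold, else 0."""
    return (sa >= threshold).astype(np.uint8)

def dilate(a, radius=1):
    """Binary dilation with a square structuring element: a pixel is
    on if ANY pixel within the window is on."""
    p = np.pad(a, radius, mode='constant')
    h, w = a.shape
    out = np.zeros_like(a)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def erode(a, radius=1):
    """Binary erosion: a pixel stays on only if ALL pixels within
    the window are on."""
    p = np.pad(a, radius, mode='constant')
    h, w = a.shape
    out = np.ones_like(a)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out &= p[dy:dy + h, dx:dx + w]
    return out
```

Applying erosion then dilation (opening) removes isolated salient pixels; dilation then erosion (closing) fills small holes in the segmented region.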
12. How does the brain process attention?
• The ventral ('what') stream processes visual shape appearance and is largely responsible for object recognition.
• The dorsal ('where') stream encodes spatial locations and processes motion information.
• Bottom-up information that can guide attention thus propagates from the visual cortex to the PFC.
• PFC areas can provide top-down signals to control attention to some degree.
13. Coarse-to-fine processing
• Coarse, low spatial frequency (LSF) information is processed first.
• It quickly projects from primary visual cortex to higher-level visual areas (PFC, OFC).
• Psychophysical and single-unit recordings in monkeys indicate that low spatial frequencies are extracted from scenes earlier than high spatial frequencies.
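The LSF/HSF distinction above can be illustrated with a simple low-pass decomposition. The box blur here is a cheap stand-in for the Gaussian filtering usually assumed, and the radius is an arbitrary choice:

```python
import numpy as np

def box_blur(img, radius=2):
    """Cheap low-pass filter (box blur): keeps coarse, low spatial
    frequency structure and discards fine detail."""
    p = np.pad(img, radius, mode='edge')
    h, w = img.shape
    acc = np.zeros((h, w), dtype=float)
    n = (2 * radius + 1) ** 2
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            acc += p[dy:dy + h, dx:dx + w]
    return acc / n

def split_frequencies(img, radius=2):
    """Coarse/fine split: LSF = blurred image (the component processed
    first on the fast pathway), HSF = the residual fine detail."""
    lsf = box_blur(img, radius)
    hsf = img - lsf
    return lsf, hsf
```

By construction the two components sum back to the original image, so nothing is lost, only reordered in time in the coarse-to-fine account.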
14. Developmental learning in DNNs: fine-to-coarse development
• We trained a 3-layer deep belief network and performed an unsupervised learning scheme on the obtained deep representations.
• There is a progression with depth across the hidden layers of the DBN: low-level layers represent finer distinctions and high-level layers represent coarser distinctions.
Sadeghi, Zahra. "Deep learning and developmental learning: emergence of fine-to-coarse conceptual categories at layers of deep belief network." Perception 45.9 (2016): 1036-1045.
15. Top-down processing contribution
• Input to the visual system is often noisy and ambiguous.
• A growing body of theoretical work and empirical evidence supports the idea that visual recognition is facilitated by top-down expectations.
• Context facilitates the recognition of related objects even if these objects are ambiguous when seen in isolation.
• An ambiguous object becomes recognizable if another object that shares the same context is placed in an appropriate spatial relation to it.
16. Effect of context in occluded object recognition
• Trial sequence (figure): fixation cross (+, 500 ms), stimulus (300 ms), then the prompt 'Type the name of the object and then press enter'.
• Two conditions: consistent vs. inconsistent context.
Sadeghi, Zahra. "The effect of top-down attention in occluded object recognition." arXiv preprint arXiv:2007.10232 (2020).
18. Consistent vs. inconsistent condition comparisons

Comparison                            p-value
Hit const vs hit inconst              0.0027
Miss const vs miss inconst            0.0027
Sup hit const vs sup hit inconst      0.0027
Sup miss const vs sup miss inconst    0.0027
Hypo_pos1 vs hypo_neg1                0.0027
Hypo_pos2 vs hypo_neg2                0.0027
Resp-time const vs inconst            4.6921e-11

Sadeghi, Zahra. "The effect of top-down attention in occluded object recognition." arXiv preprint arXiv:2007.10232 (2020).
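One self-contained way to obtain a p-value for a paired condition comparison like those in the table is a sign-flip permutation test (the study itself may have used a different test). The per-subject differences below are hypothetical, not the study's data:

```python
import numpy as np

def sign_flip_pvalue(diffs, n_perm=10000, seed=0):
    """Paired two-sided permutation (sign-flip) test: under H0 the
    per-subject consistent-minus-inconsistent differences are symmetric
    around zero, so randomly flipping their signs should often produce
    a mean difference at least as extreme as the observed one."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, dtype=float)
    observed = abs(diffs.mean())
    flips = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null_means = np.abs((flips * diffs).mean(axis=1))
    # add-one correction so the p-value is never exactly zero
    return (np.sum(null_means >= observed) + 1) / (n_perm + 1)

# hypothetical per-subject accuracy differences (consistent - inconsistent)
diffs = [0.12, 0.08, 0.15, 0.05, 0.11, 0.09, 0.14, 0.07]
print(sign_flip_pvalue(diffs))  # small: every difference shares a sign
```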
20. Global-And-Local-Attention (GALA)
• The Global-and-Local-Attention (GALA) network extends the squeeze-and-excitation (SE) network by adding a local saliency module.
• The attention mechanism is embedded in the cost function as a regularization term.
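A toy sketch of the global-plus-local gating idea on a (channels, height, width) feature map. The real GALA module learns its gates (an SE-style bottleneck MLP for the global path and convolutions for the local saliency path); this sketch derives both gates directly from pooled activations, so only the combination structure is faithful:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def global_attention(feat):
    """SE-style 'squeeze': global average pool per channel, producing
    one gate in [0, 1] per channel (shape (C, 1, 1))."""
    pooled = feat.mean(axis=(1, 2))
    return sigmoid(pooled)[:, None, None]

def local_attention(feat):
    """Local saliency branch: one gate per spatial position, shared
    across channels (shape (1, H, W))."""
    return sigmoid(feat.mean(axis=0))[None, :, :]

def gala_gate(feat):
    """Modulate the feature map by both gates: channels and locations
    are re-weighted jointly, GALA-style."""
    return feat * global_attention(feat) * local_attention(feat)
```

Broadcasting does the combination: the (C, 1, 1) global gate scales whole channels while the (1, H, W) local gate scales whole spatial positions.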
21. • Three cases are considered:
1- Networks trained on color images and tested on color images.
2- Networks trained on grayscale images and tested on grayscale images.
3- Networks trained on color images and tested on grayscale images.
• The best performance in both the color and grayscale cases is achieved by gala-click, while gala-no-click and no-gala-no-click obtained the second- and third-best results, respectively.
• The highest accuracy for all models is attained when the networks are trained on color images and tested on color images.
Sadeghi, Zahra. "An Investigation on Performance of Attention Deep Neural Networks in Rapid Object Recognition." Intelligent Computing Systems: Third International Symposium, ISICS 2020, Sharjah, United Arab Emirates, March 18–19, 2020, Proceedings 3. Springer International Publishing, 2020.
22. • To test the effect of the importance maps collected in the clickme.ai experiment, a rapid object recognition experiment was designed.
• The dataset contains 100 images from animal and non-animal categories.
• Phase-scrambled masks are applied to the images.
• Eleven versions of each image were created, ordered ascendingly by their level of revelation of important pixels.
23. Model and human performance
• The average accuracy of the two GALA models (gala-click and gala-no-click) and the ResNet-50 model (no-gala-no-click) is compared on the behavioral test images at different levels of pixel revelation.
• The gala-click and gala-no-click models achieved similar accuracy.
• The gala-click model produces superior results compared to all other models at full pixel revelation.
• The second-best performance at the full level is achieved by the gala-no-click model.
Sadeghi, Zahra. "An Investigation on Performance of Attention Deep Neural Networks in Rapid Object Recognition." Intelligent Computing Systems: Third International Symposium, ISICS 2020, Sharjah, United Arab Emirates, March 18–19, 2020, Proceedings 3. Springer International Publishing, 2020.
24. • Human visual attention is well studied.
• While different computational models exist, they lack the computational efficacy of our visual system.
• Attention mechanisms in neural networks are still only loosely based on the visual attention mechanisms found in humans.