Md Mushfiqul Alam: Biological, NeuralNet Approaches to Recognition, Gain Control

Md Mushfiqul Alam
PhD, Electrical Engineering
Oklahoma State University

Diversity of Cognitive Computing
[Diagram: fields related to Cognitive Computing]
• Artificial Intelligence
• Machine Learning
• Natural Language Processing
• Reasoning/Question Analysis
• Neuromorphic Chip (TrueNorth)
• Robotics
• Neuroscience
• Feature Engineering


Focus: Perception
• Perception
  • Way to interpret sensory information
  • Understand environment
• Cognition
  • Mental ability
  • Judgement, evaluation
  • Reasoning, problem solving
  • Decision making
• Action
  • Leads to experience
  • Guided/unguided by perception and cognition
• Visual Perception
  • Low-level vision phenomenon: visual masking
  • Recognition effect in masking
  • Biological plausibility of models

Intra- and inter-cortical feedback is known to operate in the brain. How do we model such feedback efficiently?

Visual Masking
• Perceptual local phenomenon: distortion visibility
[Figure: image and distorted image]
Grass looks less distorted than child, sand, and water

Visual Masking
[Figure: masking map measured from human-subject data, with a scale from less masking to more masking]

Application: Compression
[Diagram: Encoder → small file size; Encoder with accurate prediction of masking map (fewer bits where distortion is less visible) → smaller file size, same quality]
• Other applications: watermarking, texture synthesis, image quality assessment

Flow of Talk
First: Database of Visual Masking
Second: Computational Models of Masking
Third: Application and Future

Traditional Stimuli
• Stimulus: a signal shown to a human subject
• Stimulus = Mask + Target
• Traditionally both masks and targets are unnatural
  • Pros: well-defined features
  • Cons:
    • Cannot capture natural-scene properties
    • Results not effective for natural scenes
    • Nonlinear response of visual system
[Figure: example stimuli: sine-wave grating [Legge '80][Foley '94], visual noise [Carter '71], Gabor pattern [Foley '94], checkerboard [Pashler '88]]

Our Stimuli
• Stimulus = Mask (patch) + Target (distortion)
• Subtended angle: 2 degrees
• Measure contrast detection threshold, $C_T$

Experiment Method
Stimulus = Mask + Target; measure contrast detection threshold, $C_T$
• Metric
  • RMS contrast [Kingdom '90]
  • $C_T = 20 \log_{10}\left(\sigma_T / \mu_M\right)$
  • $\sigma_T$: target standard deviation
  • $\mu_M$: average mask luminance (cd/m²)
• Subjects: very consistent
  • Pearson correlation coefficient: intra-subject 0.95, inter-subject 0.92
• Number of measures
  • 1080 patches (largest dataset)
  • Three subjects per patch
  • Two runs per subject
  • Six measures per patch
• Procedure
  • Psychophysical Quest TAFC [Peli '87]
  • 40 trials per run
  • Dark room, CRT monitor, 14-bit resolution
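As a rough illustration of the metric, the threshold formula $C_T = 20 \log_{10}(\sigma_T / \mu_M)$ can be evaluated for a target and mask patch. This is a minimal sketch; the function name and the toy luminance values are assumptions for illustration, not data from the study.

```python
import numpy as np

def contrast_threshold_db(target, mask):
    """Return C_T = 20*log10(sigma_T / mu_M) in dB."""
    sigma_t = float(np.std(target))  # target standard deviation
    mu_m = float(np.mean(mask))      # average mask luminance (cd/m^2)
    return 20.0 * np.log10(sigma_t / mu_m)

# Toy example with made-up values (assumption, not measured data).
mask = np.full((8, 8), 50.0)               # uniform 50 cd/m^2 mask
target = np.tile([-0.5, 0.5], (8, 4))      # zero-mean target with std 0.5
c_t = contrast_threshold_db(target, mask)  # 20*log10(0.5 / 50)
print(f"C_T = {c_t:.1f} dB")
```

A more negative $C_T$ means the target is detected at lower contrast, that is, the mask hides it less.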

Our Target
• Target: log-Gabor noise
  • Excites only one visual channel
  • Well accepted in vision psychology [Caelli '86] [Teo '94] [Watson '97] [Ringach '09] [Geisler '14]
[Figure: input noise In ~ U(0, 1) passed through a log-Gabor filter (2-D frequency response shown) gives log-Gabor noise (3.7 cycles/degree, vertical orientation)]
[Figure: human contrast sensitivity versus spatial frequency (cycles/degree), data from [deValois '74]; peak at approx. 4 cycles/degree]
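One way to produce a target of this kind is to filter uniform noise with a log-Gabor filter in the frequency domain. The sketch below assumes a standard log-Gabor form (log-normal radial profile, Gaussian angular profile); the function name, the center frequency in cycles/pixel, the bandwidth settings, and the orientation convention are all illustrative assumptions rather than the values used in the experiment.

```python
import numpy as np

def log_gabor_noise(size=85, f0=0.1, bw_ratio=0.55, theta0=0.0,
                    sigma_theta=np.pi / 12, seed=0):
    """Filter U(0, 1) noise with a log-Gabor filter (all parameters assumed)."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0.0, 1.0, (size, size))      # In ~ U(0, 1)

    fy = np.fft.fftfreq(size)[:, None]               # cycles/pixel, rows
    fx = np.fft.fftfreq(size)[None, :]               # cycles/pixel, columns
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                               # avoid log(0) at DC

    # Radial part: Gaussian on a log-frequency axis, centered on f0.
    radial = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(bw_ratio) ** 2))
    radial[0, 0] = 0.0                               # remove the DC component

    # Angular part: Gaussian around theta0, with wrapped angular distance.
    angle = np.arctan2(fy, fx)
    d_theta = np.arctan2(np.sin(angle - theta0), np.cos(angle - theta0))
    angular = np.exp(-(d_theta ** 2) / (2 * sigma_theta ** 2))

    spectrum = np.fft.fft2(noise) * radial * angular
    return np.fft.ifft2(spectrum).real               # keep the real part

target = log_gabor_noise()
print(target.shape, float(target.mean()))
```

Because the DC term is zeroed, the filtered noise has zero mean, so it can be scaled to a chosen RMS contrast before being added to a mask patch.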

Our Masks
Image source: CSIQ database [Chandler '09]
• Masks
  • Six categories: Animal, Plant, Landscape, Urban, Structure, People
  • 30 natural scenes
  • 1080 patches

Masking Maps
[Figures: masking maps with a color bar in dB, from less masking to more masking]
• Category: Landscape (image: sunsetcolor)
• Category: Animal (image: geckos); structural masking [Chandler '09], entropy masking [Watson '97]
• Category: Urban (image: trolley)


Flow of Talk
• Model 1: Feature Regression
• Model 2: Gain Control
• Model 3: Convolutional Neural Net

Performance of Individual Features
Feature                            Pearson correlation coefficient
Kurtosis                           0.24
Slope of magnitude spectrum        0.28
Average luminance                  0.30
Orientation energy                 0.31
Standard deviation                 0.40
Entropy                            0.41
Local entropy                      0.41
Skewness                           0.42
Edge density                       0.47
Band energy                        0.48
Michelson contrast                 0.50
Intercept of magnitude spectrum    0.50
RMS contrast                       0.52
Sharpness                          0.70
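To illustrate how a single feature can be scored against thresholds, the sketch below computes two of the listed features with their common textbook definitions and correlates one of them with a threshold vector. The patches and the "thresholds" are synthetic and linear in the feature by construction, so this is only a demonstration of the computation, not a reproduction of the reported values.

```python
import numpy as np

def rms_contrast(patch):
    """Standard deviation divided by mean luminance (common definition)."""
    return float(patch.std() / patch.mean())

def michelson_contrast(patch):
    """(L_max - L_min) / (L_max + L_min)."""
    lo, hi = float(patch.min()), float(patch.max())
    return (hi - lo) / (hi + lo)

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic luminance patches (assumption: values in cd/m^2, all positive).
rng = np.random.default_rng(0)
patches = rng.uniform(10.0, 100.0, size=(20, 8, 8))

feature = np.array([rms_contrast(p) for p in patches])
thresholds = 30.0 * feature - 40.0   # made-up thresholds, linear in the feature
r = pearson(feature, thresholds)
print(f"Pearson correlation: {r:.2f}")
```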

Non-linear Feature Regression
[Figure: scatter plots of model predictions (dB) versus experiment thresholds (dB), both axes from -80 to 20 dB]
• All data: Pearson correlation 0.79, RMSE 5.38 dB
• 15% test data: RMSE 5.35 dB

Gain-Control Model
[Diagram: inputs (Mask + Target, and Mask) → gain-control model → output (contrast detection threshold, $C_T$)]

Gain-Control Model
Watson and Solomon ['97] model
[Diagram: two parallel paths, one for Mask + Target and one for Mask. Each path: CSF → log-Gabor → excitatory nonlinearity and inhibitory nonlinearity → pooling → divide, giving the responses $r_{M+T}$ and $r_M$. The two responses are subtracted and Minkowski-pooled to give $d'$. If $d' \approx d$: calculate final threshold. If not: change target contrast and repeat.]
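The divide and Minkowski-pool stages of this kind of model can be sketched as follows. This is a simplified illustration under stated assumptions, not the talk's implementation: the array layout (orientation, frequency, rows, columns), the inhibitory pool (all orientations at the same location and frequency), and the values of `p`, `b`, and the Minkowski exponents are hypothetical; only `q = 2.35` and `g = 0.1` are taken from the reported optima.

```python
import numpy as np

def divisive_response(z, p=2.4, q=2.35, b=0.035, g=0.1):
    """r = g * z^p / (b^q + sum over the inhibitory pool of z^q).

    z has shape (n_theta, n_f, rows, cols). The pool here is every
    orientation at the same location and frequency (an assumption).
    """
    excitatory = np.abs(z) ** p
    inhibitory = (np.abs(z) ** q).sum(axis=0, keepdims=True)
    return g * excitatory / (b ** q + inhibitory)

def minkowski_pool(diff, beta_r=2.0, beta_f=1.5, beta_theta=1.5):
    """Nested Minkowski sum over space, then frequency, then orientation."""
    a = np.abs(diff)
    space = (a ** beta_r).sum(axis=(2, 3))                 # over x, y
    freq = (space ** (beta_f / beta_r)).sum(axis=1)        # over f
    orient = (freq ** (beta_theta / beta_f)).sum(axis=0)   # over theta
    return float(orient ** (1.0 / beta_theta))

# Made-up channel responses: 4 orientations, 3 frequencies, 16x16 patch.
rng = np.random.default_rng(0)
z_mask = rng.uniform(0.0, 1.0, (4, 3, 16, 16))
z_both = z_mask + 0.05 * rng.uniform(0.0, 1.0, z_mask.shape)

d_prime = minkowski_pool(divisive_response(z_both) - divisive_response(z_mask))
print(f"d' = {d_prime:.4f}")
```

In a threshold search, the target contrast would be adjusted and this computation repeated until $d'$ reaches the criterion value.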

Gain-Control Model
• Single neuron response
  • At location $(x_0, y_0)$, frequency $f_0$, and orientation $\theta_0$:

$$r(x_0, y_0, f_0, \theta_0) = g \cdot \frac{\left(z(x_0, y_0, f_0, \theta_0)\right)^{p}}{b^{q} + \sum_{(x, y, f, \theta) \in IN} \left(z(x, y, f, \theta)\right)^{q}}$$

  • $r$: divisive response
  • $z$: response before division
  • $p$: excitatory exponent
  • $q$: inhibitory exponent
  • $b$: semi-saturation constant
  • $g$: output gain
  • $IN$: neighboring inhibitory neurons
  • $\beta_r$: Minkowski space exponent
  • $\beta_f$: Minkowski frequency exponent
  • $\beta_\theta$: Minkowski orientation exponent
• Summing all neuron responses:

$$d' = \left( \sum_{\theta} \left( \sum_{f} \left( \sum_{x, y} \left| r_{M+T} - r_{M} \right|^{\beta_r} \right)^{\beta_f / \beta_r} \right)^{\beta_\theta / \beta_f} \right)^{1 / \beta_\theta}$$

Gain-Control Model
• We varied three parameters
  • Biologically plausible ranges [Valois '90][Watson '97][Chandler '09]
• Brute-force search
  • Computationally tractable
  • Changing $BW_f$ increases data dimensionality

Symbol    Name                                    Range            Optimum
$q$       Inhibitory exponent                     1.05 to 4        2.35
$g$       Output gain                             0.01 to 0.205    0.1
$BW_f$    Frequency channel bandwidth (octave)    1.5 to 3.5       2.75
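A brute-force search over the three parameters can be sketched as an exhaustive loop over a grid. The grid step sizes below are assumptions chosen so that the listed ranges and optima fall on grid points, and `toy_rmse` is a hypothetical stand-in for running the gain-control model on the database; its minimum is placed at the reported optimum purely by construction.

```python
import itertools
import numpy as np

# Parameter grids spanning the listed ranges (step sizes are assumptions).
q_grid = np.linspace(1.05, 4.0, 60)      # inhibitory exponent, step 0.05
g_grid = np.linspace(0.01, 0.205, 14)    # output gain, step 0.015
bw_grid = np.linspace(1.5, 3.5, 9)       # channel bandwidth (octave), step 0.25

def toy_rmse(q, g, bw):
    """Hypothetical stand-in for 'run the model, compare with thresholds'."""
    return 5.2 + (q - 2.35) ** 2 + 100.0 * (g - 0.1) ** 2 + (bw - 2.75) ** 2

# Exhaustive search: evaluate every combination and keep the best one.
best_q, best_g, best_bw = min(
    itertools.product(q_grid, g_grid, bw_grid),
    key=lambda params: toy_rmse(*params),
)
print(f"q = {best_q:.2f}, g = {best_g:.3f}, BW_f = {best_bw:.2f}")
```

With these grids the search evaluates 60 × 14 × 9 = 7560 combinations, which is small for a toy objective but costly when each evaluation runs the full model on every patch.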

Gain-Control Model
[Figure: scatter plots, both axes from -80 to 20 dB]
• All data: RMSE 5.2 dB
• 15% test data: RMSE 5.2 dB

CNN Model
• Three-layer network
• 4320 data, increased by patch flipping
• 654 trainable parameters
• 70% training, 15% validation, and 15% test
• Committee of 50 nets
• Training algorithm: resilient backpropagation [Riedmiller '93]
• FIRST toolbox [Pranita '13]
[Diagram: input patch → first-layer kernel → feature map ($d_r^1 \times d_c^1$) → feature maps ($d_r^2 \times d_c^2$) → output ($1 \times 1$)]
[Figure: RMSE (dB) versus first-layer convolution kernel size (0 to 60) for Train+Valid and Test; 19 × 19 marked]
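The data preparation can be illustrated with a short sketch. It assumes that the 4320 samples come from the 1080 patches plus three flipped copies of each (left-right, up-down, and both), and that a flipped mask keeps the threshold of the original; both are assumptions made for illustration, as are the function names and the tiny synthetic patches.

```python
import numpy as np

def augment_with_flips(patches, thresholds):
    """Return the patches plus left-right, up-down, and double flips."""
    flipped = [
        patches,
        patches[:, :, ::-1],      # left-right flip
        patches[:, ::-1, :],      # up-down flip
        patches[:, ::-1, ::-1],   # both flips
    ]
    return np.concatenate(flipped), np.tile(thresholds, 4)

def split_70_15_15(n, seed=0):
    """Random index split into 70% training, 15% validation, 15% test."""
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.70 * n))
    n_val = int(round(0.15 * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# Tiny synthetic stand-ins for the 1080 mask patches and their thresholds.
rng = np.random.default_rng(0)
patches = rng.uniform(0.0, 1.0, (1080, 4, 4))
thresholds = rng.uniform(-60.0, 0.0, 1080)

aug_patches, aug_thresholds = augment_with_flips(patches, thresholds)
train_idx, val_idx, test_idx = split_70_15_15(len(aug_patches))
print(len(aug_patches), len(train_idx), len(val_idx), len(test_idx))
```

A random split like this can place flipped copies of one patch in different subsets; splitting by original patch before flipping is a stricter alternative.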

Trained Kernels
• First-layer convolution kernels
[Figure: trained kernels, arranged along axes labeled "Better performance"]

Convolution Outputs
• First-layer convolution outputs
[Figure: mask patch and its first-layer convolution outputs, arranged along axes labeled "Better performance"]

CNN Model
[Figure: scatter plots, both axes from -80 to 20 dB]
• All data: RMSE 5.5 dB
• 15% test data: RMSE 5.5 dB

Model Comparison
Model            RMSE (dB) (accuracy)    Execution time per image (sec.) (time complexity)
Feature based    5.4                     33
Gain control     5.2                     66
CNN              5.5                     5

Flow of Talk
• Model 1: Feature Regression
• Model 2: Gain Control
• Model 3: Convolutional Neural Net
• Effects of Recognition

Recognition Effects
[Figure: child_swimming: experiment masking map compared with predictions of the feature-based, contrast gain control, and CNN models; over-predictions marked; color bar from -45 to -20 dB]
[Figure: geckos: experiment masking map compared with predictions of the feature-based, contrast gain control, and CNN models; over-predictions marked; color bar from -45 to -25 dB]

Recognition Effects
• Gain-control shortcoming
  • Only V1 simple cells are modeled
• Cognitive studies have shown
  • Active feedback to V1 from higher-level cortices for conscious perception [Bullier '01][Juan '03]
• My hypothesis: recognition effects in masking can be modeled via intra/inter-cortical feedback

Recognition via Facilitation and Inhibition (Structural Facilitation)
• Two steps:
  • First: determine whether structure recognition is actually affecting masking
  • Second: determine how much facilitation to incorporate
• Facilitation through neuron inhibition
[Diagram: an excitatory (+) neuron surrounded by inhibitory (-) neurons, shown for two cases]
• Weak structure: higher neuron inhibition
• Strong structure: lower neuron inhibition

Structure Detection
[Figure: child_swimming and its feature maps: local luminance, local sharpness, local entropy, average local texture, standard deviation of local texture; color scale from 0 to 1]

Structure Detection
• Structure map created via:

$$S = L_n \times Sh_n \times E_n \times \left(1 - D_{\mu_n}\right)^2 \times \left(1 - D_{\sigma_n}\right)^2$$

  • $L_n$: local luminance
  • $Sh_n$: local sharpness
  • $E_n$: local entropy
  • $D_{\mu_n}$: average local texture
  • $D_{\sigma_n}$: standard deviation of local texture
[Figure: structure maps for fisher, geckos, foxy, and child_swimming; color scale from 0 to 0.09]
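A structure map of this kind can be sketched as a pixel-wise product of normalized feature maps. The sketch assumes the product form $S = L_n \times Sh_n \times E_n \times (1 - D_{\mu_n})^2 \times (1 - D_{\sigma_n})^2$ and assumes that the subscript $n$ means min-max normalization to [0, 1]; the feature maps here are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def normalize(feature_map):
    """Min-max normalize a map to [0, 1] (assumed meaning of subscript n)."""
    span = feature_map.max() - feature_map.min()
    if span == 0:
        return np.zeros_like(feature_map, dtype=float)
    return (feature_map - feature_map.min()) / span

def structure_map(luminance, sharpness, entropy, texture_mean, texture_std):
    """S = L_n * Sh_n * E_n * (1 - D_mu_n)^2 * (1 - D_sigma_n)^2 (assumed form)."""
    l_n, sh_n, e_n, d_mu_n, d_sigma_n = (
        normalize(m) for m in
        (luminance, sharpness, entropy, texture_mean, texture_std)
    )
    return l_n * sh_n * e_n * (1.0 - d_mu_n) ** 2 * (1.0 - d_sigma_n) ** 2

# Random placeholder feature maps (not real image features).
rng = np.random.default_rng(0)
maps = [rng.uniform(0.0, 5.0, (32, 32)) for _ in range(5)]
S = structure_map(*maps)
print(S.shape, float(S.min()), float(S.max()))
```

Under this form, a region scores high only when it is bright, sharp, and high in entropy while being low in texture, which is why textured regions such as grass would receive small values.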

Integrating Structural Facilitation in Gain Control
Facilitation through neuron inhibition
• Inhibition multiplier $\lambda_s$: 0.2~1.0, depends on structure map

$$r(x_0, y_0, f_0, \theta_0) = g \cdot \frac{\left(z(x_0, y_0, f_0, \theta_0)\right)^{p}}{b^{q} + \lambda_s \sum_{(x, y, f, \theta) \in IN} \left(z(x, y, f, \theta)\right)^{q}}$$

• $\lambda_s$ calculated from structure map
  • For block $i$, $\lambda_{s_i}$ is computed from $S_i(x, y)$ over the $85 \times 85$ block and $p(S, 80)$ through an exponential term with scale constant 0.005, when $\max(S) > 0.04$ and $\mathrm{Kurt}(S) > 3.5$; otherwise $\lambda_{s_i} = 1$
  • where $S_i$: $i$th block of $S$; $p(S, 80)$: 80th percentile of $S$
  • $\max(S)$: strong structure
  • $\mathrm{Kurt}(S)$: localized structure

[Figure: inhibition multiplier $\lambda_s$ (0 to 1) versus structure $S$ (0 to 0.08): strong structure gives lower neuron inhibition, weak structure gives higher neuron inhibition]
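One possible way to turn a structure map into per-block inhibition multipliers is sketched below. The logistic form is an assumption chosen to reproduce the stated behavior (values between 0.2 and 1.0, lower for strong structure, a transition around the 80th percentile with scale 0.005, applied only when $\max(S) > 0.04$ and $\mathrm{Kurt}(S) > 3.5$); the use of the block mean and of non-excess kurtosis are further assumptions, and the function names are hypothetical.

```python
import numpy as np

def pearson_kurtosis(values):
    """Fourth standardized moment (a normal distribution gives 3)."""
    v = np.asarray(values, dtype=float).ravel()
    m2 = np.mean((v - v.mean()) ** 2)
    if m2 == 0:
        return 0.0
    return float(np.mean((v - v.mean()) ** 4) / m2 ** 2)

def inhibition_multiplier(S, block=85):
    """Per-block lambda_s in [0.2, 1.0]; logistic form is an assumption."""
    rows, cols = S.shape[0] // block, S.shape[1] // block
    lam = np.ones((rows, cols))
    # Facilitation only when structure is strong and localized.
    if S.max() > 0.04 and pearson_kurtosis(S) > 3.5:
        p80 = np.percentile(S, 80)
        for i in range(rows):
            for j in range(cols):
                s_i = S[i * block:(i + 1) * block,
                        j * block:(j + 1) * block].mean()
                lam[i, j] = 1.0 - 0.8 / (1.0 + np.exp(-(s_i - p80) / 0.005))
    return lam

# Toy structure map: one strongly structured block out of 16.
S = np.zeros((32, 32))
S[:8, :8] = 0.08
lam = inhibition_multiplier(S, block=8)
print(lam.round(2))
```

In the toy map, the structured block receives a multiplier near 0.2 (least inhibition), while a map without strong, localized structure leaves every multiplier at 1.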

Results with Structural Facilitation
Pearson correlation between each predicted map and the experiment map:

Image             Only gain-control    Gain-control with structure facilitation
geckos            0.68                 0.77
foxy              0.58                 0.63
child_swimming    0.85                 0.87
couple            0.55                 0.60


HEVC Compression
• Better quantization scheme for HEVC video compression
  • Block-based QP prediction
  • Based on detection threshold $C_T$ from the CNN model
  • Fast prediction (2.2 sec/image, 106x faster than gain-control)
[Diagram: a patch → CNN model (feed-forward network) → $C_T$ → $QP_T$, with parameters $\alpha, \beta, \gamma$ → HEVC]

$$\log(Q_{step}) = \alpha C_T^2 + \beta C_T + \gamma$$

$$QP_T = \max\left(\min\left(\mathrm{round}\left(\frac{\log(Q_{step})}{\log\left(2^{1/6}\right)}\right) + 4,\ 51\right),\ 0\right)$$
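The mapping from a predicted threshold to a quantization parameter can be sketched directly from the two formulas. The coefficient values passed below are hypothetical placeholders (the fitted $\alpha, \beta, \gamma$ are not given here), the natural logarithm is assumed, and the function name is illustrative.

```python
import math

def qp_from_threshold(c_t, alpha, beta, gamma):
    """Map a detection threshold C_T (dB) to an HEVC QP clipped to [0, 51]."""
    log_qstep = alpha * c_t ** 2 + beta * c_t + gamma   # quadratic fit
    qp = round(log_qstep / math.log(2 ** (1 / 6))) + 4  # QP from step size
    return max(min(qp, 51), 0)

# Hypothetical coefficients chosen only to exercise the function.
qp_unit_step = qp_from_threshold(-30.0, 0.0, 0.0, 0.0)           # Qstep = 1
qp_step_eight = qp_from_threshold(-30.0, 0.0, 0.0, math.log(8))  # Qstep = 8
print(qp_unit_step, qp_step_eight)
```

The conversion follows the HEVC relation in which the step size doubles every 6 QP units, so a step size of 1 corresponds to QP 4 and a step size of 8 to QP 22.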

HEVC Compression
[Figure: scatter plots of CNN predictions (dB) versus ground-truth distortion visibilities, taken as the outputs of the gain-control + structural facilitation (CGC+SF) model (dB); axes from -60 to 20 dB]
• (a) Training + Validation, PCC: 0.95
• (b) Testing, PCC: 0.93

Quantization Threshold Map
[Figure: $QP$ map from CGC+SF and $QP$ map from CNN for the images redwood and log_seaside; color scale from 0 to 50]

Reference image versus visually equivalent image coded using $QP$ map

Image      Reference image        Coded with QP map from CGC+SF model    Coded with QP map from CNN model
lake       SSIM 0.94, bpp 3.01    SSIM 0.94, bpp 2.60, gain 13.7%        SSIM 0.94, bpp 2.58, gain 14.3%
redwood    SSIM 0.89, bpp 2.13    SSIM 0.88, bpp 1.82, gain 14.9%        SSIM 0.88, bpp 2.12, gain 0.5%
shroom     SSIM 0.96, bpp 2.43    SSIM 0.94, bpp 1.69, gain 18.3%        SSIM 0.92, bpp 1.35, gain 35.2%
foxy       SSIM 0.96, bpp 2.49    SSIM 0.94, bpp 2.25, gain 9.6%         SSIM 0.94, bpp 2.39, gain 3.9%

Conclusions and Future Challenges
Conclusions
• Largest dataset of masking presented: usable for model benchmarking
• Accuracy of gain-control improved via structural facilitation and feedback
• Fast CNN model of masking developed
• HEVC compression efficiency improved
Future challenges
• First: discovering the actual route of feedback in the visual pathway
• Second: developing a CNN version of the gain-control mechanism with feedback
• Third: what about temporal masking?
Thank you

References, Contacts, Downloads
These works are published in:
• M. M. Alam, P. Patil, M. T. Hagan, and D. M. Chandler, "A computational model for predicting local distortion visibility via convolutional neural network trained on natural scenes," (accepted) IEEE ICIP 2015.
• M. M. Alam, T. Nguyen, and D. M. Chandler, "A perceptual strategy for HEVC based on a convolutional neural network trained on natural videos," SPIE Applications of Digital Image Processing XXXVIII, 2015 (doi: 10.1117/12.2188913).
• J. P. Evert, M. M. Alam, and D. M. Chandler, "Predicting the visibility of dynamic DCT distortion in natural videos," SPIE Applications of Digital Image Processing XXXVIII, 2015 (doi: 10.1117/12.2188460).
• M. M. Alam, P. Patil, M. Hagan, and D. M. Chandler, "Relations between local and global perceptual image quality and visual masking," SPIE Human Vision and Electronic Imaging XX, pp. 93940M, February 08, 2015 (doi: 10.1117/12.2084935).
• M. M. Alam, K. P. Vilankar, D. J. Field, and D. M. Chandler, "Local masking in natural images: A database and analysis," Journal of Vision, July 2014, vol. 14, no. 8 (doi: 10.1167/14.8.22).
• D. M. Chandler, M. M. Alam, and T. D. Phan, "Seven challenges for image quality research," Proc. SPIE Human Vision & Electronic Imaging XX, Feb. 9, 2014 (doi: 10.1117/2.1201401.005276).
• M. M. Alam, K. P. Vilankar, and D. M. Chandler, "A database of local masking in natural images," Proc. SPIE Human Vision and Electronic Imaging XVIII, pp. 86510G, Feb. 03, 2013 (doi: 10.1117/12.2008581).
External figure sources:
• http://darkmatternews.com/single-molecular-event-linked-mammalian-brain-development/
• http://www.techcyn.com/upload/figure7-12.jpg
Contact
Md Mushfiqul Alam (Mushfiq)
mushfiqulalam@gmail.com
http://www.mushfiqulalam.com/
Downloads
The dataset: http://vision.okstate.edu/masking/
Poster: http://www.mushfiqulalam.com/downloads/icip-2015-poster
Codes and thesis on request
