Biological, NeuralNet
Approaches to
Recognition, Gain Control
Md Mushfiqul Alam
PhD, Electrical Engineering
Oklahoma State University
Diversity of Cognitive Computing
- (Diagram: Cognitive Computing at the center, surrounded by Artificial Intelligence, Machine Learning, Natural Language Processing, Reasoning/Question Analysis, Neuromorphic Chips (TrueNorth), Robotics, Neuroscience, and Feature Engineering.)
Focus: Perception
- Perception: a way to interpret sensory information and understand the environment
- Cognition: mental ability; judgement and evaluation; reasoning and problem solving; decision making
- Action: leads to experience; guided or unguided by perception and cognition
- Focus here is visual perception: a low-level vision phenomenon (visual masking), the effect of recognition in masking, and the biological plausibility of models
Intra- and inter-cortical feedback is certainly at work in the brain. How do we model such feedback efficiently?
Visual Masking
- A local perceptual phenomenon: distortion visibility
- (Figure: an image and its distorted version; the grass looks less distorted than the child, sand, and water.)
Visual Masking
- (Figure: masking map measured from human subjects, with a dB colorbar running from less masking to more masking.)
Application: Compression
- An encoder with an accurate prediction of the masking map can spend fewer bits where distortion is less visible, giving a smaller file size at the same quality
- Other applications: watermarking, texture synthesis, image quality assessment
Flow of Talk
First: Database of Visual Masking
Second: Computational Models of Masking
Third: Application and Future
9
Traditional Stimuli
- Stimulus: a signal shown to a human subject; Stimulus = Mask + Target
- Traditionally both masks and targets are unnatural: sine-wave gratings [Legge '80][Foley '94], visual noise [Carter '71], Gabor patterns [Foley '94], checkerboards [Pashler '88]
- Pros: well-defined features
- Cons: they cannot capture the properties of natural scenes, so the results do not carry over to natural scenes because of the nonlinear response of the visual system
Our Stimuli
- Stimulus = Mask (natural-image patch) + Target (distortion)
- Subtended angle: 2 degrees
- We measure the contrast detection threshold, C_T
Experiment Method
- Stimulus = Mask (patch) + Target (distortion)
- Metric: RMS contrast [Kingdom '90], C_T = 20 log10(σ_T / μ_M), where σ_T is the target standard deviation and μ_M is the average mask luminance (cd/m²); a small computational sketch follows below
- Subjects were very consistent: Pearson correlation coefficient 0.95 intra-subject, 0.92 inter-subject
- Number of measures: 1080 patches (largest dataset), three subjects per patch, two runs per subject, six measures per patch
- Procedure: psychophysical QUEST TAFC [Peli '87], 40 trials per run, dark room, CRT monitor, 14-bit resolution
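As a minimal sketch of the metric above (assuming the mask and target patches are available as 2-D luminance arrays), C_T can be computed as follows; the function name and array conventions are illustrative rather than taken from the study's code.

```python
import numpy as np

def contrast_detection_threshold(target, mask):
    """RMS-contrast threshold in dB: C_T = 20*log10(sigma_T / mu_M).

    target, mask: 2-D arrays of luminance values (cd/m^2).
    """
    sigma_t = np.std(target)   # target standard deviation
    mu_m = np.mean(mask)       # average mask luminance
    return 20.0 * np.log10(sigma_t / mu_m)
```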
Our Target
- Target: log-Gabor noise (3.7 cycles/degree, vertical orientation), produced by filtering uniform noise In ~ U(0, 1) with a log-Gabor filter (2-D frequency response); see the sketch below
- Excites only one visual channel
- Well accepted in vision psychology [Caelli '86][Teo '94][Watson '97][Ringach '09][Geisler '14]
- Sits near the peak of the human contrast sensitivity function, approx. 4 cycles/degree [deValois '74]
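For illustration, here is a minimal sketch of generating log-Gabor noise by filtering uniform noise in the frequency domain. It assumes an 85 x 85 patch covering 2 degrees of visual angle; the bandwidth parameters sigma_ratio and sigma_theta, and the function name itself, are illustrative choices rather than the exact settings used in the experiments.

```python
import numpy as np

def log_gabor_noise(size=85, peak_cpd=3.7, degrees=2.0, orientation=np.pi / 2,
                    sigma_ratio=0.55, sigma_theta=0.35, rng=None):
    """Filter uniform noise In ~ U(0, 1) with a log-Gabor frequency response."""
    rng = np.random.default_rng(rng)
    f0 = peak_cpd * degrees / size                     # peak frequency, cycles/pixel
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    f = np.hypot(fx, fy)
    theta = np.arctan2(fy, fx)
    # Radial log-Gaussian centered on f0 (zero response at DC).
    radial = np.exp(-np.log(np.where(f == 0, 1.0, f) / f0) ** 2
                    / (2 * np.log(1.0 / sigma_ratio) ** 2))
    radial[f == 0] = 0.0
    # Angular Gaussian with two symmetric lobes so the filter stays real.
    d1 = np.angle(np.exp(1j * (theta - orientation)))
    d2 = np.angle(np.exp(1j * (theta - orientation - np.pi)))
    angular = (np.exp(-d1 ** 2 / (2 * sigma_theta ** 2))
               + np.exp(-d2 ** 2 / (2 * sigma_theta ** 2)))
    noise = rng.uniform(0.0, 1.0, (size, size))        # In ~ U(0, 1)
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * radial * angular))
```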
Our Masks
- Six categories: Animal, Plant, Landscape, Urban, Structure, People
- 30 natural scenes, 1080 patches
- Image source: CSIQ database [Chandler '09]
Masking Maps
- Category: Landscape (sunsetcolor): masking map with a dB colorbar from less masking to more masking
- Category: Animal (geckos): regions illustrating structural masking [Chandler '09] and entropy masking [Watson '97]
- Category: Urban (trolley)
Flow of Talk
First: Database of Visual Masking
Second: Computational Models of Masking
 Model 1: Feature Regression
 Model 2: Gain Control
 Model 3: Convolutional Neural Net
Third: Application and Future
22
Performance of Individual Features
Pearson correlation coefficient of each feature with the measured thresholds:
- Kurtosis: 0.24
- Slope of magnitude spectrum: 0.28
- Average luminance: 0.30
- Orientation energy: 0.31
- Standard deviation: 0.40
- Entropy: 0.41
- Local entropy: 0.41
- Skewness: 0.42
- Edge density: 0.47
- Band energy: 0.48
- Michelson contrast: 0.50
- Intercept of magnitude spectrum: 0.50
- RMS contrast: 0.52
- Sharpness: 0.70
Non-linear Feature Regression
- Regressing the features above onto the measured thresholds (a stand-in sketch follows after the results)
- All data: Pearson correlation 0.79, RMSE 5.38 dB
- 15% test data: Pearson correlation 0.82, RMSE 5.35 dB
(Scatter plots of model predictions (dB) vs. experiment thresholds (dB).)
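The slides do not spell out which nonlinear regressor maps the fourteen patch features to C_T, so the following stand-in uses a small multilayer-perceptron regressor from scikit-learn purely to illustrate the pipeline (85/15 split, Pearson correlation and RMSE as the reported metrics); the hidden-layer size is an arbitrary choice.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def fit_feature_regression(features, thresholds, seed=0):
    """features: (n_patches, n_features) per-patch statistics (sharpness,
    RMS contrast, entropy, ...); thresholds: measured C_T values in dB."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        features, thresholds, test_size=0.15, random_state=seed)
    model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                         random_state=seed).fit(x_tr, y_tr)
    pred = model.predict(x_te)
    rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
    corr, _ = pearsonr(pred, y_te)   # correlation on held-out data
    return model, float(corr), rmse
```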
Gain-Control Model
- Inputs: Mask + Target, and Mask alone
- Output: contrast detection threshold, C_T
Gain-Control Model (Watson and Solomon ['97])
- Each input (Mask + Target, and Mask alone) passes through: CSF, log-Gabor decomposition, excitatory nonlinearity, inhibitory nonlinearity, pooling, and divisive normalization, producing responses r_{M+T} and r_M
- The two responses are subtracted and Minkowski-pooled to give d'
- If d' ≈ d, calculate the final threshold; otherwise change the target contrast and repeat
Gain-Control Model
- Single neuron response at location (x_0, y_0), frequency f_0, and orientation θ_0 (a numerical sketch follows after this slide):

$$ r(x_0, y_0, f_0, \theta_0) \;=\; g \, \frac{z(x_0, y_0, f_0, \theta_0)^{\,p}}{b^{\,q} \;+\; \sum_{(x, y, f, \theta) \in I_N} z(x, y, f, \theta)^{\,q}} $$

  r: divisive response; g: output gain
  z: response before division; I_N: neighboring inhibitory neurons
  p: excitatory exponent; β_r: Minkowski space exponent
  q: inhibitory exponent; β_f: Minkowski frequency exponent
  b: semi-saturation constant; β_θ: Minkowski orientation exponent

- Summing all neuron responses (nested Minkowski pooling over space, frequency, and orientation):

$$ d' \;=\; \left[ \sum_{\theta} \left( \sum_{f} \left( \sum_{x, y} \left| r_{M+T} - r_M \right|^{\beta_r} \right)^{\beta_f / \beta_r} \right)^{\beta_\theta / \beta_f} \right]^{1 / \beta_\theta} $$
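A minimal numerical sketch of the divisive-normalization equation above. The inhibitory pool I_N is approximated by a small box neighborhood over all four dimensions, and the constants b and p (not listed on the slide) are placeholders; only q = 2.35 and g = 0.1 correspond to the reported optima.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def divisive_response(z, p=2.4, q=2.35, b=0.035, g=0.1, radius=1):
    """z: 4-D array of linear filter responses indexed as (x, y, f, theta).

    Returns the divisively normalized responses r. The inhibitory pool is a
    local sum of |z|^q over a (2*radius+1)^4 neighborhood (an approximation
    of I_N); boundary handling and exponent values are illustrative.
    """
    win = 2 * radius + 1
    pool = uniform_filter(np.abs(z) ** q, size=win, mode="nearest") * win ** 4
    return g * np.sign(z) * np.abs(z) ** p / (b ** q + pool)
```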
Gain-Control Model
- We varied three parameters over biologically plausible ranges [Valois '90][Watson '97][Chandler '09]
- Brute-force search, which stays computationally tractable (changing BW_f increases the data dimensionality); a sketch of the search loop follows below

  Symbol | Name                                  | Range        | Optimum
  q      | Inhibitory exponent                   | 1.05 - 4     | 2.35
  g      | Output gain                           | 0.01 - 0.205 | 0.1
  BW_f   | Frequency channel bandwidth (octaves) | 1.5 - 3.5    | 2.75
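A sketch of the brute-force search over the three parameters in the table. It assumes a wrapper function predict_thresholds(q, g, bw_f) that runs the full gain-control model and returns predicted thresholds for all patches (that wrapper is hypothetical here), and it uses RMSE against the measured thresholds as the objective; the grid step sizes are arbitrary.

```python
import itertools
import numpy as np

def brute_force_search(predict_thresholds, measured_ct):
    """Grid-search q, g, and BW_f over their biologically plausible ranges."""
    q_vals = np.arange(1.05, 4.0 + 1e-9, 0.1)        # inhibitory exponent
    g_vals = np.arange(0.01, 0.205 + 1e-9, 0.015)    # output gain
    bw_vals = np.arange(1.5, 3.5 + 1e-9, 0.25)       # channel bandwidth (octaves)
    best_params, best_rmse = None, np.inf
    for q, g, bw in itertools.product(q_vals, g_vals, bw_vals):
        pred = predict_thresholds(q, g, bw)
        rmse = float(np.sqrt(np.mean((pred - measured_ct) ** 2)))
        if rmse < best_rmse:
            best_params, best_rmse = (q, g, bw), rmse
    return best_params, best_rmse
```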
Gain-Control Model
- All data: Pearson correlation 0.83, RMSE 5.2 dB
- 15% test data: Pearson correlation 0.86, RMSE 5.2 dB
(Scatter plots of model predictions (dB) vs. experiment thresholds (dB).)
CNN Model
- Three-layer network with 654 trainable parameters (an illustrative sketch follows below)
- 4320 data points, increased by patch flipping; 70% training, 15% validation, 15% test
- Committee of 50 nets
- Training algorithm: resilient backpropagation [Riedmiller '93]; FIRST toolbox [Pranita '13]
- Architecture: input patch → first-layer feature map (d_r^1 × d_c^1) → feature maps (d_r^2 × d_c^2) → output (1 × 1)
- (Plot: RMSE (dB) vs. first-layer convolution kernel size for train+validation and test sets; the 19 × 19 kernel is highlighted.)
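The original network was a small three-layer CNN trained with resilient backpropagation in a MATLAB toolbox; the sketch below is an analogous model in PyTorch, not the original code. The 85 x 85 grayscale input, the channel counts, the pooling, and the activation are assumptions for illustration, so the parameter count will not match the reported 654.

```python
import torch
import torch.nn as nn

class MaskingCNN(nn.Module):
    """Small CNN mapping a grayscale patch to a scalar threshold C_T (dB)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=19), nn.Tanh(), nn.AvgPool2d(4),
            nn.Conv2d(4, 8, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

    def forward(self, patch):              # patch: (batch, 1, 85, 85)
        return self.head(self.features(patch)).squeeze(-1)

# A committee of independently trained nets can be averaged at test time:
# prediction = torch.stack([net(patch) for net in committee]).mean(dim=0)
```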
Trained Kernels
- (Figure: the first-layer convolution kernels, arranged by performance along both axes.)

Convolution Outputs
- (Figure: first-layer convolution outputs for a mask patch, arranged the same way.)
CNN Model
- All data: Pearson correlation 0.78, RMSE 5.5 dB
- 15% test data: Pearson correlation 0.77, RMSE 5.5 dB
(Scatter plots of model predictions (dB) vs. experiment thresholds (dB).)
Model Comparison

  Model         | RMSE (dB) | Execution time per image (sec)
  Feature based | 5.4       | 33
  Gain control  | 5.2       | 66
  CNN           | 5.5       | 5
Flow of Talk
First: Database of Visual Masking
Second: Computational Models of Masking
 Model 1: Feature Regression
 Model 2: Gain Control
 Model 3: Convolutional Neural Net
 Effects of Recognition
Third: Application and Future
35
Recognition Effects
- child_swimming: experiment masking map compared with the feature-based, contrast gain control, and CNN model predictions; regions of over-prediction are marked
- geckos: the same comparison, again with regions of over-prediction marked
Recognition Effects
- Gain-control shortcoming: only V1 simple cells are modeled
- Cognitive studies have clearly shown active feedback to V1 from higher-level cortices during conscious perception [Bullier '01][Juan '03]
- My hypothesis: recognition effects in masking can be modeled via intra- and inter-cortical feedback
Recognition via Facilitation and Inhibition (Structural Facilitation)
- Two steps: first, determine whether structure recognition actually affects masking; second, decide how much facilitation to incorporate
- Facilitation is implemented through neuron inhibition: weak structure gives higher neuron inhibition, strong structure gives lower neuron inhibition
Structure Detection
- For child_swimming: maps of local luminance, local sharpness, local entropy, average local texture, and standard deviation of local texture (values shown on a 0 to 1 colorbar)
Structure Detection
- Structure map S built from L_n (local luminance), Sh_n (local sharpness), E_n (local entropy), D_{μ_n} (average local texture), and D_{σ_n} (standard deviation of local texture); a sketch follows below:

$$ S \;=\; L_n \, Sh_n \, E_n \, (1 - D_{\mu_n})^2 \, (1 - D_{\sigma_n})^2 $$

- Example structure maps for child_swimming, geckos, foxy, and fisher (colorbar approx. 0 to 0.09)
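A rough sketch of a structure map along the lines of the formula above. The individual feature definitions (sharpness as a local Laplacian response, texture as the local absolute deviation) and the 0-1 normalization are stand-ins; the slide does not specify them, so treat every choice here as an assumption.

```python
import numpy as np
from scipy.ndimage import generic_filter, laplace, uniform_filter

def _normalize(x):
    span = np.ptp(x)
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def _local_entropy(patch, size=7, bins=16):
    def ent(window):
        hist, _ = np.histogram(window, bins=bins, range=(0.0, 1.0))
        p = hist[hist > 0] / hist.sum()
        return -(p * np.log2(p)).sum()
    return generic_filter(patch, ent, size=size)

def structure_map(patch, size=7):
    """patch: 2-D luminance array scaled to [0, 1]."""
    L = uniform_filter(patch, size)                     # local luminance
    Sh = uniform_filter(np.abs(laplace(patch)), size)   # local sharpness proxy
    E = _local_entropy(patch, size)                     # local entropy
    d = np.abs(patch - L)                               # local texture signal
    Dmu = uniform_filter(d, size)                       # average local texture
    Dsig = np.sqrt(np.maximum(uniform_filter(d ** 2, size) - Dmu ** 2, 0.0))
    L, Sh, E, Dmu, Dsig = map(_normalize, (L, Sh, E, Dmu, Dsig))
    return L * Sh * E * (1.0 - Dmu) ** 2 * (1.0 - Dsig) ** 2
```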
Integrating Structural Facilitation in Gain Control
Facilitation through neuron inhibition:
- Inhibition multiplier λ_s ranges from 0.2 to 1.0 and is calculated from the structure map; it scales the inhibitory pool of the gain-control response (a sketch of the multiplier follows below):

$$ r(x_0, y_0, f_0, \theta_0) \;=\; g \, \frac{z(x_0, y_0, f_0, \theta_0)^{\,p}}{b^{\,q} \;+\; \lambda_s \sum_{(x, y, f, \theta) \in I_N} z(x, y, f, \theta)^{\,q}} $$

- For the i-th block S_i of the structure map S, with p(S, 80) the 80th percentile of S, max(S) indicating strong structure, and Kurt(S) indicating localized structure:

$$ \lambda_{s,i} \;=\; \begin{cases} 1 - 0.8 \Big/ \left[ 1 + \exp\!\left( - \dfrac{ \sum_{x=1, y=1}^{85, 85} \big( S_i(x, y) - p(S, 80) \big) }{0.005} \right) \right], & \max(S) > 0.04 \ \text{and} \ \mathrm{Kurt}(S) > 3.5 \\ 1, & \text{otherwise} \end{cases} $$
- λ_s as a function of structure S: close to 1 for weak structure (higher neuron inhibition) and dropping toward 0.2 for strong structure (lower neuron inhibition)
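A sketch of the inhibition multiplier defined above, computed per 85 x 85 block of the structure map. The gating thresholds (0.04 on max(S), 3.5 on kurtosis) and the 0.8 and 0.005 constants follow the slide; the use of Pearson (non-Fisher) kurtosis and the exact block tiling are assumptions.

```python
import numpy as np
from scipy.stats import kurtosis

def inhibition_multiplier(S, block=85, scale=0.005):
    """Per-block lambda_s in [0.2, 1.0] from the structure map S."""
    rows, cols = S.shape[0] // block, S.shape[1] // block
    lam = np.ones((rows, cols))
    if S.max() <= 0.04 or kurtosis(S, axis=None, fisher=False) <= 3.5:
        return lam                                   # no facilitation applied
    p80 = np.percentile(S, 80)
    for i in range(rows):
        for j in range(cols):
            Si = S[i * block:(i + 1) * block, j * block:(j + 1) * block]
            # clip the sigmoid argument to avoid overflow in exp
            t = np.clip(np.sum(Si - p80) / scale, -500.0, 500.0)
            lam[i, j] = 1.0 - 0.8 / (1.0 + np.exp(-t))
    return lam
```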
Results with Structural Facilitation
Pearson correlation with the experiment map, gain-control only vs. gain-control with structural facilitation:
- geckos: 0.68 vs. 0.77
- foxy: 0.58 vs. 0.63
- child_swimming: 0.85 vs. 0.87
- couple: 0.55 vs. 0.60
Flow of Talk
First: Database of Visual Masking
Second: Computational Models of Masking
Third: Application and Future
49
HEVC Compression
- A better quantization scheme for HEVC video compression: block-based QP prediction
- Based on the detection threshold C_T predicted by the CNN model (a feed-forward network maps each patch to C_T)
- Fast prediction (2.2 sec/image, 106x faster than gain control)
- C_T is mapped to a quantization step via a fitted quadratic with parameters α, β, γ, and then to a QP value (a small sketch follows below):

$$ \log(Q_{step}) \;=\; \alpha C_T^2 + \beta C_T + \gamma $$

$$ QP_T \;=\; \max\!\left( \min\!\left( \mathrm{round}\!\left( \frac{\log(Q_{step})}{\log 2^{1/6}} + 4 \right), \; 51 \right), \; 0 \right) $$
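A minimal sketch of the threshold-to-QP mapping above. The quadratic coefficients α, β, γ are fitted values that are not given on the slide, so they are left as arguments here; whether the +4 offset sits inside or outside the rounding is a reconstruction detail.

```python
import numpy as np

def qp_from_threshold(ct_db, alpha, beta, gamma):
    """Map a predicted detection threshold C_T (dB) to an HEVC QP in [0, 51]."""
    log_qstep = alpha * ct_db ** 2 + beta * ct_db + gamma
    qp = np.rint(log_qstep / np.log(2.0 ** (1.0 / 6.0)) + 4.0)
    return int(np.clip(qp, 0, 51))
```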
HEVC Compression
- CNN predictions (dB) vs. ground-truth distortion visibilities, i.e. outputs of the gain control + structural facilitation (CGC+SF) model (dB)
- (a) Training + validation: PCC 0.95; (b) Testing: PCC 0.93
Quantization Threshold Map
- QP maps from the CGC+SF model and from the CNN, for the redwood and log_seaside images (QP colorbar 0 to 50)
For each image: the reference, then visually equivalent images coded using the QP map from the CGC+SF model and from the CNN model.
- lake: reference SSIM 0.94, bpp 3.01; CGC+SF SSIM 0.94, bpp 2.60, gain 13.7%; CNN SSIM 0.94, bpp 2.58, gain 14.3%
- redwood: reference SSIM 0.89, bpp 2.13; CGC+SF SSIM 0.88, bpp 1.82, gain 14.9%; CNN SSIM 0.88, bpp 2.12, gain 0.5%
- shroom: reference SSIM 0.96, bpp 2.43; CGC+SF SSIM 0.94, bpp 1.69, gain 18.3%; CNN SSIM 0.92, bpp 1.35, gain 35.2%
- foxy: reference SSIM 0.96, bpp 2.49; CGC+SF SSIM 0.94, bpp 2.25, gain 9.6%; CNN SSIM 0.94, bpp 2.39, gain 3.9%
Conclusions and Future Challenges
Conclusions:
- Largest dataset of masking presented, usable for model benchmarking
- Accuracy of gain control improved via structural facilitation and feedback
- Fast CNN model of masking developed
- HEVC compression efficiency improved
Future challenges:
- First: discovering the actual route of feedback in the visual pathway
- Second: developing a CNN version of the gain-control mechanism with feedback
- Third: what about temporal masking?
Thank you
References, Contacts, Downloads
These works were published in:
- M. M. Alam, P. Patil, M. T. Hagan, and D. M. Chandler, "A computational model for predicting local distortion visibility via convolutional neural network trained on natural scenes," (accepted) IEEE ICIP 2015.
- M. M. Alam, T. Nguyen, and D. M. Chandler, "A perceptual strategy for HEVC based on a convolutional neural network trained on natural videos," SPIE Applications of Digital Image Processing XXXVIII, 2015. (doi: 10.1117/12.2188913)
- J. P. Evert, M. M. Alam, and D. M. Chandler, "Predicting the visibility of dynamic DCT distortion in natural videos," SPIE Applications of Digital Image Processing XXXVIII, 2015. (doi: 10.1117/12.2188460)
- M. M. Alam, P. Patil, M. Hagan, and D. M. Chandler, "Relations between local and global perceptual image quality and visual masking," SPIE Human Vision and Electronic Imaging XX, pp. 93940M, February 8, 2015. (doi: 10.1117/12.2084935)
- M. M. Alam, K. P. Vilankar, D. J. Field, and D. M. Chandler, "Local masking in natural images: A database and analysis," Journal of Vision, vol. 14, no. 8, July 2014. (doi: 10.1167/14.8.22)
- D. M. Chandler, M. M. Alam, and T. D. Phan, "Seven challenges for image quality research," Proc. SPIE Human Vision & Electronic Imaging XX, Feb. 9, 2014. (doi: 10.1117/2.1201401.005276)
- M. M. Alam, K. P. Vilankar, and D. M. Chandler, "A database of local masking in natural images," Proc. SPIE Human Vision and Electronic Imaging XVIII, pp. 86510G, Feb. 3, 2013. (doi: 10.1117/12.2008581)

External figure sources:
- http://darkmatternews.com/single-molecular-event-linked-mammalian-brain-development/
- http://www.techcyn.com/upload/figure7-12.jpg

Contact
Md Mushfiqul Alam (Mushfiq)
mushfiqulalam@gmail.com
http://www.mushfiqulalam.com/

Downloads
The dataset: http://vision.okstate.edu/masking/
Poster: http://www.mushfiqulalam.com/downloads/icip-2015-poster
Codes and thesis on request