Mushfiq recently finished his PhD in Electrical and Computer Engineering at Oklahoma State University. In this video, he presents: (1) a database (the largest of its kind) created through a well-controlled psychophysical study using natural scenes, (2) how the most advanced biologically plausible model of V1 and a trained convolutional neural network both fail to capture recognition factors, and (3) how a computational approach can be adopted to integrate recognition into the V1 responses. He also discusses and shows how such a model can be integrated into a video compression algorithm to improve it.
4. Perception
Way to interpret sensory information
Understand environment
Cognition
Mental ability
Judgement, evaluation
Reasoning, problem solving
Decision making
Action
Leads to experience
Guided/unguided by perception and cognition
Visual Perception
Low-level vision phenomenon: visual masking
Recognition effect in masking
Biological plausibility of models
Focus: Perception
5. Perception
Intra- and inter-cortical feedback is certainly at work in the brain. How do we model such feedback efficiently?
6. Visual Masking
Perceptual local phenomenon: Distortion visibility
[Figure: image and distorted image]
Grass looks less distorted than child, sand, and water
8. Application: Compression
Encoder
Accurate prediction of masking map
Fewer bits where distortion less visible
Small file size
Smaller file size, same quality
Other applications: watermarking, texture synthesis, image quality assessment
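The idea of spending fewer bits where distortion is less visible can be sketched as a mapping from a predicted masking map to per-block quantization parameters (QP). This is a minimal illustration, not the encoder used in the talk: the function name `qp_map_from_masking`, the linear rule, the base QP of 30, and the maximum offset of 6 are all assumptions. Only the 0 to 51 QP range is taken from HEVC for 8-bit video.

```python
def qp_map_from_masking(masking_map, base_qp=30, max_offset=6):
    """Map per-block masking strength in [0, 1] to a QP value.

    Hypothetical linear rule: stronger masking (distortion less visible)
    gets a larger QP, i.e. coarser quantization and fewer bits.
    """
    qp_map = []
    for row in masking_map:
        qp_row = []
        for m in row:
            m = min(max(m, 0.0), 1.0)            # clamp masking strength
            qp = base_qp + round(m * max_offset)  # assumed linear mapping
            qp_row.append(min(max(qp, 0), 51))    # HEVC QP range, 8-bit video
        qp_map.append(qp_row)
    return qp_map


# Toy 2x2 masking map: heavily masked block (0.9) vs. lightly masked block (0.1)
example_qp = qp_map_from_masking([[0.9, 0.1], [0.5, 0.0]])
```

In this toy case the heavily masked block receives the largest QP and the unmasked block keeps the base QP.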
9. Flow of Talk
First: Database of Visual Masking
Second: Computational Models of Masking
Third: Application and Future
11. Traditional Stimuli
Stimulus: A signal shown to a human subject
Stimulus = Mask + Target
Traditionally, both masks and targets are unnatural
Pros: Well-defined features
Cons:
Cannot capture natural-scene properties
Results not effective for natural scenes (nonlinear response of visual system)
Examples: sine-wave grating [Legge ‘80][Foley ‘94], visual noise [Carter ‘71], Gabor pattern [Foley ‘94], checkerboard [Pashler ‘88]
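The relation Stimulus = Mask + Target can be sketched with a Gabor pattern as the target. This is an illustrative sketch only: the helper names `gabor_patch` and `make_stimulus`, the patch size, spatial frequency, window width, and contrast are assumptions, and a uniform gray field stands in for the mask.

```python
import math


def gabor_patch(size=32, cycles=4.0, sigma=6.0, contrast=0.5):
    """Return a size x size Gabor pattern: a vertical sine grating
    multiplied by a Gaussian window (all parameters are assumed values)."""
    c = (size - 1) / 2.0
    patch = []
    for y in range(size):
        row = []
        for x in range(size):
            envelope = math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
            carrier = math.sin(2 * math.pi * cycles * x / size)
            row.append(contrast * envelope * carrier)
        patch.append(row)
    return patch


def make_stimulus(mask, target):
    """Stimulus = Mask + Target, clipped to the displayable range [0, 1]."""
    return [[min(max(m + t, 0.0), 1.0) for m, t in zip(mrow, trow)]
            for mrow, trow in zip(mask, target)]


# A uniform mid-gray field stands in for the mask image
mask = [[0.5] * 32 for _ in range(32)]
stimulus = make_stimulus(mask, gabor_patch())
```

With a natural-scene mask, the same addition would place the target on top of real image content instead of a flat field.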
22. Flow of Talk
First: Database of Visual Masking
Second: Computational Models of Masking
Model 1: Feature Regression
Model 2: Gain Control
Model 3: Convolutional Neural Net
Third: Application and Future
23. Performance of Individual Features
[Bar chart: Pearson correlation coefficient for each individual feature; axis from 0.0 to 1.0]
Features, as listed: Kurtosis, Slope of magnitude spectrum, Average luminance, Orientation energy, Standard deviation, Entropy, Local entropy, Skewness, Edge density, Band energy, Michelson contrast, Intercept of magnitude spectrum, RMS contrast, Sharpness
Bar values, as listed: 0.24, 0.28, 0.30, 0.31, 0.40, 0.41, 0.41, 0.42, 0.47, 0.48, 0.50, 0.50, 0.52, 0.70
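Scoring a single feature by its Pearson correlation with measured thresholds can be sketched as follows. This is a minimal sketch under stated assumptions: RMS contrast is taken here as standard deviation divided by mean (one common definition; the talk's exact definition is not given), and the patches and thresholds are made-up toy numbers, not values from the database.

```python
import math


def rms_contrast(pixels):
    """Standard deviation of luminance divided by mean luminance
    (one common definition, assumed here)."""
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return math.sqrt(var) / mean


def michelson_contrast(pixels):
    """(Lmax - Lmin) / (Lmax + Lmin)."""
    lo, hi = min(pixels), max(pixels)
    return (hi - lo) / (hi + lo)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data (hypothetical): three flattened patches and made-up thresholds
patches = [[0.4, 0.5, 0.6, 0.5], [0.2, 0.8, 0.3, 0.7], [0.1, 0.9, 0.1, 0.9]]
thresholds = [0.02, 0.05, 0.09]
features = [rms_contrast(p) for p in patches]
r = pearson(features, thresholds)
```

Repeating the last two lines with a different feature function gives one correlation value per feature, which is the quantity a chart of this kind compares.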
35. Flow of Talk
First: Database of Visual Masking
Second: Computational Models of Masking
Model 1: Feature Regression
Model 2: Gain Control
Model 3: Convolutional Neural Net
Effects of Recognition
Third: Application and Future
38. Recognition Effects
Gain-control shortcoming: only V1 simple cells are modeled
Cognitive studies have shown active feedback to V1 from higher-level cortices for conscious perception [Bullier ‘01][Juan ‘03]
My hypothesis: recognition effects in masking can be modeled via intra- and inter-cortical feedback
39. Recognition via Facilitation and Inhibition (Structural Facilitation)
Two steps:
First: Determine whether structure recognition is actually affecting masking
Second: How much facilitation to incorporate
Facilitation through neuron inhibition
[Diagram: a + unit surrounded by - units, shown for two cases]
Weak structure: higher neuron inhibition
Strong structure: lower neuron inhibition
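The relation between structure strength and inhibition can be sketched with a generic divisive gain-control response in which the inhibitory pool is down-weighted as structure gets stronger. This is only an illustration of the idea, not the model from the talk: the function name, the exponents `p` and `q`, the constants `b` and `g`, and the linear facilitation factor `k` are all assumed values.

```python
def gain_control_response(excitation, pool, structure,
                          p=2.4, q=2.0, b=0.1, g=1.0, k=0.5):
    """Divisive gain control with a hypothetical structural-facilitation term.

    excitation: linear response of the neuron being modeled
    pool:       linear responses of the neighboring (inhibitory) neurons
    structure:  structure strength in [0, 1]; stronger structure means
                weaker inhibition (assumed linear scaling by k)
    """
    s = min(max(structure, 0.0), 1.0)
    inhibition_weight = 1.0 - k * s
    inhibition = inhibition_weight * sum(abs(e) ** q for e in pool)
    return g * abs(excitation) ** p / (b ** q + inhibition)


neighbors = [0.3, 0.2, 0.4, 0.1]
weak = gain_control_response(0.5, neighbors, structure=0.0)    # full inhibition
strong = gain_control_response(0.5, neighbors, structure=1.0)  # reduced inhibition
```

With identical inputs, the strong-structure case yields the larger response because less inhibition is applied in the denominator.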
40. Structure Detection
[Figure: child_swimming image with maps of local luminance, local sharpness, local entropy, average local texture, and standard deviation of local texture; color scale from 0 to 1]
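Local feature maps of this kind can be sketched by computing a statistic over small blocks of the image. The sketch below covers only two of the listed maps, local luminance (block mean) and local entropy, and its block size, number of gray levels, and the function name `local_maps` are assumptions rather than the settings used in the talk.

```python
import math
from collections import Counter


def local_maps(image, block=4, levels=8):
    """Return (mean_map, entropy_map) over non-overlapping blocks.

    image: list of rows with values in [0, 1]; height and width are
    assumed to be multiples of `block`.
    """
    mean_map, entropy_map = [], []
    for by in range(0, len(image), block):
        mean_row, entropy_row = [], []
        for bx in range(0, len(image[0]), block):
            pixels = [image[y][x]
                      for y in range(by, by + block)
                      for x in range(bx, bx + block)]
            mean_row.append(sum(pixels) / len(pixels))
            # Quantize to a few gray levels, then take Shannon entropy in bits
            counts = Counter(min(int(p * levels), levels - 1) for p in pixels)
            entropy_row.append(sum(-(c / len(pixels)) * math.log2(c / len(pixels))
                                   for c in counts.values()))
        mean_map.append(mean_row)
        entropy_map.append(entropy_row)
    return mean_map, entropy_map


# Toy 4x8 image: a flat left block and a two-level right block
toy = [[0.5] * 4 + [0.0, 1.0, 0.0, 1.0] for _ in range(4)]
mean_map, entropy_map = local_maps(toy)
```

The two toy blocks share the same mean luminance but differ in entropy, which is why several local features are inspected side by side.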
53. Columns: Reference image; Visually equivalent image; Coded using QP map for CGC+SF model; Coded using QP map for CNN model
(lake) SSIM: 0.94, bpp: 3.01; SSIM: 0.94, bpp: 2.60, gain 13.7%; SSIM: 0.94, bpp: 2.58, gain 14.3%
(redwood) SSIM: 0.89, bpp: 2.13; SSIM: 0.88, bpp: 1.82, gain 14.9%; SSIM: 0.88, bpp: 2.12, gain 0.5%
54. Columns: Reference image; Visually equivalent image; Coded using QP map for CGC+SF model; Coded using QP map for CNN model
(shroom) SSIM: 0.96, bpp: 2.43; SSIM: 0.94, bpp: 1.69, gain 18.3%; SSIM: 0.92, bpp: 1.35, gain 35.2%
(foxy) SSIM: 0.96, bpp: 2.49; SSIM: 0.94, bpp: 2.25, gain 9.6%; SSIM: 0.94, bpp: 2.39, gain 3.9%
55. Conclusions and Future Challenges
Conclusions:
Largest dataset of masking presented: usable for model benchmarking
Accuracy of gain-control improved via structural facilitation and feedback
Fast CNN model of masking developed
HEVC compression efficiency improved
Future challenges:
First: Discovering actual route of feedback in visual pathway.
Second: Developing a CNN version of gain-control mechanism with feedback.
Third: What about temporal masking?
Thank you
56. References, Contacts, Downloads
These works published in:
M. M. Alam, P. Patil, M. T. Hagan, and D. M. Chandler, "A computational model for predicting local distortion visibility via convolutional neural network trained on natural scenes," (Accepted) IEEE ICIP 2015.
M. M. Alam, T. Nguyen, and D. M. Chandler, "A perceptual strategy for HEVC based on a convolutional neural network trained on natural videos," SPIE Applications of Digital Image Processing XXXVIII, 2015 (doi:10.1117/12.2188913).
J. P. Evert, M. M. Alam, and D. M. Chandler, "Predicting the visibility of dynamic DCT distortion in natural videos," SPIE Applications of Digital Image Processing XXXVIII, 2015 (doi:10.1117/12.2188460).
M. M. Alam, P. Patil, M. Hagan, and D. M. Chandler, "Relations between local and global perceptual image quality and visual masking," SPIE Human Vision and Electronic Imaging XX, pp. 93940M, February 08, 2015 (doi:10.1117/12.2084935).
M. M. Alam, K. P. Vilankar, D. J. Field, and D. M. Chandler, "Local masking in natural images: A database and analysis," Journal of Vision, July 2014, vol. 14, no. 8 (doi:10.1167/14.8.22).
D. M. Chandler, M. M. Alam, and T. D. Phan, "Seven challenges for image quality research," Proc. SPIE Human Vision & Electronic Imaging XX, Feb. 9, 2014 (doi:10.1117/2.1201401.005276).
M. M. Alam, K. P. Vilankar, and D. M. Chandler, "A database of local masking in natural images," Proc. SPIE Human Vision and Electronic Imaging XVIII, pp. 86510G, Feb. 03, 2013 (doi:10.1117/12.2008581).
External figure sources:
http://darkmatternews.com/single-molecular-event-linked-mammalian-brain-development/
http://www.techcyn.com/upload/figure7-12.jpg
Contact:
Md Mushfiqul Alam (Mushfiq)
mushfiqulalam@gmail.com
http://www.mushfiqulalam.com/
Downloads:
The dataset: http://vision.okstate.edu/masking/
Poster: http://www.mushfiqulalam.com/downloads/icip-2015-poster
Codes and thesis on request