Finding the best solution for Image Processing
Presented by: Pranjut Gogoi & Shubham Goyal
Our Agenda
01 Image Processing history
02 Different Approaches
03 Residual Neural Networks
04 Performance
05 Ongoing research
About Knoldus MachineX
MachineX is a group of data wizards. We are a team of data scientists and engineers with a product mindset who deliver competitive business advantage.
An Intelligent Meeting Assistant Application
● Record videos
● View dashboard
An Intelligent Marketing Tool
FishEye
Machine Learning Library in Scala
KSAI
● Innovation Labs: Enable organizations to capture new value and business capabilities
● Blogs: Consistently blogging to share our knowledge and research
● Certifications: Deeplearning, Coursera, and Stanford certified professionals
● TOK Sessions: Insight and perspective to help you make the right business decisions
● Open Source Contribution: It’s great to contribute back to the community. We continuously advance open source technologies to meet demanding business requirements.
Finding the best solution for Image Processing
Image Processing
Image Processing History: The Traditional Way
Traditional Way
The traditional pipeline for image classification involves two modules (sketched in code below):
● Feature extraction
● Classification
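As an illustration only, a minimal sketch of such a pipeline using scikit-image and scikit-learn; X_train, y_train, and X_test are hypothetical grayscale image arrays and labels, not from the slides:

import numpy as np
from skimage.feature import hog      # hand-crafted feature extractor
from sklearn.svm import LinearSVC    # separate, conventional classifier

def extract_features(images):
    # Module 1: fixed, hand-designed features (HOG); they cannot be
    # tweaked according to the particular classes or images.
    return np.array([hog(img, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for img in images])

# Module 2: a classifier trained on top of the fixed features.
clf = LinearSVC()
clf.fit(extract_features(X_train), y_train)   # X_train, y_train assumed to exist
predictions = clf.predict(extract_features(X_test))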
Problems
The problems with this pipeline:
● Feature extraction cannot be tweaked according to the classes and images
● It is completely different from how we humans learn to recognize things
Convolutional Neural Network (CNN, or ConvNet)
A ConvNet splits into two parts (sketched below):
● Convolutional base
● Classifier
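For illustration, a minimal PyTorch sketch of these two parts; the 32x32 RGB input size and 10 classes are illustrative assumptions, not from the slides:

import torch.nn as nn

model = nn.Sequential(
    # Convolutional base: learns features directly from the pixels.
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    # Classifier: maps the learned feature maps to class scores.
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
    nn.Linear(128, 10),
)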
Transfer Learning
The application of skills, knowledge, and/or attitudes that were learned in one situation to another learning situation.
In practice, transfer learning is usually expressed through the use of pre-trained models.
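A minimal sketch of this idea in PyTorch, assuming a recent torchvision with ImageNet weights available and a hypothetical 10-class target task:

import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained convolutional base so its knowledge is reused as-is.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new, task-specific classifier;
# only this head is trained on the new dataset.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)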
Problems
The problems were:
● A low rate of learning in each generation
● The amount of knowledge passed down was small
Difference
Understanding Various Architectures of Convolutional Networks
ResNet, AlexNet, VGGNet, Inception
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
CNN architectures of ILSVRC top competitors
AlexNet
● 5 convolutional (CONV) layers and 3 fully connected (FC) layers
● About 62 million trainable parameters
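As a quick check, the parameter count quoted above can be roughly reproduced with torchvision's AlexNet implementation (a sketch; exact totals differ slightly between implementations):

from torchvision import models

alexnet = models.alexnet()  # randomly initialized, architecture only
n_params = sum(p.numel() for p in alexnet.parameters())
print(f"AlexNet trainable parameters: {n_params / 1e6:.1f}M")  # roughly 61M here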
AlexNet
● Data augmentation is carried out to reduce overfitting
● Used ReLU, which reached a 25% error rate about 6 times faster than the same network with tanh nonlinearity
● Introduced Local Response Normalization (LRN) to help with the vanishing gradient problem
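A small sketch of how two of these ingredients, ReLU and LRN, appear at the start of an AlexNet-style network in PyTorch; the hyperparameters are the commonly cited AlexNet values and are illustrative here:

import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),                  # first CONV layer
    nn.ReLU(),                                                    # non-saturating activation
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),   # LRN across channels
    nn.MaxPool2d(kernel_size=3, stride=2),                        # overlapping pooling
)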
VGGNet
● VGG16 has a total of 138 million parameters
● Conv kernels are of size 3x3 and max-pool kernels are of size 2x2 with a stride of two
VGGNet
● It is painfully slow to train
● Spatial pooling is carried out by five max-pooling layers, which follow some of the conv layers
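A sketch of one characteristic VGG stage built from the 3x3 conv / 2x2 max-pool pattern described above; the channel sizes mirror the first two VGG16 stages and the helper name is hypothetical:

import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    # Stack num_convs 3x3 convolutions, then halve the spatial size with 2x2 max pooling (stride 2).
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

stage1 = vgg_stage(3, 64, num_convs=2)    # 224x224 -> 112x112
stage2 = vgg_stage(64, 128, num_convs=2)  # 112x112 -> 56x56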
ResNet: Deep Residual Learning
Hierarchical Features and the Role of Depth
● Low-, mid-, and high-level features
● More layers enrich the “levels” of the features
● Previous ImageNet models have depths of 16 to 30 layers
Is learning better networks as easy as stacking more layers?
Adding Layers to Deep Convolutional Neural Nets
Construction Insight
● Consider a shallow architecture and its deeper counterpart
● The deeper model would just need to copy the shallower model and add identity mappings for the extra layers
● This construction suggests that a deeper model should produce no higher training error than its shallower counterpart
Residual Functions
● We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions
● H(x) = F(x) + x, where F(x) is the residual learned by the stacked layers and x is the identity shortcut
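A minimal PyTorch sketch of a residual block, where the stacked layers learn F(x) and the identity shortcut adds x back; the channel count is illustrative:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions with batch normalization.
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Identity shortcut: the block outputs H(x) = F(x) + x.
        return torch.relu(self.f(x) + x)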
Residual vs Plain Networks
Experiment
● 152 layers on ImageNet
○ 8x deeper than VGGNet
○ Fewer parameters
● ResNet achieves 3.57% top-5 error on the ImageNet test set
○ 1st place in ILSVRC 2015
Results
● AlexNet and ResNet-152 both have about 60M parameters, but there is about a 10% difference in their top-5 accuracy
● VGGNet not only has more parameters and FLOPs than ResNet-152, but also lower accuracy
● Training AlexNet takes about the same time as training Inception, even though Inception has roughly 10 times lower memory requirements
Clinic Assistant
● Notebook http://bit.ly/2D2LOQT
● Web App https://virtual-clinic.onrender.com
History and Its Importance
● Origin of CNN (1980s-1999)
● Stagnation of CNN (early 2000s)
● Revival of CNN (2006-2011)
● Rise of CNN (2012-2014)
● Rapid increase in architectural innovations (2015-present)
● Important because we are not done yet
Taxonomy of Deep CNNs
Spatial Exploitation Based CNNs
● LeNet
● AlexNet
● ZFNet
● VGG
● GoogLeNet
Depth Based CNNs
● Highway Networks
● ResNet
● Inception-V3/V4
● Inception-ResNet
● ResNeXt
Multi-Path Based CNNs
● Highway Networks
● ResNet
● DenseNet
Width Based CNNs
● WideResNet
● Pyramidal Net
● Xception
● Inception family
Feature-Map Exploitation Based CNNs
● Squeeze-and-Excitation
● Competitive Squeeze-and-Excitation
Channel Boosting Based CNNs
● Channel boosted CNN using transfer learning (TL)
Attention Based CNNs
● Residual Attention Neural Network
● Convolutional Block Attention Module
● Concurrent Squeeze-and-Excitation
Improvement Summary
● The learning capacity of CNNs has been significantly improved over the years by exploiting depth and other structural modifications
○ Activation, loss function, optimization, regularization, learning algorithms, and restructuring of processing units
● Major improvement in CNNs
○ The main boost in CNN performance has been achieved by replacing the conventional layer structure with blocks
Challenges That Exist
● Deep NNs are generally a black box and thus may lack interpretability and explainability
● Each layer of a CNN automatically tries to extract better, problem-specific features related to the task
● Deep CNNs are based on a supervised learning mechanism, and therefore a large amount of annotated data is required for proper learning
● Hyperparameter selection highly influences the performance of a CNN
● Efficient training of CNNs demands powerful hardware resources such as GPUs
Future of Research
● Ensemble learning
● Attention modeling
● Generative learning
References
● [1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.
● [2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
● [3] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
● [4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9, 2015.
● [5] https://arxiv.org/pdf/1901.06032.pdf
Thank You
