3. Challenge
Fine-grained recognition is a sub-field of object recognition, but it is still a classification problem.
Objects vary widely in appearance (plants, birds, monkeys, etc.).
High intra-class variability and small inter-class differences.
The system must automatically distinguish between very similar object
categories.
This requires effort on different aspects compared with generic
object recognition.
Plant recognition tasks range from identification of plants from
specific plant organs to general plant recognition
and localization "in the wild".
4. Interested
• We are mainly interested in three fundamental problems of fine-grained
categorization:
1) Building large-scale, high-quality datasets for benchmarking
fine-grained categorization methods.
2) Designing algorithms that are more suitable for fine-grained
recognition tasks.
3) Exploring ways to bring human expertise into fine-grained
categorization.
5. Background
The 10th edition of Linnaeus's Systema Naturae describes
about 6000 plant species.
Currently the number of published and
accepted plant species in the world is over 310,000.
Plants can be recognized from the flower, bark, fruit,
leaf, or a combination of these.
6. Literature Review (Leaf Recognition)
Most methods combine features of different character (shape, color,
vein, texture features, etc.).
Descriptors include SIFT, SURF and FAST, geometric features, moment invariants,
Zernike moments and the polar Fourier transform.
Histogram of Oriented Gradients (HOG)
A probabilistic neural network (PNN) using 12 digital morphological features
derived from 5 basic features (diameter, physiological length,
physiological width, leaf area, leaf perimeter), on the Flavia plant leaf database.
Pairwise Rotation Invariant Co-occurrence Local Binary Patterns
(PRICoLBP) with Support Vector Machine (SVM) classification
(applications: texture, material, flower, leaf, food and scene classification)
7. L.R (Tree bark recognition)
Methods using Gabor filter banks (a Gabor filter is a linear filter).
A combination of the Grey-Level Co-occurrence Matrix (GLCM) and
a binary texture feature called long connection length emphasis.
Plant identification from diverse images:
a system developed by scientists from four French research organizations (Cirad,
INRA, INRIA and IRD)
8. L.R (Texture recognition)
Texture itself is hard to define. There are various definitions of visual
texture.
A number of approaches are based on the popular local binary patterns
(LBP).
Many of them work only with image intensity and ignore the
available color information.
Encoder denoted as FV-CNN-VD, obtained by Fisher Vector pooling of a
very deep convolutional neural network (CNN) filter bank pre-trained on
ImageNet
Texture analysis is only applied to images with unambiguous
segmentation (bark and leaf recognition)
9. CONVOLUTIONAL NEURAL NETWORK
Convolutional Neural Network (CNN, or ConvNet)
Class of deep, feed-forward artificial neural networks.
Most commonly applied to analyzing visual imagery
CNNs use relatively little pre-processing compared to other image
classification algorithms
CNNs are only applied when sufficiently large datasets are available.
CNNs are suitable for plant recognition “in the wild”, where
the views of plant organs vary.
30. Reference
• Plant Identification Based on Noisy Web Data: the Amazing Performance of Deep
Learning (LifeCLEF 2017) Hervé Goëau, Pierre Bonnet, Alexis Joly CLEF 2017
• Leaf recognition of woody species in Central Europe P. Novotný, T. Suk Biosyst
Eng. 2013;115(4):444–52
• Plant Identification in an Open-world (LifeCLEF 2016) Hervé Goëau, Pierre
Bonnet, Alexis Joly CLEF 2016
• Plant Identification with Deep Convolutional Neural Network: SNUMedinfo at
LifeCLEF Plant Identification Task 2015 Sungbin Choi CLEF 2015
Therefore an inter-class value is a value that compares multiple classes, whereas an intra-class value is a value that compares just the students in a single class.
A set of rotation-invariant features is introduced. They are the magnitudes of a set of orthogonal complex moments of the image known as Zernike moments. Scale and translation invariance are obtained
by first normalizing the image with respect to these parameters using its regular geometrical moments.
Zernike moments are used to extract the features of printed digits in grayscale images.
Certainly, combined with wavelets it gives very good results for extracting features from the image.
Rectified layer
1. Convolution Neural Network Tutorial
2. How image recognition works? Do you know how Deep Learning recognizes the objects in an image? It does it using a Convolution Neural Network. [Figure: pixels of the image are fed as input to the input layer, pass through the hidden layers, and the output layer picks one of Dog, Bird, Cat]
3. How image recognition works? Let’s see how CNN identifies the image of a bird. The input layer accepts the pixels of the image as input in the form of arrays. [Figure: array of pixel values]
4. How image recognition works? Hidden layers carry out feature extraction by performing certain calculations and manipulations.
5. How image recognition works? There are multiple hidden layers like Convolution layer, ReLU layer, Pooling layer, etc. that perform feature extraction from the image. Convolution Layer: this layer uses a matrix filter and performs the convolution operation to detect patterns in the image. [Figure: 3 x 3 matrix filter]
6. How image recognition works? ReLU: the ReLU activation function is applied to the output of the convolution layer to get a rectified feature map of the image.
7. How image recognition works? Pooling: the pooling layer down-samples the rectified feature maps, keeping the strongest responses for features such as edges, corners, eyes, feathers, beak, etc.
8. How image recognition works? Finally there is a fully connected layer that identifies the object in the image.
9. What’s in it for you? • Introduction to CNN • What is Convolution neural network? • How CNN recognizes images? • Layers in convolution neural network • Use case implementation using CNN
10. Introduction to CNN Yann LeCun • Pioneer of Convolution Neural Network • Director of Facebook’s AI Research Group • Built the first Convolution Neural Network called LeNet in 1988 • It was used for character recognition tasks like reading zip codes and digits
14. What is a Convolution Neural Network? CNN is a feed forward neural network that is generally used to analyze visual images by processing data with grid like topology. A CNN is also known as a “ConvNet”. [Figure: flowers of 2 varieties (Orchid/Rose) pass through the input layer, hidden layers and output layer, which identifies the flower]
15. What is a Convolution Neural Network? The convolution operation forms the basis of any Convolution Neural Network. In CNN, every image is represented in the form of arrays of pixel values. [Figure: real image of the digit 8, represented as an array of pixels of 0’s and 1’s]
16. What is a Convolution Neural Network? Let’s understand the convolution operation using two 1-dimensional arrays, a = [5, 3, 2, 5, 9, 7] and b = [1, 2, 3]. Slide b along a, multiply the arrays element wise, and sum the products.
17. What is a Convolution Neural Network? First position: [5, 3, 2] times [1, 2, 3] element wise gives [5, 6, 6], which sums to 17, so a * b = [17, …]
18. What is a Convolution Neural Network? Second position: [3, 2, 5] times [1, 2, 3] element wise gives [3, 4, 15], which sums to 22, so a * b = [17, 22, …]
19. What is a Convolution Neural Network? Third position: [2, 5, 9] times [1, 2, 3] element wise gives [2, 10, 27], which sums to 39, so a * b = [17, 22, 39, …]
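The slide-and-sum procedure on a and b can be sketched in a few lines of NumPy. This is only an illustration: the function name `slide_and_sum` is made up for this sketch, and the operation as the slides describe it (no flipping of b) is what NumPy calls correlation, so `np.correlate(a, b, mode="valid")` gives the same numbers.

```python
import numpy as np

def slide_and_sum(a, b):
    """Slide b along a; at each position multiply element wise and sum."""
    a = np.asarray(a)
    b = np.asarray(b)
    n_positions = len(a) - len(b) + 1
    return [int(np.sum(a[i:i + len(b)] * b)) for i in range(n_positions)]

a = [5, 3, 2, 5, 9, 7]
b = [1, 2, 3]

result = slide_and_sum(a, b)
print(result)  # first three values match the slides: 17, 22, 39
```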
20. How CNN recognizes images? Consider the following 2 images: the image for the first symbol [symbol not shown] and the image for the symbol /. When you press the first symbol, the above image is processed.
21. How CNN recognizes images? When you press /, the above image is processed.
22. How CNN recognizes images? The real image is represented in the form of black and white pixels, that is, as a matrix of numbers (0 and 1). [Figure: pixel matrix of the image]
23. Layers in Convolution Neural Network 1) Convolution Layer 2) ReLU Layer 3) Pooling Layer 4) Fully Connected Layer
24. Convolution Layer A Convolution Layer has a number of filters that perform the convolution operation. Every image is considered as a matrix of pixel values. Consider the following 5 x 5 image whose pixel values are only 0 and 1. Image pixels: [1 1 1 0 0; 0 1 1 1 0; 0 0 1 1 1; 0 0 1 1 0; 0 1 1 0 0]. Filter: [1 0 1; 0 1 0; 1 0 1]. Convolved feature: [4 3 4; 2 4 3; 2 3 4]. The filter matrix slides over the image and the dot product is computed at each position to detect patterns.
25. Convolution Layer Position 1: window [1 1 1; 0 1 1; 0 0 1] gives 4. Convolved feature so far: 4
26. Convolution Layer Position 2: window [1 1 0; 1 1 1; 0 1 1] gives 3. Convolved feature so far: 4 3
27. Convolution Layer Position 3: window [1 0 0; 1 1 0; 1 1 1] gives 4. Convolved feature so far: 4 3 4
28. Convolution Layer Position 4: window [0 1 1; 0 0 1; 0 0 1] gives 2. Convolved feature so far: 4 3 4 2
29. Convolution Layer Position 5: window [1 1 1; 0 1 1; 0 1 1] gives 4. Convolved feature so far: 4 3 4 2 4
30. Convolution Layer Position 6: window [1 1 0; 1 1 1; 1 1 0] gives 3. Convolved feature so far: 4 3 4 2 4 3
31. Convolution Layer Position 7: window [0 0 1; 0 0 1; 0 1 1] gives 2. Convolved feature so far: 4 3 4 2 4 3 2
32. Convolution Layer Position 8: window [0 1 1; 0 1 1; 1 1 0] gives 3. Convolved feature so far: 4 3 4 2 4 3 2 3
33. Convolution Layer Position 9: window [1 1 1; 1 1 0; 1 0 0] gives 4. Convolved feature: 4 3 4 2 4 3 2 3 4
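The sliding-window computation on the 5 x 5 image can be checked with a short NumPy sketch. The image rows below are reconstructed from the 3 x 3 windows shown on the slides, and `convolve_valid` is a name chosen for this sketch, not something from the deck. Stride 1 and no padding are assumed.

```python
import numpy as np

# 5 x 5 binary image and 3 x 3 filter from the slides
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

def convolve_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    sum the element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

feature_map = convolve_valid(image, kernel)
print(feature_map)
```

A real convolution layer applies many such filters and learns their values during training; the fixed filter here only reproduces the slide example.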
34. ReLU Layer Once the feature maps are extracted, the next step is to move them to a ReLU layer. ReLU: R(z) = max(0, z) [Plot: R(z) for z from -10 to 10] • Performs an element wise operation • Sets all negative pixels to 0 • Introduces non-linearity to the network • The output is a rectified feature map
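As a small illustration of R(z) = max(0, z), the sketch below applies ReLU element wise to a feature map. The input values are hypothetical and chosen only to include some negative entries.

```python
import numpy as np

def relu(z):
    """Element-wise R(z) = max(0, z): negative values become 0."""
    return np.maximum(0, z)

# Hypothetical feature map with negative responses (not from the slides)
example_map = np.array([[-3, 2],
                        [0, -7]])
rectified = relu(example_map)
print(rectified)
```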
35. ReLU Layer Real image is scanned in multiple convolution and ReLU layers for locating features
37. Note for the instructor While explaining, please mention there are multiple Convolution, ReLU and Pooling layers connected one after another that carry out feature extraction in every layer. The input image is scanned multiple times to generate the input feature map.
38. Pooling Layer The rectified feature map now goes through a pooling layer. Pooling is a down-sampling operation that reduces the dimensionality of the feature map. [Figure: max pooling with 2x2 filters and stride 2 reduces the 4 x 4 rectified feature map to the 2 x 2 pooled feature map [6 8; 4 7]; for example, Max(3, 4, 1, 2) = 4]
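Max pooling with 2x2 filters and stride 2 can be sketched as follows. The 4 x 4 layout below is an assumption: the transcript preserves the sixteen values and the pooled result 6, 8, 4, 7, but not the exact arrangement, so the values are placed in one arrangement that is consistent with Max(3, 4, 1, 2) = 4.

```python
import numpy as np

def max_pool_2x2(x):
    """Max pooling with a 2x2 window and stride 2.
    Assumes both dimensions of x are even."""
    h, w = x.shape
    # Split into non-overlapping 2x2 blocks, then take the max of each block
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Assumed arrangement of the slide's rectified feature map values
rectified_map = np.array([
    [1, 4, 2, 7],
    [2, 6, 8, 5],
    [3, 4, 0, 7],
    [1, 2, 3, 1],
])
pooled = max_pool_2x2(rectified_map)
print(pooled)
```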
39. Pooling Layer Pooling keeps the strongest filter responses, so the edges, corners and other features of the bird (body, feathers, eyes, beak, etc.) detected by the convolution filters are retained in a smaller feature map.
40. Pooling Layer Structure of the Convolution Neural Network so far: Input Image, Convolution Layer (convolution), ReLU, Pooling Layer (pooling)
41. Flattening Flattening is the process of converting all the resultant 2-dimensional arrays from the pooled feature maps into a single long continuous linear vector. [Figure: pooled feature map [6 8; 4 7] flattened to the vector 6, 8, 4, 7]
42. Flattening The flattened vector from the pooling layer becomes the input layer of the next stage.
43. Flattening Structure of the network so far: Input Image, Convolution Layer (convolution), ReLU, Pooling Layer (pooling), Flattening, input to the final layer
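Flattening can be illustrated with NumPy by unrolling each pooled feature map and joining the results. The first map is the one from the slides; the second is a hypothetical extra map, added only to show how several maps end up in one vector.

```python
import numpy as np

pooled_maps = [
    np.array([[6, 8],
              [4, 7]]),   # pooled feature map from the slides
    np.array([[1, 0],
              [2, 3]]),   # hypothetical second map, for illustration only
]

# Unroll each 2-D map row by row and concatenate into one long vector
flattened = np.concatenate([m.ravel() for m in pooled_maps])
print(flattened)
```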
44. Fully Connected Layer The flattened matrix from the pooling layer is fed as input to the Fully Connected Layer to classify the image.
45. Fully Connected Layer The pixels from the flattened matrix are fed as input, and the output layer has one node per class (Dog, Bird, Cat).
46. Fully Connected Layer The output layer identifies the image.
47. Fully Connected Layer Structure of the network: Input Image, Convolution Layer (convolution), ReLU, Pooling Layer (pooling), Flattening, Fully Connected Layer
48. Fully Connected Layer Let’s see the entire process of how CNN recognizes a bird: feature extraction in multiple hidden layers (Convolution + ReLU + Max Pooling), then classification in the output layer (Fully Connected Layer) into Dog, Bird or Cat.
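A minimal sketch of the classification step, under several assumptions: the weights are random rather than trained, the bias is zero, and a softmax is used to turn the scores into class probabilities (the slides do not say which output function is used). The names `fully_connected` and `softmax` are chosen for this sketch.

```python
import numpy as np

def fully_connected(x, weights, bias):
    """One dense layer: every input value contributes to every class score."""
    return weights @ x + bias

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

classes = ["Dog", "Bird", "Cat"]
flattened = np.array([6.0, 8.0, 4.0, 7.0])  # flattened pooled feature map

rng = np.random.default_rng(0)
weights = rng.normal(size=(len(classes), flattened.size))  # untrained, random
bias = np.zeros(len(classes))

probabilities = softmax(fully_connected(flattened, weights, bias))
predicted = classes[int(np.argmax(probabilities))]
print(predicted, probabilities)
```

With random weights the predicted class is meaningless; training adjusts the weights so that the correct class gets the highest probability.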
49. Use case implementation using CNN We will be using the CIFAR-10 data set (from the Canadian Institute For Advanced Research) for classifying images across 10 categories: 01 airplane, 02 automobile, 03 bird, 04 cat, 05 deer, 06 dog, 07 frog, 08 horse, 09 ship, 10 truck
50. Use case implementation using CNN 1. Download data set 2. Import the CIFAR data set
51. Use case implementation using CNN 3. Reading the label names
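The code for these steps is not included in the transcript. The sketch below shows one common way to read the Python version of the CIFAR-10 archive, assuming it has already been downloaded and extracted. The function names are chosen for this sketch; the keys `b"data"` and `b"label_names"` and the 3072-value row layout (1024 red, 1024 green, 1024 blue values per 32 x 32 image) follow the data set's published format.

```python
import pickle
import numpy as np

def unpickle(path):
    """Load one CIFAR-10 file (a pickled dict with byte-string keys)."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")

def batch_to_images(batch):
    """Convert the (N, 3072) data array into N images of shape (32, 32, 3)."""
    data = np.asarray(batch[b"data"], dtype=np.uint8)
    # Each row stores the red, green and blue planes one after another
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

def label_names(meta):
    """Decode the class names stored in the batches.meta file."""
    return [name.decode("utf-8") for name in meta[b"label_names"]]

# Example usage, assuming the archive was extracted to cifar-10-batches-py/:
#   batch = unpickle("cifar-10-batches-py/data_batch_1")
#   names = label_names(unpickle("cifar-10-batches-py/batches.meta"))
#   images = batch_to_images(batch)
```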
52. Use case implementation using CNN 4. Display images using matplotlib
55. Use case implementation using CNN 5. Helper function to handle data
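The helper code itself is not in the transcript. As an assumption about what such helpers typically do for CIFAR-10, the sketch below one-hot encodes the integer labels and scales pixel values to the range 0 to 1. Both function names are hypothetical.

```python
import numpy as np

def one_hot_encode(labels, num_classes=10):
    """Turn integer class labels into one-hot rows, e.g. 2 -> [0, 0, 1, 0, ...]."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

def scale_pixels(images):
    """Scale 8-bit pixel values from 0..255 to 0..1."""
    return np.asarray(images, dtype=np.float32) / 255.0

encoded = one_hot_encode([2, 0, 9])
print(encoded.shape)
```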
57. Use case implementation using CNN 6. To use the previous code, run the following 7. Creating the model
58. Use case implementation using CNN 8. Applying the helper functions
59. Use case implementation using CNN 8. Create the layers 9. Create the flattened layer by reshaping the pooling layer 10. Create the fully connected layer
60. Use case implementation using CNN 11. Set output to y_pred 12. Apply the loss function 13. Create the optimizer 14. Create a variable to initialize all the global tf variables
61. Use case implementation using CNN 15. Run the model by creating a Graph Session
62. Key Takeaways