Slides for my session at the Virtual ML.NET Conference about developing an image-recognition machine learning model for a rock-paper-scissors mobile game with ML.NET and Xamarin.
3. Luis Beltrán
• Researcher - Tomas Bata University in Zlín, Czech Republic.
• Lecturer - Tecnológico Nacional de México en Celaya, Mexico.
• Xamarin, Azure and Artificial Intelligence
@darkicebeam
luis@luisbeltran.mx
5. Deep Learning
• Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.
• It is exceptionally effective at discovering patterns.
• Algorithms learn through a multi-layered hierarchy.
• If you supply the system with large amounts of data, it will begin to understand it and respond in helpful ways.
6. Deep learning has a built-in, automatic, multi-stage feature-learning process that learns rich hierarchical representations (i.e. features).
[Diagram: Input → Low-level features → Mid-level features → High-level features → Trainable classifier → Output (e.g. exterior, interior)]
7. • Image: Pixel → Edge → Texture → Motif → Part → Object
• Text: Character → Word → Word group → Clause → Sentence → Story
• Each module in deep learning transforms its input representation into a higher-level one, in a way similar to the human cortex.
[Diagram: Input → Low-level features → Mid-level features → High-level features → Trainable classifier → Output]
9. Convolution

Input Image:
a b c d
e f g h
i j k l
m n o p

Filter:
w1 w2
w3 w4

Convolved Image (Feature Map): h1 h2 …

h1 = f(a·w1 + b·w2 + e·w3 + f·w4)
h2 = f(b·w1 + c·w2 + f·w3 + g·w4)
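The sliding-window computation on this slide can be sketched in a few lines of NumPy (a minimal illustration, not the ML.NET implementation; the activation f is taken as the identity here, and note that CNN "convolution" is conventionally cross-correlation, i.e. the filter is not flipped):

```python
import numpy as np

def convolve2d_valid(image, kernel, activation=lambda v: v):
    """2-D 'valid' convolution (no padding, stride 1), as on the slide."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise product of the image patch and the filter, then sum
            out[i, j] = activation(np.sum(image[i:i + kh, j:j + kw] * kernel))
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # stands in for a..p
kernel = np.array([[1., 2.], [3., 4.]])            # stands in for w1..w4
fmap = convolve2d_valid(image, kernel)
print(fmap.shape)   # (3, 3): a 4×4 image and 2×2 filter give a 3×3 feature map
print(fmap[0, 0])   # h1 = a·w1 + b·w2 + e·w3 + f·w4 = 0·1 + 1·2 + 4·3 + 5·4 = 34.0
```

The second entry of the first row, `fmap[0, 1]`, reproduces the h2 formula with the window shifted one pixel to the right.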
10. Lower-Level to More Complex Features

[Diagram: Input Image → Filter 1 (w1 w2 / w3 w4) → Layer 1 Feature Map → Filter 2 (w5 w6 / w7 w8) → Layer 2 Feature Map]
11. Pooling
• Max pooling: reports the maximum output within a rectangular neighborhood.
• Average pooling: reports the average output of a rectangular neighborhood.

Example: MaxPool with a 2×2 filter and a stride of 2

Input Matrix:
1 3 5 3
4 2 3 1
3 1 1 3
0 1 0 4

Output Matrix:
4 5
3 4
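The slide's max-pooling example can be reproduced with a short NumPy sketch (an illustration only; swapping `.max()` for `.mean()` gives average pooling):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling: report the maximum within each size×size window."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1, 3, 5, 3],
              [4, 2, 3, 1],
              [3, 1, 1, 3],
              [0, 1, 0, 4]], dtype=float)
print(max_pool(x))  # [[4. 5.] [3. 4.]] — matches the slide's output matrix
```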
12. Convolutional Neural Network

[Diagram: a VGG-style feature-extraction architecture — stacks of convolutional filters (64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512 channels) interleaved with max-pooling layers, followed by fully connected layers producing an output vector over classes: Living Room, Bed Room, Kitchen, Bathroom, Outdoor]
18. Main program
• Loading data for supervised learning (images include tags)
• Training and validation sets
• Load pipeline: images loaded in memory
• Training options: ImageClassificationTrainer chosen, based on the InceptionV3 architecture
• Training pipeline: trying to predict a category
• Both pipelines are combined
19. Perform training
• Model precision is validated using the validation dataset
• Model metrics are calculated
• Test the classification model using the new images
• Prepare new images for validation
• Export the model
• Consume the model
20. ConsumingModel
• Load a previously trained classification model and prepare test images that were not used before in the training and validation stages
• ClassifyImages: test the model with new images
28. Unable to find an entry point named 'TF_StringEncodedSize' in DLL 'tensorflow'
“I think ml.net support tensorflow 2.3.1 not yet support 2.4,
so you must download SciSharp.TensorFlow.Redist 2.3.1”
https://github.com/dotnet/machinelearning-samples/issues/880
Deep learning can be considered a subset of machine learning: a field based on algorithms that learn and improve on their own by example. While machine learning uses simpler concepts, deep learning works with artificial neural networks, which are designed to imitate how humans think and learn. Until recently, neural networks were limited by computing power and thus were limited in complexity. However, advancements in Big Data analytics have permitted larger, more sophisticated neural networks, allowing computers to observe, learn, and react to complex situations faster than humans. Deep learning has aided image classification, language translation, and speech recognition, and it can be applied to many pattern-recognition problems with little human intervention.
Artificial neural networks, comprising many layers, drive deep learning. Deep Neural Networks (DNNs) are networks in which each layer can perform complex operations, such as representation and abstraction, that make sense of images, sound, and text.
Low-level features are minor details of the image, like lines or dots, that can be picked up by a single convolutional filter. High-level features are built on top of low-level features to detect objects and larger shapes in the image.
Convolutional neural networks use both types of features: the first few convolutional layers learn filters for finding lines, dots, curves, etc., while the later layers learn to recognize common objects and shapes.
Convolution is a general-purpose filter effect for images. In convolutional neural networks, filters detect spatial patterns, such as edges, by picking up changes in the intensity values of the image.
In terms of an image, a high-frequency image is one where the intensity of the pixels changes by a large amount, whereas a low-frequency image is one where the intensity is almost uniform. An image usually has both high- and low-frequency components. The high-frequency components correspond to the edges of an object, because at the edges the rate of change of pixel intensity is high.
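This link between intensity changes and filter responses can be demonstrated with a small difference filter (the kernel below is an assumption chosen for illustration; any high-pass kernel behaves similarly): it is silent on a uniform region and responds strongly at an edge.

```python
import numpy as np

# Vertical-difference filter: responds where intensity changes from top to bottom.
kernel = np.array([[-1., -1.],
                   [ 1.,  1.]])

def filter_response(img, k):
    """Valid cross-correlation of img with k (stride 1, no padding)."""
    h = img.shape[0] - k.shape[0] + 1
    w = img.shape[1] - k.shape[1] + 1
    return np.array([[np.sum(img[i:i + 2, j:j + 2] * k) for j in range(w)]
                     for i in range(h)])

uniform = np.full((4, 4), 7.0)              # low-frequency: almost-uniform intensity
edge = np.vstack([np.zeros((2, 4)),         # high-frequency: a sharp
                  np.full((2, 4), 9.0)])    # top-to-bottom edge

print(filter_response(uniform, kernel))     # all zeros: no intensity change
print(filter_response(edge, kernel).max())  # 18.0: strong response at the edge
```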
Convolution is a simple mathematical operation that is fundamental to many common image-processing operators. It provides a way of "multiplying together" two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality.
Now when you apply a set of filters on top of that (pass it through the second convolutional layer), the output will be activations that represent higher-level features. These features could be semicircles (a combination of a curve and a straight edge) or squares (a combination of several straight edges). As you pass through more convolutional layers, you get activation maps that represent more and more complex features.
Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map. Two common pooling methods are average pooling and max pooling, which summarize the average presence of a feature and the most activated presence of a feature, respectively.
A pooling layer is a new layer added after the convolutional layer; specifically, it is applied after a nonlinearity (e.g. ReLU) has been applied to the feature maps output by the convolutional layer.