1
Machine learning
for document analysis and
understanding
TC10/TC11 Summer School on Document Analysis:
Traditional Approaches and New Trends
@La Rochelle, France. 8:30-10:30, 4th July 2018
Seiichi Uchida, Kyushu University, Japan
2
The Nearest Neighbor Method
The simplest ML for pattern recognition;
Everything starts from it!
3
The nearest neighbor method:
Learning = memorizing
Input PorkBeef
Orange
Watermelon
Pineapple Fish
Which reference
pattern is the most
similar?
Reference
patterns
4
Each pattern is represented
as a feature vector
Color feature
Texture
feature
Pork=(10, 2.5, 4.3)
*Those numbers are just a random example
Note: In the classical
nearest neighbor method,
those features are
designed by human
5
A different pattern becomes a different
feature vector
Beef = (8, 2.6, 0.9)
*Those numbers are just a random example
Pork=(10, 2.5, 4.3)
*Those numbers are just a random example
Color feature
Texture
feature
6
Reference patterns
in the feature vector space
Color feature
Texture
feature
7
An input pattern
in the feature vector space
We want to
recognize this
input x
Color feature
Texture
feature
8
input x
Nearest neighbor method
in the feature vector space
Nearest
neighbor
input = orange
Color feature
Texture
feature
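To make the idea concrete, here is a minimal sketch (mine, not from the slides) of nearest-neighbor classification with NumPy; the feature values and class names are made-up examples.

```python
import numpy as np

# Hypothetical reference patterns: human-designed feature vectors
# (e.g., color and texture features), one per memorized example.
references = np.array([
    [10.0, 2.5, 4.3],   # pork
    [ 8.0, 2.6, 0.9],   # beef
    [ 1.2, 9.0, 3.1],   # orange
])
labels = ["pork", "beef", "orange"]

def nearest_neighbor(x, refs, labels):
    """Return the label of the reference pattern closest to x (Euclidean distance)."""
    dists = np.linalg.norm(refs - x, axis=1)
    return labels[int(np.argmin(dists))]

print(nearest_neighbor(np.array([9.5, 2.4, 4.0]), references, labels))  # -> "pork"
```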
9
How do you define “the nearest neighbor”?
Distance-based
The smallest distance gives the nearest neighbor
Ex.
• Euclidean distance
Similarity-based
The largest similarity gives the nearest neighbor
Ex.
• Inner product
• Cosine similarity
10
Do you remember an important property of the “inner product”?
If 𝐱 and 𝐲 are in similar directions, their inner product becomes larger.
The inner product evaluates the similarity between 𝐱 and 𝐲.
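As a small numerical illustration (mine, not the speaker's): the inner product grows when two vectors point in similar directions; normalizing it gives the cosine similarity.

```python
import numpy as np

x = np.array([1.0, 2.0, 0.5])
y_similar    = np.array([0.9, 2.1, 0.4])   # roughly the same direction as x
y_dissimilar = np.array([-2.0, 0.1, 3.0])  # a very different direction

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(x @ y_similar, cosine(x, y_similar))        # large inner product, cosine near 1
print(x @ y_dissimilar, cosine(x, y_dissimilar))  # small / negative values
```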
11
Well, two different types of features
(Note: important to understand deep learning)
Features defined by the
pattern itself
 Orange pixels→ Many
 Blue pixels → Rare
 Roundness → High
 Symmetry →High
 Texture → Fine
…
Features defined by the
similarity to others
 Similarity to ”car” → Low
 Similarity to ”apple” → High
 Similarity to “monkey”→Low
 Similarity to “Kaki”
(persimmon) →Very high
…
12
The nearest neighbor method with
similarity-based feature vectors
Similarity
to “Kaki”
Similarity to “car”
Important note:
Similarity is used for not
only feature extraction
but also classification
13
A shallow explanation of
neural networks
Don’t think it is a black box.
If you know the “inner product”, it becomes easy to understand.
14
The neuron – its reality
https://commons.wikimedia.org/
15
From reality to computational model
https://commons.wikimedia.org/
Diagram: inputs x₁, …, x_j, …, x_d with weights w₁, …, w_j, …, w_d feed a non-linear function f and give the output g(𝐱)
16
The neuron by computer
Diagram: inputs x₁, …, x_j, …, x_d and a constant input 1, weights w₁, …, w_d and bias b, a sum Σ, and a non-linear function f
$g(\mathbf{x}) = f\Big(\sum_{j=1}^{d} w_j x_j + b\Big) = f(\mathbf{w}^\top\mathbf{x} + b)$
f: non-linear func.
input → output
17
The neuron by computer
$g(\mathbf{x}) = f\Big(\sum_{j=1}^{d} w_j x_j + b\Big) = f(\mathbf{w}^\top\mathbf{x} + b)$
f: non-linear func.
Let’s forget f for the moment
18
The neuron by computer
Let’s forget f:
$g(\mathbf{x}) = \sum_{j=1}^{d} w_j x_j + b$
19
The neuron by computer
$g(\mathbf{x}) = \sum_{j=1}^{d} w_j x_j = \mathbf{w}^\top\mathbf{x}$
just the “inner product” of two vectors 𝐰 and 𝐱
20
So, a neuron calculates $\mathbf{w}^\top\mathbf{x}$:
a similarity between 𝐰 and 𝐱
= 0.9 if they are similar
= 0.02 if they are dissimilar
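A minimal sketch of this single-neuron computation g(𝐱) = f(𝐰ᵀ𝐱 + b), with made-up numbers (not from the slides):

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def neuron(x, w, b, f=relu):
    # Weighted sum (inner product) plus bias, passed through a non-linearity f.
    return f(w @ x + b)

w = np.array([0.5, -0.2, 0.8])   # the neuron's weight = a "pattern" to compare with
x = np.array([0.6, -0.1, 0.9])   # an input similar to w -> large response
print(neuron(x, w, b=0.0))       # large value: w and x are similar
print(neuron(-x, w, b=0.0))      # 0 after ReLU: w and -x are dissimilar
```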
21
So, if we have K neurons, we have
a K-dimensional similarity-based feature vector
$\big(\mathbf{w}_1^\top\mathbf{x},\ \mathbf{w}_2^\top\mathbf{x},\ \dots,\ \mathbf{w}_K^\top\mathbf{x}\big)^\top$, e.g. (0.9, 0.05, …, 0.75)
22
K-dimensional similarity-based feature
vector by K neurons
input → (0.9, 0.05, …, 0.75)
equiv. to (similarity to 𝐰₁, similarity to 𝐰₂, …, similarity to 𝐰_K)
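A layer of K such neurons is just a matrix–vector product: stacking the K weight vectors as rows of W gives the K-dimensional similarity-based feature vector W𝐱 in one shot. A minimal sketch with random numbers (mine, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 3                      # input dimension and number of neurons
W = rng.normal(size=(K, d))      # row k is the weight (reference pattern) w_k
x = rng.normal(size=d)

features = W @ x                 # K similarities: (w_1 . x, w_2 . x, ..., w_K . x)
print(features)                  # e.g. a vector like [0.9, 0.05, 0.75]
```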
23
Another function of the inner product
Similarity-based classification!
(Yes, the nearest neighbor method!)
Diagram: a neuron whose weight vector is the reference pattern of class k computes its inner product with the input 𝐱
24
Note: Multiple functions are realized by just combining neurons!
Just by layering the neuron elements, we
can have a complete recognition system!
Feature extraction (weights 𝐰₁, 𝐰₂, …, 𝐰_K)
Classification (weights 𝐯_A, 𝐯_B, 𝐯_C)
Similarity
to class A
Similarity
to class B
Similarity
to class C
Choose
max
25
Now it is time for deep neural networks
input x₁, …, x_j, …, x_d → feature extraction layers → classification
26
An example: AlexNet
“Deep” neural network called AlexNet
A Krizhevsky, NIPS2012
feature extraction layers
classification
layers
27
Now it is time for deep neural networks
input x₁, …, x_j, …, x_d → feature extraction layers → classification
Why do we need to repeat
feature extraction?
28
Why do we need to repeat feature
extraction?
A
D
C
B
E
F
A difficult
classification
task
29
Why do we need to repeat feature
extraction?
A
D
C
B
E
F
1w2w
30
Why do we need to repeat feature
extraction?
A
D
C
B
E
F
1w2w
F
A
B
C
D
E
Large similarity to 𝐰
Small similarity to 𝐰
similarity to
similarity
to𝐰
Note: The lower picture is not very accurate (it uses a distance-based rather than an inner-product-based space transformation), but I believe it does not seriously damage the explanation here.
31
Why do we need to repeat feature
extraction?
A
D
C
B
E
F
1w2w
F
A
B
C
D
E
It becomes more
separable
but still not
very separable
similarity to
similarity
to𝐰
32
Why do we need to repeat feature
extraction?
A
D
C
B
E
F
1w2w
F
A
B
C
D
E3w
4w
similarity to
similarity
to𝐰
33
Why do we need to repeat feature
extraction?
A
D
C
B
E
F
1w2w
F
A
B
C
D
E3w
4w
similarity to
similarity
to𝐰
A
D
E
B
C
F
similarity to
similarity
to𝐰
34
F
A
B
C
D
E3w
4w
similarity to
similarity
to𝐰
A
D
E
B
C
F
similarity to
similarity
to𝐰
Why do we need to repeat feature
extraction?
A
D
C
B
E
F
1w2w
3w
4w
Wow, they
become
separable!
35
Why do we need to repeat feature
extraction?
A
D
C
B
E
F
1w2w Now two classes
become totally
separable by
2v
1v
A
D
E
B
C
F
similarity to
similarity
to𝐰
A
D
E
B
C
F
similarity to
similarity
to𝐰
F
A
B
C
D
E3w
4w
similarity to
similarity
to𝐰
36
Remembering the non-linear function
$g(\mathbf{x}) = f\Big(\sum_{j=1}^{d} w_j x_j + b\Big)$
f: non-linear func.
37
The typical non-linear function:
Rectified linear function (ReLU)
$g(\mathbf{x}) = f\Big(\sum_{j=1}^{d} w_j x_j + b\Big)$
Rectified linear function: $f(a) = \max(0, a)$
38
How does ReLU affect
the similarity-based feature?
Negative elements in the feature vector are forced to be zero:
$f(\mathbf{w}_1^\top\mathbf{x}), \dots, f(\mathbf{w}_K^\top\mathbf{x})$ — positive similarities pass unchanged, negative ones become 0
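In code, the effect is simply element-wise clipping of the similarity vector at zero (a small sketch, continuing the NumPy notation used above):

```python
import numpy as np

similarities = np.array([0.9, -0.3, 0.05, -1.2, 0.75])  # w_k . x for K = 5 neurons
relu_features = np.maximum(0.0, similarities)
print(relu_features)  # [0.9, 0.0, 0.05, 0.0, 0.75]: positive similarities unchanged
```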
39
How to train neural networks:
Super-superficial explanation
40
In order to realize a DNN with
an expected “input-output” relation
Feature extraction weights 𝐰₁, 𝐰₂, …, 𝐰_K and classification weights 𝐯_A, 𝐯_B, 𝐯_C produce the similarity to class A, class B, and class C
Those parameters should be tuned
41
Training DNN; the goal
Class B
Class A
DNN
Knobs
Perfect
classification
boundary
Note: Actual number of #knobs (=#parameters)
42
Training DNN;
error-correcting learning by back propagation
NG
tuning
NG
NG
NG
Initial status
tuning
OK, end.
boundary
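As a toy illustration of this "tune the knobs until the errors disappear" idea (my sketch, far simpler than real back-propagation through a deep network), gradient descent on a single logistic neuron repeatedly nudges w and b to reduce the classification error:

```python
import numpy as np

rng = np.random.default_rng(1)
# A made-up two-class toy problem: class 0 around (0, 0), class 1 around (2, 2).
y = rng.integers(0, 2, size=100).astype(float)
X = rng.normal(size=(100, 2)) + 2.0 * y[:, None]

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):                          # tuning loop: NG -> tune -> ... -> OK
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # current outputs of the single "neuron"
    err = p - y                               # error signal (how wrong we still are)
    w -= lr * (X.T @ err) / len(X)            # correct w a little
    b -= lr * err.mean()                      # correct b a little

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print("training accuracy:", ((p > 0.5) == y).mean())
```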
43
Advanced topic: Why does (SGD-based) back-propagation work?
Much theoretical research has been done
[Choromanska+, PMLR2015] [Wu+, arXiv2017]
Under several assumptions, local minima are close to the global minimum.
flat basin of loss surface
44
Knob = weight
= a pattern for similarity-based feature
Σ
1x
jx
dx
……
input weight
similarity
to
similarity to
This pattern is automatically
derived through training…
45
Optimal feature is extracted automatically
through training (Representation learning)
Google’s cat
https://googleblog.blogspot.jp/2012/06/
similarity to …
similarity to …
Determined automatically
46
DNN for image classification:
Convolutional neural networks
(CNN)
47
How to deal with images by DNN?
$\mathbf{w}_k^\top\mathbf{x}$: both 𝐱 and 𝐰_k are 400-million-dim vectors
① Intractable computations
② Enormous parameters
48
Convolution
= Repeating “local inner product” operations
= Linear filtering
$\mathbf{w}_k^\top\mathbf{x}_{i,j}$, where $\mathbf{x}_{i,j}$ is a low-dimensional local patch at location (i, j)
① Tractable computations
② Trainable #parameters
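A minimal sketch (mine) of convolution as "the same small weight, inner-producted with every local patch":

```python
import numpy as np

def conv2d_valid(img, w):
    """Slide one k x k filter w over img and take the inner product at every location."""
    k = w.shape[0]
    H, W = img.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = img[i:i + k, j:j + k]        # the low-dimensional local patch x_{i,j}
            out[i, j] = np.sum(w * patch)        # local inner product w_k . x_{i,j}
    return out

img = np.random.rand(8, 8)
w = np.array([[1., 0., -1.],                     # a hypothetical 3x3 edge-like filter
              [1., 0., -1.],
              [1., 0., -1.]])
print(conv2d_valid(img, w).shape)                # (6, 6): a "filtered" image
```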
49
Convolutional layer
= Use the same weight 𝐰_k (filter coefficients) at all locations (i, j)
→ a “filtered” image
50
Pooling layer
Keep only the maximum value in each local region
① Deformation compensation
② Local info aggregation
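And a matching sketch (mine) of 2×2 max pooling, which keeps only the strongest local response:

```python
import numpy as np

def max_pool2x2(fmap):
    """Keep only the maximum value in each non-overlapping 2x2 block."""
    H, W = fmap.shape
    H2, W2 = H // 2, W // 2
    return fmap[:H2 * 2, :W2 * 2].reshape(H2, 2, W2, 2).max(axis=(1, 3))

fmap = np.arange(36, dtype=float).reshape(6, 6)   # a fake "filtered" image
print(max_pool2x2(fmap))                          # (3, 3): small shifts barely change it
```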
51
Application to DAR:
Isolated character recognition
machine printed
handwritten
designed fonts
95.49%
99.79%
99.99%
[Uchida+, ICFHR2016]
Near-human performance
52
Application to DAR:
Breaking Captcha
99.8% by 1 million training samples
[Goodfellow+, ArXiv, 2014]
53
Application to DAR:
Detecting a component in a character image
Multi-part component
[Iwana+, ICDAR2017]
Q: Can CNN detect complex components accurately?
54
Application to DAR:
Font Recognition (DeepFont)
[Wang+, ACMMM2015]
55
Several tips about DNN/CNN
56
CNN can be used as a feature extractor
Keep the feature extraction layers and discard the classification layers.
The extracted features can then be fed to another classifier (e.g., SVM or LSTM), an anomaly detector, or clustering — great
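A hedged sketch of this idea using off-the-shelf components (a pre-trained VGG16 instead of the lecture's own network, purely for illustration): drop the classification layers, keep the feature extraction layers, and train a separate SVM on the extracted features. `images` and `y` are assumed to exist.

```python
# Sketch only: assumes TensorFlow/Keras and scikit-learn are installed,
# and that `images` is an array of shape (N, 224, 224, 3) with labels `y`.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.svm import SVC

extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")  # classification layers discarded
features = extractor.predict(preprocess_input(images))                   # (N, 512) feature vectors

clf = SVC(kernel="rbf")       # another classifier trained on the CNN features
clf.fit(features, y)
```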
57
The current CNN does not
“understand” characters yet
Adversarial examples
[Abe+, unpublished]
Motivated by [Nguyen+, CVPR2015]
Likelihood values for
classes “A” and “B”
58
On the other hand, CNN can learn “math
operation” through images
input images output “image”
showing the sum
[Hoshen+, AAAI, 2016]
59
Visualization for deep learning:
DeCAF [Donahue+, arXiv 2013]
Visualizing the pattern distribution at each
layer
Near to the input layer Near to the output layer
60
Visualization for deep learning:
DeepDream and its relations
Finding an input image that excites a neuron
at a certain layer
https://distill.pub/2017/feature-visualization/
61
Visualization for deep learning:
Layer-wise Relevance Propagation (LRP)
Finding pixels which contribute to the final decision by a backward process
http://www.explain-ai.org/
62
Visualization for deep learning:
Local sensitivity analysis by making a hole
Motivated by [Zeiler+, arXiv, 2013][Ide+, Unpublished]
Likelihood of class “0” degrades a lot by making a hole around the pixel
63
Visualization for deep learning:
Grad-CAM [Selvaraju+, arXiv2016]
Finding pixels which contribute to the final decision by a backward process
http://gradcam.cloudcv.org/
64
tensorflow playground by Google
https://playground.tensorflow.org/
65
Several Variants of DNN/CNN
66
Auto encoder
(= Nonlinear principal component analysis)
Training the network to output the input
App: Denoising by convolutional auto-encoder
Compact
representation
of the input
wikipedia
https://blog.sicara.com/keras-tutorial-content-based-image-retrieval-convolutional-denoising-autoencoder-dc91450cc511
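A hedged Keras sketch of a convolutional denoising auto-encoder in this spirit (layer sizes are arbitrary choices of mine, not the linked tutorial's exact architecture): the network is trained to output the clean input from a noisy version.

```python
# Sketch only: assumes TensorFlow/Keras, and arrays x_clean / x_noisy of shape (N, 28, 28, 1).
from tensorflow.keras import layers, models

inp = layers.Input(shape=(28, 28, 1))
h = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
h = layers.MaxPooling2D(2, padding="same")(h)            # compact representation of the input
h = layers.Conv2D(16, 3, activation="relu", padding="same")(h)
out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(layers.UpSampling2D(2)(h))

autoencoder = models.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_noisy, x_clean, epochs=10, batch_size=128)   # target = the (clean) input
```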
67
U-Net:
Conv-Deconv net that outputs an image
[Ronneberger+, MICCAI2015]
Skip connection
cell
image
cell boundary
image
68
Application to DAR:
Scene text eraser
[Nakamura+, ICDAR2017]
69
Application to DAR:
Binarization
ICDAR-DIBCO2017 Winner (Smart Engines
Ltd, Moscow, Russia) used U-net
[Pratikakis+, ICDAR2017]
70
Application to DAR:
Dewarping [Ma+, CVPR2018]
Stacked U-nets
71
Note: Deep Image Prior
[Ulyanov+, CVPR2018]
The Conv-Deconv structure has an inherent characteristic which is suitable for image completion and other “low-pass” operations
train a conv-deconv
net just to generate
the left image but it
results in the right
image
72
Generative Adversarial Networks
The battle of two neural networks
VS
Generate
“fake bill” Discriminate
fake or real bill
Generator Discriminator
Fake bill becomes more and more realistic
73
Application to DAR:
(Our) Style-consistent font generation
[Hayashi+, unpublished]
74
Application to DAR:
Oh no… CVPR2018 was filled with Font-GANs
75
Huge variety of GANs:
Just several examples…
StackGANCycleGAN
Standard GAN (DCGAN)
https://www.slideshare.net/YunjeyChoi/generative-adversarial-networks-75916964
condition
(class)
Conditional GAN
76
Style Transfer [Gatys+, CVPR2016]
style image
(given)
content image
(given)
generated
image
77
Style Transfer [Gatys+, CVPR2016]
style image
(given)
content image
(given)
generated
image
similar internal
outputs
similar internal
output
78
Application to DAR:
Font Style Transfer
[Gantugs+, DAS2018]
79
SSD (Single Shot MultiBox Detector)
Fully-Conv Net that outputs bounding boxes
[Liu+, ECCV2016]
80
Application to DAR:
EAST: An Efficient and Accurate Scene Text Detector
[Zhou+, “EAST: An Efficient and Accurate Scene Text Detector”, CVPR2017]
Evaluating bounding box shape
81
Long short-term memory (LSTM),
the most typical recurrent neural network
82
LSTM (Long short-term memory):
A recurrent neural network
… …
… …
Recurrent
structure
Info from
all the past
Gate
structure
Active info
selection
input vector
output vector
Also very effective for solving
the vanishing gradient problem
in t-direction
[Graves+, TPAMI2009]
83
LSTM NN
Recurrent NN
Recurrent
structure
Info from
all the past
LSTM NN
Gate
structure
Active info
selection input
output
input
output
input gate
forget gate
output gate
[Graves+, TPAMI2009]
84
Standard LSTM NN-based HWR (handwriting recognition)
Character
class
Feature vector sequence
85
Extension to Bi-directional LSTM
Character
class
Feature vector sequence
combine the output using the past info
with the output using the future info
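A minimal NumPy sketch (mine) of a single LSTM cell step and the bidirectional idea: one pass reads the feature-vector sequence forward, another reads it backward, and their outputs are combined at each position. (A real BLSTM uses separate weights per direction; they are shared here only for brevity.)

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates select what to keep from the past and from the input."""
    z = W @ x + U @ h + b                       # all four gate pre-activations at once
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g                           # forget gate + input gate update the memory
    return o * np.tanh(c), c                    # output gate decides what to emit

def run_lstm(xs, W, U, b, H):
    h, c, outs = np.zeros(H), np.zeros(H), []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        outs.append(h)
    return np.array(outs)

rng = np.random.default_rng(0)
D, H, T = 4, 8, 10                              # feature dim, hidden size, sequence length
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
xs = rng.normal(size=(T, D))                    # a feature vector sequence

fwd = run_lstm(xs, W, U, b, H)                  # uses the past info
bwd = run_lstm(xs[::-1], W, U, b, H)[::-1]      # uses the future info
blstm_out = np.concatenate([fwd, bwd], axis=1)  # combine at every time step
print(blstm_out.shape)                          # (10, 16)
```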
86
Deep BLSTM network
[Frinken-Uchida, ICDAR2015]
Output
layer
Input
layer
LSTM layer
LSTM layer
LSTM layer
87
Application to DAR:
Convolutional Recurrent Neural Network (CRNN)
An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text
Recognition, IEEE TPAMI, 2017
88
Image Captioning (CNN+LSTM):
Converting an image to a “document”
[Vinyals+, arXiv2015]
89
Application to DAR:
End-to-end math OCR (Image LaTeX)
[Deng+, Image-to-Markup Generation with Coarse-to-Fine Attention, arXiv2017]
90
More conventional machine
learning techniques
(SVM, φ-machine, AdaBoost)
91
Support Vector Machines (SVM)
Still the best choice when the amount of data is
insufficient
92
Linear discriminant function
A A AB B
x
training patterns from class A and B
93
Linear discriminant function
A A AB B
$g(x) = \mathbf{w}^\top\mathbf{x} + b$
positive = class A
negative = class B
94
Linear discriminant function
A A AB B
$g(x) = \mathbf{w}^\top\mathbf{x} + b$
positive = class A
negative = class B
misrecognized
95
Linear discriminant function
A A AB B
$g(x) = \mathbf{w}^\top\mathbf{x} + b$
positive = class A
negative = class B
no misrecognition!
96
Which one is the best?
A A AB B
$g(x) = \mathbf{w}^\top\mathbf{x} + b$
positive = class A
negative = class B
All of those functions
can recognize all
training patterns…
97
Don’t forget unseen patterns…
A A AB B
$g(x) = \mathbf{w}^\top\mathbf{x} + b$
positive = class A
negative = class B
B A
We might have those patterns
around the class boundary
A
98
Max-margin classification
A A AB B
$g(x) = \mathbf{w}^\top\mathbf{x} + b$
positive = class A
negative = class B
margin margin
99
A A AB B
How can we get it?
Minimize the slope under constraints
$g(x) = \mathbf{w}^\top\mathbf{x} + b$, levels +1 and −1
For us (class A), the function value should be more than +1
For us (class B), the function value should be less than −1
100
A A AB B
How can we get it?
Minimize the slope under constraints
$g(x) = \mathbf{w}^\top\mathbf{x} + b$, levels +1 and −1
NG  OK  NG
101
A A AB B
How can we get it?
Minimize the slope under constraints
$g(x) = \mathbf{w}^\top\mathbf{x} + b$, levels +1 and −1
nail
102
A A AB B
How can we get it?
Minimize the slope under constraints
$g(x) = \mathbf{w}^\top\mathbf{x} + b$, levels +1 and −1
the minimum slope
satisfying the constraints
103
A A AB B
How can we get it?
Minimize the slope under constraints
$g(x) = \mathbf{w}^\top\mathbf{x} + b$, levels +1 and −1
It also gives the maximum
margin classification!
104
A A AB B
Support vectors
$g(x) = \mathbf{w}^\top\mathbf{x} + b$, levels +1 and −1
SV  SV
Only those SVs contribute to determining the discriminant function
105
Two-dimensional case
Minimize the slope
106
No solution that satisfies the constraints:
Not linearly-separable
A A AB B
$g(x) = \mathbf{w}^\top\mathbf{x} + b$, levels +1 and −1
B
107
A relaxation:
Replace the constraint as a penalty
A A AB B
$g(x) = \mathbf{w}^\top\mathbf{x} + b$, levels +1 and −1
B
Penalty
Penalty
Minimize “slope + penalty”
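A minimal sketch (mine) of exactly this "minimize slope + penalty" objective, i.e. the soft-margin linear SVM, minimized by sub-gradient descent on the hinge loss:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data with labels +1 (class A) and -1 (class B)
X = np.concatenate([rng.normal(2.0, 0.5, (20, 1)), rng.normal(-2.0, 0.5, (20, 1))])
y = np.array([+1.0] * 20 + [-1.0] * 20)

w, b, C, lr = np.zeros(1), 0.0, 1.0, 0.01
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                                        # points violating y (w.x + b) >= 1
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)    # "slope" term + penalty term
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, (np.sign(X @ w + b) == y).mean())                 # learned function, training accuracy
```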
108
φ-machine
An old partner of the linear classifier
The idea of the “kernel” comes from this
109
Mapping the feature vector space
to a higher-dimensional space
Axes x₁ and x₂ (binary values 0/1): not linearly separable
Axes x₁, x₂, x₁x₂: linearly separable!
$\varphi: (x_1, x_2) \mapsto (x_1,\; x_2,\; x_1 x_2)$
110
What happens in the original space
(axes: x₁, x₂, x₁x₂)
111
What happens in the original space
$a y_1 + b y_2 + c y_3 + d = 0$
A plane in 3D space
Rewrite
112
What happens in the original space
$a x_1 + b x_2 + c\, x_1 x_2 + d = 0$
??? What is this?
Revert
113
What happens in the original space
$a x_1 + b x_2 + c\, x_1 x_2 + d = 0 \;\Rightarrow\; x_2 = -\dfrac{a x_1 + d}{c x_1 + b}$
Discriminant surface: linear classification in the higher-dimensional space corresponds to a non-linear classification in the original space
Classification boundary
114
Another example
A
A
B
B
Axes x₁, x₂: classes A and B, not linearly separable (B surrounds A)
Axes x₁, x₂, x₁²+x₂²: linearly separable
$\varphi: (x_1, x_2) \mapsto (x_1,\; x_2,\; x_1^2 + x_2^2)$
115
What happens in the original space
 
$a x_1 + b x_2 + c\,(x_1^2 + x_2^2) + d = 0$: a circle in the original (x₁, x₂) space
A linear boundary in the mapped space becomes a non-linear (circular) boundary in the original space
116
Notes about the φ-machine
Combination with SVM is popular
The φ-function leads to the “kernel”:
$\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i^\top\mathbf{x}_j \;\to\; \sum_{i,j} \alpha_i \alpha_j y_i y_j\, \varphi(\mathbf{x}_i)^\top\varphi(\mathbf{x}_j) = \sum_{i,j} \alpha_i \alpha_j y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j)$
Choosing a good mapping φ is not trivial
In the past, the choice was done by trial-and-error
Recently….
117
Deep neural networks can find
a good mapping automatically
Feature extraction layers = a mapping φ
The mapping is specified by the weights
The weights are optimized via training
This is the so-called “representation learning”
118
Classifier Ensemble
and AdaBoost
119
Majority voting
Classifiers g₁, …, g_c, …, g_C: each is a two-class classifier that returns +1 for class A and −1 for class B
input x
individual outputs: +1, −1, +1, +1
if sum > 0 then A; else B → A
120
Weighted majority voting
Classifiers g₁, …, g_c, …, g_C: each is a two-class classifier that returns +1 for class A and −1 for class B
input x → individual outputs with weights, e.g., 0.7, 0.02, 0.2, 0.15
if weighted sum > 0 then A; else B → A
Well, how do we decide the weights?
121
AdaBoost:
A set of complementary classifiers
1g 0.7
training
patterns
1.Train
if sum > 0 then A; else B
2. Reliability
122
AdaBoost:
A set of complementary classifiers
0.7
training
patterns
if sum > 0 then A; else B
3. Give a large (small) weight to each
sample which is misrecognized
(correctly recognized) by
123
training
patterns
if sum > 0 then A; else B
AdaBoost:
A set of complementary classifiers
0.7
0.43
4. Training with the weight
(Patterns with larger
weight should be
recognized correctly)
5. Reliability
124
training
patterns
if sum > 0 then A; else B
AdaBoost:
A set of complementary classifiers
0.7
0.43
6. Give a large (small)
weight to each sample which
is misrecognized (correctly
recognized) by
Repeat until
convergence of
training accuracy
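A compact sketch (mine) of these steps with decision stumps as the weak two-class classifiers: train on weighted samples, compute the reliability α, re-weight the misrecognized samples, and repeat.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)); y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # toy data

def train_stump(X, y, w):
    """Pick the single threshold on one feature that minimizes the weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for dim in range(X.shape[1]):
        for thr in np.unique(X[:, dim]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, dim] > thr, 1.0, -1.0)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, dim, thr, sign)
    return best

w = np.full(len(X), 1.0 / len(X))                 # start with uniform sample weights
stumps, alphas = [], []
for _ in range(10):
    err, dim, thr, sign = train_stump(X, y, w)    # 1. train on the weighted samples
    alpha = 0.5 * np.log((1 - err) / err)         # 2. reliability of this classifier
    pred = sign * np.where(X[:, dim] > thr, 1.0, -1.0)
    w *= np.exp(-alpha * y * pred)                # 3. up-weight misrecognized samples
    w /= w.sum()
    stumps.append((dim, thr, sign)); alphas.append(alpha)

# Weighted majority vote: if the weighted sum > 0 then class A (+1), else class B (-1)
votes = sum(a * s * np.where(X[:, d] > t, 1.0, -1.0) for a, (d, t, s) in zip(alphas, stumps))
print("training accuracy:", (np.sign(votes) == y).mean())
```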
125
Today I cannot explain the following ML
techniques…
 Semi-supervised learning methods
 ex. constrained clustering, virtual adversarial training,
 Weakly-supervised learning methods
 ex. Multiple-instance learning
 Unsupervised learning methods
 Clustering, self-organizing feature maps, intrinsic dimensionality
 Ensemble methods
 Random forests, ECOC, bagging, random subspace
 Robust regression
 Hidden Markov models, graphical models
 Error-correcting learning (and perceptron)
 Statistical inference
 Esp. Gaussian mixtures, maximum likelihood, Bayesian estimation
126
Concluding remarks:
New DAR research by ML
127
Near-human performance has been
achieved by big data and neural networks
machine printed
handwritten
designed fonts
95.49%
99.79%
99.99%
[Uchida+, ICFHR2016]
[Zhou+, CVPR2017]
Scene text detection
Scene text recognition
CRNN [Shi+, TPAMI, 2017]
F value=0.8 on
ICDAR2015 Incidental scene text
89.6% word recog. rate
on ICDAR2013
128
Now we can imagine
what we can do in the world
129
Beyond 100% = Computer can detect, read,
and collect all text information perfectly
Texts on notebook
Texts on object label
Texts on digital display
Texts on book page
Texts on signboard
Texts on poster / ad
So, what do you want to do
with the perfect recognition results?
130
Poor recognition results
In fact, our real goal should NOT be
perfect recognition results
Real goals
Ultimate application
by using perfect
recognition results
Scientific discovery
by analyzing perfect
recognition results
Perfect recognition results
Tentative goal
131
What will you do
in the world beyond 100%?
Ultimate application
 Education
 “Total-recall” for perfect
information search
 Welfare
 Alarm, translation,
information complement
 “Life log”-related apps
 Summary, log compression,
captioning, question
answering, behavior
prediction, reminder
Scientific discovery
 With social science
 Interaction between scene
text and human
 Text statistics
 With design science
 Font shape and impression
 Discovering typographic
knowledge
 With humanities
 Historical knowledge
 Semiology
132
Another direction:
Use characters to understand ML
Simple binary and stroke-structured pattern
Less background clutter
Small size (ex. 32x32)
Big data (ex. 80,000 samples / class)
Predefined classes (ex. 10 classes for digits)
ML has achieved near-human performance
Very good “testbed” for
not only evaluating but also understanding ML
133
The last message...
... and please do NOT become an accuracist,
parameter-tuner, or libraholic!

More Related Content

What's hot

Human Activity Recognition
Human Activity RecognitionHuman Activity Recognition
Human Activity Recognition
AshwinGill1
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)nikhilus85
 
Supervised learning and unsupervised learning
Supervised learning and unsupervised learningSupervised learning and unsupervised learning
Supervised learning and unsupervised learning
ArunakumariAkula1
 
Moving object detection in video surveillance
Moving object detection in video surveillanceMoving object detection in video surveillance
Moving object detection in video surveillance
Ashfaqul Haque John
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
Julien SIMON
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Simplilearn
 
Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - Introduction
Aritra Mukherjee
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearning
Abhishek Sharma
 
Object detection
Object detectionObject detection
Object detection
Jksuryawanshi
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)
Jeong-Gwan Lee
 
Hidden Markov Model
Hidden Markov Model Hidden Markov Model
Hidden Markov Model
Mahmoud El-tayeb
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
Yusuke Yamamoto
 
Feature selection
Feature selectionFeature selection
Feature selection
Dong Guo
 
Machine learning
Machine learningMachine learning
Machine learning
Shailja Tripathi
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentation
MrsShwetaBanait1
 
Supervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine LearningSupervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine Learning
Spotle.ai
 
Feature selection
Feature selectionFeature selection
Feature selection
dkpawar
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processing
Data Science Thailand
 
Image segmentation with deep learning
Image segmentation with deep learningImage segmentation with deep learning
Image segmentation with deep learning
Antonio Rueda-Toicen
 

What's hot (20)

Human Activity Recognition
Human Activity RecognitionHuman Activity Recognition
Human Activity Recognition
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)
 
Supervised learning and unsupervised learning
Supervised learning and unsupervised learningSupervised learning and unsupervised learning
Supervised learning and unsupervised learning
 
Moving object detection in video surveillance
Moving object detection in video surveillanceMoving object detection in video surveillance
Moving object detection in video surveillance
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
 
Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - Introduction
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearning
 
Object detection
Object detectionObject detection
Object detection
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)
 
Hidden Markov Model
Hidden Markov Model Hidden Markov Model
Hidden Markov Model
 
Hidden markov model ppt
Hidden markov model pptHidden markov model ppt
Hidden markov model ppt
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Machine learning
Machine learningMachine learning
Machine learning
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentation
 
Supervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine LearningSupervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine Learning
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processing
 
Image segmentation with deep learning
Image segmentation with deep learningImage segmentation with deep learning
Image segmentation with deep learning
 

Similar to Machine learning for document analysis and understanding

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
Gaurav Mittal
 
Java and Deep Learning
Java and Deep LearningJava and Deep Learning
Java and Deep Learning
Oswald Campesato
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlow
Oswald Campesato
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
Kenta Oono
 
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Jedha Bootcamp
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
Sujit Pal
 
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Wuhyun Rico Shin
 
Deep learning unsupervised learning diapo
Deep learning unsupervised learning diapoDeep learning unsupervised learning diapo
Deep learning unsupervised learning diapo
Milton Paja
 
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
Tomoyuki Suzuki
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
Databricks
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Universitat Politècnica de Catalunya
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
nyomans1
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
台灣資料科學年會
 
Document Analysis with Deep Learning
Document Analysis with Deep LearningDocument Analysis with Deep Learning
Document Analysis with Deep Learning
aiaioo
 
Citython presentation
Citython presentationCitython presentation
Citython presentation
Ankit Tewari
 
Data Summer Conf 2018, “From the math to the business value: machine learning...
Data Summer Conf 2018, “From the math to the business value: machine learning...Data Summer Conf 2018, “From the math to the business value: machine learning...
Data Summer Conf 2018, “From the math to the business value: machine learning...
Provectus
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
KaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep LearningKaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep Learning
Van Huy
 
Machine Learning in Computer Chess: Genetic Programming and KRK
Machine Learning in Computer Chess: Genetic Programming and KRKMachine Learning in Computer Chess: Genetic Programming and KRK
Machine Learning in Computer Chess: Genetic Programming and KRKbutest
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
elliando dias
 

Similar to Machine learning for document analysis and understanding (20)

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Java and Deep Learning
Java and Deep LearningJava and Deep Learning
Java and Deep Learning
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlow
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
 
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
 
Deep learning unsupervised learning diapo
Deep learning unsupervised learning diapoDeep learning unsupervised learning diapo
Deep learning unsupervised learning diapo
 
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
 
Document Analysis with Deep Learning
Document Analysis with Deep LearningDocument Analysis with Deep Learning
Document Analysis with Deep Learning
 
Citython presentation
Citython presentationCitython presentation
Citython presentation
 
Data Summer Conf 2018, “From the math to the business value: machine learning...
Data Summer Conf 2018, “From the math to the business value: machine learning...Data Summer Conf 2018, “From the math to the business value: machine learning...
Data Summer Conf 2018, “From the math to the business value: machine learning...
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
 
KaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep LearningKaoNet: Face Recognition and Generation App using Deep Learning
KaoNet: Face Recognition and Generation App using Deep Learning
 
Machine Learning in Computer Chess: Genetic Programming and KRK
Machine Learning in Computer Chess: Genetic Programming and KRKMachine Learning in Computer Chess: Genetic Programming and KRK
Machine Learning in Computer Chess: Genetic Programming and KRK
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 

More from Seiichi Uchida

1 データとデータ分析
1 データとデータ分析1 データとデータ分析
1 データとデータ分析
Seiichi Uchida
 
9 可視化
9 可視化9 可視化
9 可視化
Seiichi Uchida
 
13 分類とパターン認識
13 分類とパターン認識13 分類とパターン認識
13 分類とパターン認識
Seiichi Uchida
 
12 非構造化データ解析
12 非構造化データ解析12 非構造化データ解析
12 非構造化データ解析
Seiichi Uchida
 
0 データサイエンス概論まえがき
0 データサイエンス概論まえがき0 データサイエンス概論まえがき
0 データサイエンス概論まえがき
Seiichi Uchida
 
15 人工知能入門
15 人工知能入門15 人工知能入門
15 人工知能入門
Seiichi Uchida
 
14 データ収集とバイアス
14 データ収集とバイアス14 データ収集とバイアス
14 データ収集とバイアス
Seiichi Uchida
 
10 確率と確率分布
10 確率と確率分布10 確率と確率分布
10 確率と確率分布
Seiichi Uchida
 
8 予測と回帰分析
8 予測と回帰分析8 予測と回帰分析
8 予測と回帰分析
Seiichi Uchida
 
7 主成分分析
7 主成分分析7 主成分分析
7 主成分分析
Seiichi Uchida
 
6 線形代数に基づくデータ解析の基礎
6 線形代数に基づくデータ解析の基礎6 線形代数に基づくデータ解析の基礎
6 線形代数に基づくデータ解析の基礎
Seiichi Uchida
 
5 クラスタリングと異常検出
5 クラスタリングと異常検出5 クラスタリングと異常検出
5 クラスタリングと異常検出
Seiichi Uchida
 
4 データ間の距離と類似度
4 データ間の距離と類似度4 データ間の距離と類似度
4 データ間の距離と類似度
Seiichi Uchida
 
3 平均・分散・相関
3 平均・分散・相関3 平均・分散・相関
3 平均・分散・相関
Seiichi Uchida
 
2 データのベクトル表現と集合
2 データのベクトル表現と集合2 データのベクトル表現と集合
2 データのベクトル表現と集合
Seiichi Uchida
 
「あなたがいま読んでいるものは文字です」~画像情報学から見た文字研究のこれから
「あなたがいま読んでいるものは文字です」~画像情報学から見た文字研究のこれから「あなたがいま読んでいるものは文字です」~画像情報学から見た文字研究のこれから
「あなたがいま読んでいるものは文字です」~画像情報学から見た文字研究のこれから
Seiichi Uchida
 
データサイエンス概論第一=8 パターン認識と深層学習
データサイエンス概論第一=8 パターン認識と深層学習データサイエンス概論第一=8 パターン認識と深層学習
データサイエンス概論第一=8 パターン認識と深層学習
Seiichi Uchida
 
データサイエンス概論第一=7 画像処理
データサイエンス概論第一=7 画像処理データサイエンス概論第一=7 画像処理
データサイエンス概論第一=7 画像処理
Seiichi Uchida
 
An opening talk at ICDAR2017 Future Workshop - Beyond 100%
An opening talk at ICDAR2017 Future Workshop - Beyond 100%An opening talk at ICDAR2017 Future Workshop - Beyond 100%
An opening talk at ICDAR2017 Future Workshop - Beyond 100%
Seiichi Uchida
 
データサイエンス概論第一 6 異常検出
データサイエンス概論第一 6 異常検出データサイエンス概論第一 6 異常検出
データサイエンス概論第一 6 異常検出
Seiichi Uchida
 

More from Seiichi Uchida (20)

1 データとデータ分析
1 データとデータ分析1 データとデータ分析
1 データとデータ分析
 
9 可視化
9 可視化9 可視化
9 可視化
 
13 分類とパターン認識
13 分類とパターン認識13 分類とパターン認識
13 分類とパターン認識
 
12 非構造化データ解析
12 非構造化データ解析12 非構造化データ解析
12 非構造化データ解析
 
0 データサイエンス概論まえがき
0 データサイエンス概論まえがき0 データサイエンス概論まえがき
0 データサイエンス概論まえがき
 
15 人工知能入門
15 人工知能入門15 人工知能入門
15 人工知能入門
 
14 データ収集とバイアス
14 データ収集とバイアス14 データ収集とバイアス
14 データ収集とバイアス
 
10 確率と確率分布
10 確率と確率分布10 確率と確率分布
10 確率と確率分布
 
8 予測と回帰分析
8 予測と回帰分析8 予測と回帰分析
8 予測と回帰分析
 
7 主成分分析
7 主成分分析7 主成分分析
7 主成分分析
 
6 線形代数に基づくデータ解析の基礎
6 線形代数に基づくデータ解析の基礎6 線形代数に基づくデータ解析の基礎
6 線形代数に基づくデータ解析の基礎
 
5 クラスタリングと異常検出
5 クラスタリングと異常検出5 クラスタリングと異常検出
5 クラスタリングと異常検出
 
4 データ間の距離と類似度
4 データ間の距離と類似度4 データ間の距離と類似度
4 データ間の距離と類似度
 
3 平均・分散・相関
3 平均・分散・相関3 平均・分散・相関
3 平均・分散・相関
 
2 データのベクトル表現と集合
2 データのベクトル表現と集合2 データのベクトル表現と集合
2 データのベクトル表現と集合
 
「あなたがいま読んでいるものは文字です」~画像情報学から見た文字研究のこれから
「あなたがいま読んでいるものは文字です」~画像情報学から見た文字研究のこれから「あなたがいま読んでいるものは文字です」~画像情報学から見た文字研究のこれから
「あなたがいま読んでいるものは文字です」~画像情報学から見た文字研究のこれから
 
データサイエンス概論第一=8 パターン認識と深層学習
データサイエンス概論第一=8 パターン認識と深層学習データサイエンス概論第一=8 パターン認識と深層学習
データサイエンス概論第一=8 パターン認識と深層学習
 
データサイエンス概論第一=7 画像処理
データサイエンス概論第一=7 画像処理データサイエンス概論第一=7 画像処理
データサイエンス概論第一=7 画像処理
 
An opening talk at ICDAR2017 Future Workshop - Beyond 100%
An opening talk at ICDAR2017 Future Workshop - Beyond 100%An opening talk at ICDAR2017 Future Workshop - Beyond 100%
An opening talk at ICDAR2017 Future Workshop - Beyond 100%
 
データサイエンス概論第一 6 異常検出
データサイエンス概論第一 6 異常検出データサイエンス概論第一 6 異常検出
データサイエンス概論第一 6 異常検出
 

Recently uploaded

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Machine learning for document analysis and understanding

  • 1. 1 Machine learning for document analysis and understanding TC10/TC11 Summer School on Document Analysis: Traditional Approaches and New Trends @La Rochelle, France. 8:30-10:30, 4th July 2018 Seiichi Uchida, Kyushu University, Japan
  • 2. 2 The Nearest Neighbor Method The simplest ML for pattern recognition; Everything starts from it!
  • 3. 3 The nearest neighbor method: Learning = memorizing Input PorkBeef Orange Watermelon Pineapple Fish Which reference pattern is the most similar? Reference patterns
  • 4. 4 Each pattern is represented as a feature vector Color feature Texture feature Pork=(10, 2.5, 4.3) *Those numbers are just a random example Note: In the classical nearest neighbor method, those features are designed by human
  • 5. 5 A different pattern becomes a different feature vector Beef = (8, 2.6, 0.9) *Those numbers are just a random example Pork=(10, 2.5, 4.3) *Those numbers are just a random example Color feature Texture feature
  • 6. 6 Reference patterns in the feature vector space Color feature Texture feature
  • 7. 7 An input pattern in the feature vector space We want to recognize this input x Color feature Texture feature
  • 8. 8 input x Nearest neighbor method in the feature vector space Nearest neighbor input = orange Color feature Texture feature
  • 9. 99 How do you define “the nearest neighbor”? Distance-based The smallest distance gives the nearest neighbor Ex. • Euclidean distance / Similarity-based The largest similarity gives the nearest neighbor Ex. • Inner product • Cosine similarity 𝐱 𝐲 𝐱 𝐲 x ?
  • 10. 1010 Do you remember an important property of “inner product”? If and are in the similar direction, their inner product becomes larger The inner product evaluates the similarity between and
  • 11. 11 Well, two different types of features (Note: important to understand deep learning) Features defined by the pattern itself  Orange pixels→ Many  Blue pixels → Rare  Roundness → High  Symmetry →High  Texture → Fine … Features defined by the similarity to others  Similarity to ”car” → Low  Similarity to ”apple” → High  Similarity to “monkey”→Low  Similarity to “Kaki” (persimmon) →Very high …
  • 12. 12 The nearest neighbor method with similarity-based feature vectors Similarity to “Kaki” Similarity to “car” Important note: Similarity is used for not only feature extraction but also classification
  • 13. 13 A shallow explanation of neural networks Don’t think it is a black box. If you know “inner-product”, it becomes
  • 14. 14 The neuron – its reality https://commons.wikimedia.org/
  • 15. 15 From reality to computational model https://commons.wikimedia.org/ input g  xgx 1x jx dx 1w jw dw f  xg
  • 16. 16 The neuron by computer Σ  xg 1x jx dx 1 …… b  bf bxwfg T d j jj            xw x     1 )( x 1w jw dw f f: non-linear func. input output
  • 17. 17 The neuron by computer Σ  xg 1x jx dx 1 …… b x 1w jw dw f f: non-linear func. Let’s forget  bf bxwfg T d j jj            xw x     1 )(
  • 18. 18 The neuron by computer Σ  xg 1x jx dx 1 …… b x 1w jw dwLet’s forget   d j jj bxwg 1 )(x
  • 19. 19 The neuron by computer Σ 1x jx dx …… xwT just “inner product” of two vectors x 1w jw dw w xw x T d j jj xwg        1 )(
  • 20. 20 So, a neuron calculate… xwT Σ 1x jx dx …… 1w jw dw xw andbetweensimilarityA =0.9 if they are similar =0.02 if they are dissimilar
  • 21. 21 So, if we have K neurons, we have a K-dimensional similarity-based feature vector …               xw xw xw T K T T  2 11w 2w Kwx 1x jx dx 0.9 0.05 0.75 x
  • 22. 22 K-dimensional similarity-based feature vector by K neurons 0.9 0.05 0.75 input equiv. similarity to similarity to
  • 23. 23 Another function of the inner product Similarity-based classification! (Yes, the nearest neighbor method!) Σ 1x jx dx …… x reference pattern of class k
  • 24. 24 Note: Multiple functions are realized by just combining neurons! Just by layering the neuron elements, we can have a complete recognition system! … Feature extraction 1w Kw 1x jx dx …… 2w Classification AV CV BV Similarity to class A Similarity to class B Similarity to class C Choose max
  • 25. 25 Now the time for deep neural networks 1x jx dx feature extraction layers …… … f f f … classification f f f
  • 26. 26 An example: AlexNet “Deep” neural network called AlexNet A Krizhevsky, NIPS2012 feature extraction layers classification layers
  • 27. 27 Now the time for deep neural networks 1x jx dx feature extraction layers …… … f f f … Classification f f f Why do we need to repeat feature extraction?
  • 28. 28 Why do we need to repeat feature extraction? A D C B E F A difficult classification task
  • 29. 29 Why do we need to repeat feature extraction? A D C B E F 1w2w
  • 30. 30 Why do we need to repeat feature extraction? A D C B E F 1w2w F A B C D E Large similarity to 𝐰 Small similarity to 𝐰 similarity to similarity to𝐰 Note: The lower picture is not very accurate (because it does not use inner-product-based but distance-based space transformation. However I believe that it does not seriously damage the explanation here.
  • 31. 31 Why do we need to repeat feature extraction? A D C B E F 1w2w F A B C D E It becomes more separable but still not very separable similarity to similarity to𝐰
  • 32. 32 Why do we need to repeat feature extraction? A D C B E F 1w2w F A B C D E3w 4w similarity to similarity to𝐰
  • 33. 33 Why do we need to repeat feature extraction? A D C B E F 1w2w F A B C D E3w 4w similarity to similarity to𝐰 A D E B C F similarity to similarity to𝐰
  • 34. 34 F A B C D E3w 4w similarity to similarity to𝐰 A D E B C F similarity to similarity to𝐰 Why do we need to repeat feature extraction? A D C B E F 1w2w 3w 4w Wow, they become separable!
  • 35. 35 Why do we need to repeat feature extraction? A D C B E F 1w2w Now two classes become totally separable by 2v 1v A D E B C F similarity to similarity to𝐰 A D E B C F similarity to similarity to𝐰 F A B C D E3w 4w similarity to similarity to𝐰
  • 36. 36 Remembering the non-linear function Σ  xg 1x jx dx 1 …… b x 1w jw dw f f: non-linear func.
  • 37. 37 The typical non-linear function: Rectified linear function (ReLU) Σ  xg 1x jx dx 1 …… b x 1w jw dw f Rectified linear function
  • 38. 3838 How does ReLU affect the similarity-based feature? Minus elements in the feature vector are forced to be zero xwT 1 xwT K f Unchanged Unchanged f
  • 39. 39 How to train neural networks: Super-superficial explanation
  • 40. 40 In order to realize a DNN with an expected “input-output” relation … 1w Kw 1x jx dx …… 2w AV CV BV Similarity to class A Similarity to class B Similarity to class C Those parameters should be tuned 1w AV2w
  • 41. 41 Training DNN; the goal Class B Class A DNN Knobs Perfect classification boundary Note: Actual number of #knobs (=#parameters)
  • 42. 42 Training DNN; error-correcting learning by back propagation NG tuning NG NG NG Initial status tuning OK, end. boundary
  • 43. 4343 Advanced topic: Why (SGD-based) back- propagation works? Many theoretical researches have been done [Choromanska+, PMLR2015] [Wu+, arXiv2017] Under several assumptions, local minima is close to the global minimum. flat basin of loss surface
  • 44. 44 Knob = weight = a pattern for similarity-based feature Σ 1x jx dx …… input weight similarity to similarity to This pattern is automatically derived through training…
  • 45. 45 Optimal feature is extracted automatically through training (Representation learning) Google’s cat https://googleblog.blogspot.jp/2012/06/ similarity to similarity toDetermined automatically
  • 46. 46 DNN for image classification: Convolutional neural networks (CNN)
  • 47. 47 kw How to deal with images by DNN? x xwT k 400million-dim vector 400million-dim vector ①Intractable computations ②Enormous parameters
  • 48. 4848 kw Convolution = Repeating “local inner product” operations = Linear filtering x ji T k ,xw Low-dimensional vector ji,x ①Tractable computations ②Trainable #parameters
  • 49. 49 kw Convolutional layer x ji,x = Use the same weight (filter coefficient) at all locations “Filtered” image
  • 50. 50 kw Pooling layer x ji,x Keep only the maximum value ①Deformation compensation ②Local info aggregation
  • 51. 51 Application to DAR: Isolated character recognition machine printed handwritten designed fonts 95.49% 99.79% 99.99% [Uchida+, ICFHR2016] Near-human performance
  • 52. 5252 Application to DAR: Breaking Captcha 99.8% by 1 million training samples [Goodfellow+, ArXiv, 2014]
  • 53. 53 Application to DAR: Detecting a component in a character imageMulti-part component [Iwana+, ICDAR2017] Q: Can CNN detect complex components accurately?
  • 54. 54 Application to DAR: Font Recognition (DeepFont) [Wang+, ACMMM2015]
  • 56. 56 CNN can be used as a feature extractor 1x jx dx feature extraction layers …… … f f f … classification (discarded) f f f 1x jx dx …… … f f f Another classifier e.g., SVM and LSTM Anomaly detector Clustering great
  • 57. 5757 The current CNN does not “understand” characters yet Adversarial examples [Abe+, unpublished] Motivated by [Nguyen+, CVPR2015] Likelihood values for classes “A” and “B”
  • 58. 58 On the other hand, CNN can learn “math operation” through images input images output “image” showing the sum [Hoshen+, AAAI, 2016]
  • 59. 5959 Visualization for deep learning: DeCAF [Donahue+, arXiv 2013] Visualizing the pattern distribution at each layer Near to the input layer Near to the output layer
  • 60. 6060 Visualization for deep learning: DeepDream and its relations Finding an input image that excites a neuron at a certain layer https://distill.pub/2017/feature-visualization/
  • 61. 61 Visualization for deep learning: Layer-wise Relevance Propagation (LRP) — finding the pixels that contribute to the final decision by a backward process. http://www.explain-ai.org/
  • 62. 62 Visualization for deep learning: local sensitivity analysis by making a hole (occlusion) [Ide+, unpublished], motivated by [Zeiler+, arXiv, 2013] — the likelihood of class “0” degrades a lot when a hole is made around an important pixel.
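  A minimal sketch of this occlusion (“hole-making”) analysis, assuming a classifier that maps a (C, H, W) image tensor to class scores; the patch size, stride, and zero fill value are arbitrary choices, not those of the cited work.

  import torch

  def occlusion_map(model, img, target_class, patch=8, stride=8):
      # Slide a square "hole" over the image and record how much the
      # target-class score drops at each position; a large drop means
      # the covered region was important for the decision.
      model.eval()
      with torch.no_grad():
          base = model(img.unsqueeze(0))[0, target_class].item()
          _, H, W = img.shape
          heat = torch.zeros(H // stride, W // stride)
          for i in range(0, H - patch + 1, stride):
              for j in range(0, W - patch + 1, stride):
                  holed = img.clone()
                  holed[:, i:i + patch, j:j + patch] = 0.0
                  score = model(holed.unsqueeze(0))[0, target_class].item()
                  heat[i // stride, j // stride] = base - score
      return heat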
  • 63. 63 Visualization for deep learning: Grad-CAM [Selvaraju+, arXiv2016] — finding the pixels that contribute to the final decision by a backward process. http://gradcam.cloudcv.org/
  • 64. 64 TensorFlow Playground by Google https://playground.tensorflow.org/
  • 66. 66 Autoencoder (= nonlinear principal component analysis): train the network to output its own input, so the middle layer becomes a compact representation of the input. Application: denoising by a convolutional autoencoder (wikipedia; https://blog.sicara.com/keras-tutorial-content-based-image-retrieval-convolutional-denoising-autoencoder-dc91450cc511).
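  A minimal sketch of a convolutional denoising autoencoder in PyTorch (the layer sizes, the 32x32 random “images”, and the noise level are arbitrary assumptions): the network is trained so that its output matches the clean input, forcing the narrow middle layer to hold a compact representation.

  import torch
  import torch.nn as nn

  class ConvAutoEncoder(nn.Module):
      def __init__(self):
          super().__init__()
          self.enc = nn.Sequential(                      # compress to a compact representation
              nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
              nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU())
          self.dec = nn.Sequential(                      # reconstruct the (clean) input
              nn.ConvTranspose2d(8, 16, 2, stride=2), nn.ReLU(),
              nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid())

      def forward(self, x):
          return self.dec(self.enc(x))

  model = ConvAutoEncoder()
  clean = torch.rand(16, 1, 32, 32)                      # stand-in for clean images
  noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0, 1)
  loss = nn.MSELoss()(model(noisy), clean)               # output should match the clean input
  loss.backward()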
  • 67. 67 U-Net: a conv–deconv net with skip connections that outputs an image [Ronneberger+, MICCAI2015] — e.g., a cell image in, a cell-boundary image out.
  • 68. 68 Application to DAR: Scene text eraser [Nakamura+, ICDAR2017]
  • 69. 69 Application to DAR: binarization. The ICDAR-DIBCO2017 winner (Smart Engines Ltd, Moscow, Russia) used a U-Net [Pratikakis+, ICDAR2017].
  • 70. 70 Application to DAR: Dewarping [Ma+, CVPR2018] Stacked U-nets
  • 71. 71 Note: Deep Image Prior [Ulyanov+, CVPR2018]. The conv–deconv structure has an inherent characteristic that suits image completion and other “low-pass” operations: training a conv–deconv net just to generate the left (corrupted) image ends up producing the right (completed) image.
  • 72. 72 Generative Adversarial Networks: the battle of two neural networks. The generator generates a “fake bill” and the discriminator tries to discriminate fake bills from real ones; through this competition the fake bill becomes more and more realistic.
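  A minimal sketch of this two-player game on toy 2-D data (the tiny generator and discriminator, the noise dimension, and the Gaussian “real” samples are all illustrative assumptions): the discriminator learns to separate real from fake, and the generator learns to fool it.

  import torch
  import torch.nn as nn

  G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # generator: noise -> fake sample
  D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator: sample -> real/fake score
  opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
  opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
  bce = nn.BCEWithLogitsLoss()

  for step in range(1000):
      real = torch.randn(64, 2) + 3.0                    # toy "real bills"
      fake = G(torch.randn(64, 8))                       # current "fake bills"

      # Discriminator step: real samples should score 1, fake samples 0
      d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
      opt_d.zero_grad(); d_loss.backward(); opt_d.step()

      # Generator step: make the discriminator believe the fakes are real
      g_loss = bce(D(G(torch.randn(64, 8))), torch.ones(64, 1))
      opt_g.zero_grad(); g_loss.backward(); opt_g.step()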
  • 73. 73 Application to DAR: (Our) Style-consistent font generation [Hayashi+, unpublished]
  • 74. 74 Application to DAR: oh no… CVPR2018 was filled with font GANs.
  • 75. 75 Huge variety of GANs, just several examples: the standard GAN (DCGAN), Conditional GAN (conditioned on a class), StackGAN, CycleGAN. https://www.slideshare.net/YunjeyChoi/generative-adversarial-networks-75916964
  • 76. 76 Style transfer [Gatys+, CVPR2016]: a generated image combines a given style image and a given content image.
  • 77. 77 Style transfer [Gatys+, CVPR2016]: the generated image is optimized so that its internal CNN outputs are similar to those of the style image (for style) and of the content image (for content).
  • 78. 78 Application to DAR: Font Style Transfer [Gantugs+, DAS2018]
  • 79. 79 SSD (Single Shot MultiBox Detector): a fully-convolutional net that outputs bounding boxes [Liu+, ECCV2016].
  • 80. 80 Application to DAR: EAST, an efficient and accurate scene text detector that also evaluates the bounding-box shape [Zhou+, CVPR2017].
  • 81. 81 Long short-term memory (LSTM), the most typical recurrent neural network
  • 82. 82 LSTM (long short-term memory): a recurrent neural network that maps an input vector sequence to an output vector sequence. Its recurrent structure carries information from all the past, and its gate structure actively selects information; it is also very effective against the vanishing gradient problem along the time direction [Graves+, TPAMI2009].
  • 83. 83 Recurrent NN vs. LSTM NN: a plain recurrent NN has only the recurrent structure (information from all the past), while an LSTM NN adds the gate structure (input, forget, and output gates) for active information selection [Graves+, TPAMI2009].
  • 84. 84 Standard LSTM-based handwriting recognition (HWR): a feature-vector sequence goes in and a character class comes out.
  • 85. 85 Extension to bi-directional LSTM: the output using the past information and the output using the future information are combined before deciding the character class.
  • 86. 86 Deep BLSTM network [Frinken-Uchida, ICDAR2015]: multiple LSTM layers stacked between the input layer and the output layer.
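  A minimal PyTorch sketch of such a stacked bidirectional LSTM for sequence labelling; the feature dimensionality, the hidden size, the number of character classes, and the random input sequence are placeholders, not the setup of [Frinken-Uchida, ICDAR2015].

  import torch
  import torch.nn as nn

  feat_dim, hidden, n_classes = 20, 64, 36
  blstm = nn.LSTM(feat_dim, hidden, num_layers=2, bidirectional=True, batch_first=True)
  classifier = nn.Linear(2 * hidden, n_classes)    # combines the forward (past) and backward (future) outputs

  seq = torch.randn(1, 50, feat_dim)               # one feature-vector sequence of length 50
  outputs, _ = blstm(seq)                          # (1, 50, 2 * hidden)
  char_scores = classifier(outputs)                # per-frame character-class scores
  print(char_scores.shape)                         # torch.Size([1, 50, 36])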
  • 87. 87 Application to DAR: Convolutional Recurrent Neural Network (CRNN) [Shi+, “An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition”, TPAMI2017].
  • 88. 88 Image Captioning (CNN+LSTM): Converting an image to a “document” [Vinyals+, arXiv2015]
  • 89. 89 Application to DAR: end-to-end math OCR (image to LaTeX) [Deng+, “Image-to-Markup Generation with Coarse-to-Fine Attention”, arXiv2017].
  • 90. 90 More conventional machine learning techniques (SVM, φ-machine, AdaBoost)
  • 91. 91 Support Vector Machines (SVM) Still the best choice when the amount of data is insufficient
  • 92. 92 Linear discriminant function: training patterns x from class A and class B.
  • 93. 93 Linear discriminant function w^T x + b: positive → class A, negative → class B.
  • 94. 94 Linear discriminant function w^T x + b (positive = class A, negative = class B): this one misrecognizes a training pattern.
  • 95. 95 Linear discriminant function w^T x + b (positive = class A, negative = class B): this one makes no misrecognition!
  • 96. 96 Which one is the best? All of those functions w^T x + b can recognize all training patterns…
  • 97. 97 Don’t forget unseen patterns… We might get new class-A and class-B patterns around the class boundary.
  • 98. 98 Max-margin classification: choose the w^T x + b that leaves the largest margin on both sides of the boundary.
  • 99. 99 How can we get it? Minimize the slope of w^T x + b under constraints: for every class-A training pattern the function value should be at least +1, and for every class-B pattern at most −1.
  • 100. 100 How can we get it? Minimize the slope under the constraints: some candidate functions violate them (NG), others satisfy them (OK).
  • 101. 101 How can we get it? Minimize the slope under the constraints (figure: the ±1 constraint levels at the training patterns act like a “nail” that the function must clear).
  • 102. 102 How can we get it? The answer is the minimum slope that still satisfies the constraints.
  • 103. 103 How can we get it? Minimizing the slope under the constraints also gives the maximum-margin classification!
  • 104. 104 Support vectors (SVs): only the training patterns lying on the margin, the support vectors, contribute to determining the discriminant function.
  • 106. 106 Not linearly separable: there is no solution w^T x + b that satisfies all the constraints.
  • 107. 107 A relaxation: replace each violated constraint with a penalty, and minimize “slope + penalty”.
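  In practice the whole “minimize slope + penalty” optimization is a one-liner with scikit-learn; in the sketch below (toy overlapping Gaussian data and an arbitrary C, my own choices for illustration) the parameter C weights the penalty term, and support_vectors_ holds the SVs that alone determine the discriminant function.

  import numpy as np
  from sklearn.svm import SVC

  rng = np.random.default_rng(0)
  X = np.vstack([rng.normal(-1.5, 1.0, size=(50, 2)),   # class A
                 rng.normal(+1.5, 1.0, size=(50, 2))])  # class B, slightly overlapping with A
  y = np.array([-1] * 50 + [+1] * 50)

  # C weights the penalty for violated constraints; a large C approaches the hard margin
  svm = SVC(kernel="linear", C=1.0).fit(X, y)
  print(svm.coef_, svm.intercept_)          # w and b of the discriminant function w^T x + b
  print(len(svm.support_vectors_))          # only these support vectors determine it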
  • 108. 108 φ-machine: an old partner of the linear classifier; the idea of the “kernel” comes from this.
  • 109. 109 Mapping the feature vector space to a higher-dimensional space: patterns that are not linearly separable in (x₁, x₂) become linearly separable after the mapping φ: (x₁, x₂) → (x₁, x₂, x₁x₂).
  • 110. 110 What happens in the original space? (figure: the original axes x₁, x₂ and the added axis x₁x₂)
  • 111. 111 What happens in the original space? Rewrite the linear boundary as a·y₁ + b·y₂ + c·y₃ + d = 0 — a plane in the 3-D mapped space.
  • 112. 112 What happens in the original space? Revert the variables: a·x₁ + b·x₂ + c·x₁x₂ + d = 0 — what is this?
  • 113. 113 What happens in the original space? Solving a·x₁ + b·x₂ + c·x₁x₂ + d = 0 for x₂ gives x₂ = −(a·x₁ + d)/(b + c·x₁): linear classification in the higher-dimensional space corresponds to a non-linear classification boundary in the original space.
  • 114. 114 Another example: class-A and class-B patterns that are not linearly separable in (x₁, x₂) become separable with the mapping φ: (x₁, x₂) → (x₁, x₂, x₁² + x₂²).
  • 115. 115 What happens in the original space? The linear boundary in the mapped space corresponds to a·x₁ + b·x₂ + c·(x₁² + x₂²) + d = 0, a quadratic (curved) classification boundary in the original (x₁, x₂) space.
  • 116. 116 Notes about the φ-machine: its combination with SVM is popular, and the φ-function leads to the “kernel” — the dual term Σ_{i,j} α_i α_j y_i y_j x_i^T x_j becomes Σ_{i,j} α_i α_j y_i y_j φ(x_i)^T φ(x_j), which is replaced by Σ_{i,j} α_i α_j y_i y_j k(x_i, x_j). Choosing a good mapping is not trivial; in the past the choice was made by trial and error. Recently…
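  A small scikit-learn sketch of this kernel idea (the XOR-like toy data and the degree-2 polynomial kernel are my own choices for illustration): the kernel k(x_i, x_j) plays the role of φ(x_i)^T φ(x_j) without ever computing φ explicitly.

  import numpy as np
  from sklearn.svm import SVC

  rng = np.random.default_rng(0)
  X = rng.uniform(-1, 1, size=(200, 2))
  y = np.sign(X[:, 0] * X[:, 1])                      # XOR-like labels: not linearly separable in (x1, x2)

  linear = SVC(kernel="linear").fit(X, y)             # struggles in the original space
  poly = SVC(kernel="poly", degree=2).fit(X, y)       # (gamma * x_i^T x_j)^2 implicitly adds the x1*x2 feature
  print(linear.score(X, y), poly.score(X, y))         # the kernel version separates the data much better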
  • 117. 117 Deep neural networks can find a good mapping automatically: the feature-extraction layers are a mapping, the mapping is specified by the weights, and the weights (i.e., the mapping φ) are optimized via training — this is the so-called “representation learning”.
  • 120. 120 Weighted majority voting: each two-class classifier g_1, …, g_C returns +1 for class A and −1 for class B; their outputs are combined with weights, and if the weighted sum is > 0 the input x is classified as A, otherwise as B. Well, how do we decide the weights?
  • 121. 121 AdaBoost: a set of complementary classifiers. 1. Train the first classifier g_1 on the training patterns. 2. Give g_1 a reliability weight (e.g., 0.7); the final decision is “if the weighted sum > 0 then A, else B”.
  • 122. 122 AdaBoost: a set of complementary classifiers. 3. Give a large (small) weight to each training sample that is misrecognized (correctly recognized) by g_1.
  • 123. 123 AdaBoost: a set of complementary classifiers. 4. Train the next classifier g_2 with the sample weights (patterns with larger weights should be recognized correctly). 5. Give g_2 its reliability weight (e.g., 0.43).
  • 124. 124 AdaBoost: a set of complementary classifiers. 6. Give a large (small) weight to each sample that is misrecognized (correctly recognized) by g_2, and repeat until the training accuracy converges.
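  The whole loop above is what scikit-learn's AdaBoostClassifier implements; below is a minimal sketch on a toy non-linear problem (the data and the number of rounds are arbitrary assumptions), where estimator_weights_ holds the per-classifier reliability weights analogous to the 0.7 and 0.43 in the slides.

  import numpy as np
  from sklearn.ensemble import AdaBoostClassifier

  rng = np.random.default_rng(0)
  X = rng.normal(size=(300, 2))
  y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # a two-class problem no single stump can solve

  # The default weak learner is a decision stump; after each round, misrecognized
  # samples get larger weights, and each weak classifier gets a reliability weight.
  ada = AdaBoostClassifier(n_estimators=50).fit(X, y)
  print(ada.score(X, y))
  print(ada.estimator_weights_[:5])                      # reliability weights of the first five classifiers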
  • 125. 125 Today I cannot explain the following ML techniques… semi-supervised learning methods (ex. constrained clustering, virtual adversarial training); weakly-supervised learning methods (ex. multiple-instance learning); unsupervised learning methods (clustering, self-organizing feature maps, intrinsic dimensionality); ensemble methods (random forests, ECOC, bagging, random subspace); robust regression; hidden Markov models and graphical models; error-correcting learning (and the perceptron); statistical inference (esp. Gaussian mixtures, maximum likelihood, Bayesian estimation).
  • 127. 127 Near-human performance has been achieved by big data and neural networks: isolated character recognition at 95.49%–99.99% for machine-printed, handwritten, and designed-font characters [Uchida+, ICFHR2016]; scene text detection with F-value = 0.8 on the ICDAR2015 incidental scene text dataset [Zhou+, CVPR2017]; scene text recognition with CRNN at an 89.6% word recognition rate on ICDAR2013 [Shi+, TPAMI, 2017].
  • 128. 128 Now we can imagine what we can do in the world
  • 129. 129 Beyond 100% = the computer can detect, read, and collect all text information perfectly: texts on notebooks, object labels, digital displays, book pages, signboards, posters / ads. So, what do you want to do with the perfect recognition results?
  • 130. 130 In fact, our real goal should NOT be perfect recognition results — today we still get poor recognition results, and even perfect ones are only a tentative goal. The real goals are the ultimate applications that use perfect recognition results and the scientific discoveries made by analyzing them.
  • 131. 131 What will you do in the world beyond 100%? Ultimate application  Education  “Total-recall” for perfect information search  Welfare  Alarm, translation, information complement  “Life log”-related apps  Summary, log compression, captioning, question answering, behavior prediction, reminder Scientific discovery  With social science  Interaction between scene text and human  Text statistics  With design science  Font shape and impression  Discovering typographic knowledge  With humanities  Historical knowledge  Semiology
  • 132. 132 Another direction: use characters to understand ML. Characters are simple binary, stroke-structured patterns with little background clutter, small size (ex. 32x32), big data (ex. 80,000 samples / class), and predefined classes (ex. 10 classes for digits); ML has achieved near-human performance on them, so they are a very good “testbed” for not only evaluating but also understanding ML.
  • 133. 133 The last message... ... and please do NOT become an accuracist, a parameter-tuner, or a libraholic!