The document discusses machine learning and document analysis using neural networks. It begins with an overview of the nearest neighbor method and how neural networks perform similarity-based classification and feature extraction. It then explains how neural networks work by calculating inner products between input and weight vectors. The document outlines how repeating these feature extraction layers allows the network to learn more complex patterns and separate classes. It provides examples of convolutional neural networks for tasks like document image analysis and discusses techniques for training networks and visualizing their representations.
Machine learning for document analysis and understanding
1.
Machine learning for document analysis and understanding
TC10/TC11 Summer School on Document Analysis: Traditional Approaches and New Trends
@La Rochelle, France. 8:30-10:30, 4th July 2018
Seiichi Uchida, Kyushu University, Japan
2.
The Nearest Neighbor Method
The simplest ML for pattern recognition; everything starts from it!

3.
The nearest neighbor method: learning = memorizing
Reference patterns: Pork, Beef, Orange, Watermelon, Pineapple, Fish
Which reference pattern is the most similar to the input?
4.
Each pattern is represented as a feature vector
Example: Pork = (10, 2.5, 4.3) in a space spanned by a color feature, a texture feature, and so on. (Those numbers are just a random example.)
Note: In the classical nearest neighbor method, those features are designed by humans.

5.
A different pattern becomes a different feature vector
Pork = (10, 2.5, 4.3), Beef = (8, 2.6, 0.9). (Those numbers are just a random example.)
7.
An input pattern in the feature vector space
We want to recognize this input x, plotted on the same color and texture feature axes.

8.
Nearest neighbor method in the feature vector space
The reference pattern nearest to the input x is the orange, so input = orange.
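The memorize-and-compare procedure above can be sketched in a few lines of NumPy; the reference patterns and feature values below are invented for illustration, just like the slide's numbers:

```python
import numpy as np

# Hypothetical reference patterns in a 3-D feature space
# (color feature, texture feature, ...), in the spirit of the slide's examples.
references = {
    "pork":   np.array([10.0, 2.5, 4.3]),
    "beef":   np.array([8.0, 2.6, 0.9]),
    "orange": np.array([1.0, 9.0, 5.0]),
}

def nearest_neighbor(x):
    # Learning = memorizing: classify by the closest stored reference pattern.
    return min(references, key=lambda k: np.linalg.norm(x - references[k]))

print(nearest_neighbor(np.array([1.2, 8.5, 5.1])))  # prints "orange"
```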
9.
How do you define "the nearest neighbor"?
Distance-based: the smallest distance gives the nearest neighbor.
Ex. • Euclidean distance ||x − y||
Similarity-based: the largest similarity gives the nearest neighbor.
Ex. • Inner product • Cosine similarity
10.
Do you remember an important property of "inner product"?
If x and y point in similar directions, their inner product becomes larger.
The inner product evaluates the similarity between x and y.
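This property is easy to check numerically; the vectors here are arbitrary examples:

```python
import numpy as np

x = np.array([2.0, 1.0])
y_similar = np.array([4.0, 2.0])      # same direction as x
y_dissimilar = np.array([-1.0, 2.0])  # orthogonal to x

# Vectors pointing in similar directions give a larger inner product.
print(x @ y_similar)     # 10.0
print(x @ y_dissimilar)  # 0.0

def cos_sim(a, b):
    # Cosine similarity normalizes away vector lengths, keeping only direction.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```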
11.
Well, two different types of features (note: important to understand deep learning)
Features defined by the pattern itself:
orange pixels → many; blue pixels → rare; roundness → high; symmetry → high; texture → fine; …
Features defined by the similarity to others:
similarity to "car" → low; similarity to "apple" → high; similarity to "monkey" → low; similarity to "Kaki" (persimmon) → very high; …

12.
The nearest neighbor method with similarity-based feature vectors
Axes: similarity to "Kaki", similarity to "car".
Important note: similarity is used not only for feature extraction but also for classification.
13.
A shallow explanation of neural networks
Don't think it is a black box. If you know "inner product", it becomes easy to understand.
15.
From reality to computational model
(Figure: a biological neuron, and its computational model with inputs x_1 … x_j … x_d, weights w₁ … w_j … w_d, a non-linear function f, and output g(x).)
https://commons.wikimedia.org/
16.
The neuron by computer
g(x) = f( Σ_{j=1..d} w_j x_j + b ) = f( wᵀx + b ),   f: non-linear function
The inputs x_1 … x_d are weighted by w₁ … w_d, summed together with a bias b, and passed through f.
17.
The neuron by computer
Let's forget the non-linear function f for a moment.

18.
The neuron by computer
Without f, the neuron computes g(x) = Σ_{j=1..d} w_j x_j + b. Let's also forget the bias b.
19.
The neuron by computer
g(x) = Σ_{j=1..d} w_j x_j = wᵀx: just the "inner product" of the two vectors w and x.
20.
So, a neuron calculates wᵀx: a similarity between w and x.
For example, wᵀx = 0.9 if they are similar, and wᵀx = 0.02 if they are dissimilar.
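As a sketch, with made-up weights and tanh standing in for the unspecified non-linearity f:

```python
import numpy as np

def neuron(w, b, x, f=np.tanh):
    # g(x) = f(w^T x + b); tanh is just a stand-in for the non-linear function f.
    return f(w @ x + b)

w = np.array([0.6, 0.8])
# Stripped of f and b, the core computation is the inner product w^T x,
# a similarity score between the weight vector w and the input x:
print(w @ np.array([0.6, 0.8]))   # aligned input: large score (1.0)
print(w @ np.array([0.8, -0.6]))  # orthogonal input: score 0.0
```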
21.
So, if we have K neurons, we have a K-dimensional similarity-based feature vector
(w₁ᵀx, w₂ᵀx, …, w_Kᵀx), e.g., (0.9, 0.05, …, 0.75).
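Stacking the K inner products is a single matrix-vector product; the weights below are arbitrary illustrative values:

```python
import numpy as np

# K = 3 neurons, each with a d = 4 dimensional weight vector w_k (one row each).
W = np.array([[0.5, 0.1, 0.0, 0.2],
              [0.0, 0.9, 0.1, 0.0],
              [0.3, 0.0, 0.7, 0.1]])
x = np.ones(4)

# The K inner products w_k^T x, computed at once: a K-dimensional
# similarity-based feature vector.
features = W @ x
print(features)  # [0.8 1.  1.1]
```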
23.
Another function of the inner product: similarity-based classification!
(Yes, the nearest neighbor method!)
Here the weight vector is the reference pattern of class k, and the inner product scores how similar the input x is to that class.
24.
Note: multiple functions are realized just by combining neurons!
Just by layering the neuron elements, we can have a complete recognition system:
feature extraction with weights w₁, w₂, …, w_K, then classification with weights V_A, V_B, V_C giving the similarity to class A, B, and C, then choose the max.
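A minimal sketch of this layered system, with random weights standing in for trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n_classes = 8, 5, 3

W = rng.normal(size=(K, d))          # feature-extraction weights w_1 ... w_K
V = rng.normal(size=(n_classes, K))  # class weights V_A, V_B, V_C

def recognize(x):
    features = W @ x               # one inner product (similarity) per neuron
    scores = V @ features          # similarity to each class
    return int(np.argmax(scores))  # choose max

print(recognize(rng.normal(size=d)))
```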
25.
Now the time for deep neural networks
The inputs x_1 … x_d pass through repeated feature extraction layers, each with a non-linear function f, followed by classification.
26.
An example: AlexNet
A "deep" neural network called AlexNet [A. Krizhevsky+, NIPS2012]: feature extraction layers followed by classification layers.
27.
Now the time for deep neural networks
Why do we need to repeat feature extraction?
28.
Why do we need to repeat feature extraction?
(Figure: patterns from six classes A–F, arranged as a difficult classification task.)

29.
Why do we need to repeat feature extraction?
(Figure: the same six classes A–F with two weight vectors w₁ and w₂.)
30.
Why do we need to repeat feature extraction?
Mapping each pattern to its similarities to w₁ and w₂ rearranges the classes: for example, F has a large similarity to w₁ while A has a small similarity to w₁.
Note: The lower picture is not very accurate, because it uses a distance-based rather than an inner-product-based space transformation. However, I believe it does not seriously damage the explanation here.
31.
Why do we need to repeat feature extraction?
In the similarity space (similarity to w₁ vs. similarity to w₂), the classes become more separable, but still not very separable.
32.
Why do we need to repeat feature extraction?
Apply a second feature extraction, with new weight vectors w₃ and w₄, in the similarity space.

33.
Why do we need to repeat feature extraction?
After the second mapping (similarity to w₃ vs. similarity to w₄), the classes A–F are rearranged again.
35.
Why do we need to repeat feature extraction?
After the repeated mappings, the two classes become totally separable by a classifier with weights v₁ and v₂.
37.
The typical non-linear function: rectified linear function (ReLU)
The neuron computes g(x) = f(wᵀx + b) with f(a) = max(0, a).
38.
How does ReLU affect the similarity-based feature?
Negative elements in the feature vector (w_kᵀx < 0) are forced to be zero; the other elements are unchanged.
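A one-line sketch of this effect, on invented similarity scores:

```python
import numpy as np

def relu(a):
    # Rectified linear function: negatives forced to zero, positives unchanged.
    return np.maximum(0.0, a)

similarities = np.array([0.9, -0.3, 0.05, -1.2])  # w_k^T x for K = 4 neurons
print(relu(similarities))  # [0.9  0.   0.05 0.  ]
```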
39.
How to train neural networks: a super-superficial explanation

40.
In order to realize a DNN with an expected "input-output" relation
All the parameters (the feature extraction weights w₁, w₂, …, w_K and the classification weights V_A, V_B, V_C) should be tuned.
41.
Training DNN; the goal
Turn the "knobs" of the DNN until it realizes a perfect classification boundary between class A and class B.
Note: the actual number of knobs (= #parameters) is far larger than in this illustration.
43.
Advanced topic: Why does (SGD-based) back-propagation work?
Much theoretical research has been done [Choromanska+, PMLR2015][Wu+, arXiv2017]. Under several assumptions, local minima are close to the global minimum, and the loss surface has a flat basin.
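As a toy illustration of the knob-turning idea, here is gradient descent on a single logistic neuron for a synthetic two-class problem (full-batch for brevity, where true SGD would use mini-batches; the data, learning rate, and step count are all invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated 2-D clusters: class 0 around (-2,-2), class 1 around (2,2).
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):                          # each step nudges every "knob" downhill
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)           # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == y)
print(accuracy)
```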
44.
Knob = weight = a pattern for similarity-based feature extraction
The pattern that the input is compared to is derived automatically through training.

45.
Optimal features are extracted automatically through training (representation learning)
Example: Google's cat (https://googleblog.blogspot.jp/2012/06/). The patterns that the similarities are computed against are determined automatically.
46.
DNN for image classification: convolutional neural networks (CNN)

47.
How to deal with images by DNN?
Treating the image as one huge vector x (e.g., 400-million-dimensional) and computing w_kᵀx with an equally huge w_k causes ① intractable computations and ② enormous numbers of parameters.
48.
Convolution = repeating "local inner product" operations = linear filtering
Each output value is w_kᵀx_{i,j}: the inner product of a low-dimensional weight vector w_k with the local patch x_{i,j}. This yields ① tractable computations and ② a trainable number of parameters.
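A minimal sketch of convolution as repeated local inner products (strictly cross-correlation, as in most deep learning frameworks; the image and kernel are toy values):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the small weight kernel w_k over the image and take an inner
    # product with the local patch x_{i,j} at each position.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # local patch x_{i,j}
            out[i, j] = np.sum(patch * kernel)  # inner product w_k^T x_{i,j}
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2)) / 4.0   # a tiny averaging filter: very few parameters
result = conv2d_valid(image, kernel)
print(result)
```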
53.
Application to DAR: detecting a component in a character image
Q: Can CNN detect complex multi-part components accurately? [Iwana+, ICDAR2017]
56.
CNN can be used as a feature extractor
Keep the feature extraction layers and discard the classification layers; the extracted features work great with another classifier (e.g., SVM or LSTM), an anomaly detector, or clustering.
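A sketch of the idea with a tiny fully-connected stand-in for a trained CNN: run the forward pass, discard the classification output, and keep the intermediate features (all weights here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(16, 8))  # feature-extraction layer (random placeholder)
W2 = rng.normal(size=(3, 16))  # classification layer (to be discarded)

def forward(x):
    h = np.maximum(0.0, W1 @ x)  # feature extraction with ReLU
    return h, W2 @ h             # features, class scores

x = rng.normal(size=8)
features, class_scores = forward(x)

# Discard class_scores and reuse the 16-D features for another task,
# e.g. nearest-neighbor retrieval, anomaly detection, or clustering.
print(features.shape)  # (16,)
```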
57.
The current CNN does not "understand" characters yet
Adversarial examples [Abe+, unpublished], motivated by [Nguyen+, CVPR2015].
(Figure: likelihood values for classes "A" and "B".)

58.
On the other hand, CNN can learn "math operation" through images
Given input images, the network outputs an "image" showing their sum [Hoshen+, AAAI2016].
59. 5959
Visualization for deep learning:
DeCAF [Donahue+, arXiv 2013]
Visualizing the pattern distribution at each
layer
Near to the input layer Near to the output layer
60. 6060
Visualization for deep learning:
DeepDream and its relations
Finding an input image that excites a neuron
at a certain layer
https://distill.pub/2017/feature-visualization/
61. 61
Visualization for deep learning:
Layer-wise Relevance Propagation (LRP)
Finding the pixels that contribute to the final decision by a backward process
http://www.explain-ai.org/
62. 62
Visualization for deep learning:
Local sensitivity analysis by making a hole
[Ide+, unpublished], motivated by [Zeiler+, arXiv, 2013]
The likelihood of class “0” degrades a lot when a hole is made around an important pixel
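The hole-making analysis can be sketched as an occlusion loop; `score` below is a stand-in for a CNN’s class-likelihood output, not the authors’ actual model.

```python
def occlusion_map(image, score, hole=1, fill=0.0):
    """Local sensitivity analysis: cover each region with a small
    'hole' and record how much the class likelihood drops.
    `score` stands in for the CNN's class-likelihood output."""
    H, W = len(image), len(image[0])
    base = score(image)
    drops = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            patched = [row[:] for row in image]
            for a in range(max(0, i - hole), min(H, i + hole + 1)):
                for b in range(max(0, j - hole), min(W, j + hole + 1)):
                    patched[a][b] = fill  # make the hole
            drops[i][j] = base - score(patched)  # large drop = important pixel
    return drops

# Toy "classifier": likelihood proportional to the brightness of the centre pixel.
score = lambda im: im[1][1]
img = [[0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]
drops = occlusion_map(img, score)
print(drops[1][1])  # occluding the centre destroys the score: 1.0
```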
63. 63
Visualization for deep learning:
Grad-CAM [Selvaraju+, arXiv2016]
Finding the pixels that contribute to the final decision by a backward process
http://gradcam.cloudcv.org/
66. 66
Autoencoder (= nonlinear principal component analysis)
Training the network to output its own input; the bottleneck gives a compact representation of the input
Application: denoising by a convolutional autoencoder
(images: Wikipedia; https://blog.sicara.com/keras-tutorial-content-based-image-retrieval-convolutional-denoising-autoencoder-dc91450cc511)
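A minimal sketch of the “train the network to output the input” idea, assuming a linear 2-D → 1-D → 2-D model trained by plain SGD (with a linear code this reduces to PCA; the slide’s convolutional denoising version adds nonlinearity and convolution).

```python
import random

# Data near a line: a 1-D code should suffice to reconstruct it.
random.seed(0)
data = [[t, 2.0 * t] for t in [random.uniform(-1, 1) for _ in range(50)]]

we = [0.1, 0.2]  # encoder weights (2-D input -> 1-D code)
wd = [0.1, 0.2]  # decoder weights (1-D code -> 2-D output)
lr = 0.05

def loss(xs):
    total = 0.0
    for x in xs:
        z = we[0] * x[0] + we[1] * x[1]   # compact 1-D representation
        xhat = [wd[0] * z, wd[1] * z]     # reconstruction of the input
        total += (x[0] - xhat[0]) ** 2 + (x[1] - xhat[1]) ** 2
    return total / len(xs)

before = loss(data)
for _ in range(200):
    for x in data:
        z = we[0] * x[0] + we[1] * x[1]
        xhat = [wd[0] * z, wd[1] * z]
        err = [xhat[0] - x[0], xhat[1] - x[1]]
        # SGD on the squared reconstruction error
        gz = err[0] * wd[0] + err[1] * wd[1]
        wd = [wd[0] - lr * err[0] * z, wd[1] - lr * err[1] * z]
        we = [we[0] - lr * gz * x[0], we[1] - lr * gz * x[1]]
after = loss(data)
print(before > after)  # training shrinks the reconstruction error: True
```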
71. 71
Note: Deep Image Prior [Ulyanov+, CVPR2018]
The conv-deconv structure has an inherent characteristic that suits image completion and other “low-pass” operations
[Figure: a conv-deconv net is trained just to generate the left image, but it results in the right image.]
72. 72
Generative Adversarial Networks (GAN)
The battle of two neural networks:
the Generator generates “fake bills” VS the Discriminator discriminates fake bills from real ones
The fake bills become more and more realistic
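The generator-vs-discriminator battle can be sketched through the standard GAN losses; the one-number “bill” setup below is purely illustrative.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Toy setup: real "bills" are numbers near 1, the generator forges a
# number `fake`, and the discriminator is a logistic score sigmoid(d * x).
def d_loss(d, real, fake):
    # discriminator: classify the real sample as 1 and the fake as 0
    return -(math.log(sigmoid(d * real)) + math.log(1.0 - sigmoid(d * fake)))

def g_loss(d, fake):
    # generator: fool the discriminator into scoring the fake as real
    return -math.log(sigmoid(d * fake))

d, real = 2.0, 1.0
# A crude forgery is easy to detect; a realistic one fools D, so the
# generator's loss shrinks as its output approaches the real data.
print(g_loss(d, fake=0.1) > g_loss(d, fake=0.9))  # True
```

In a full GAN the two losses are minimized alternately (D’s parameters on `d_loss`, G’s on `g_loss`), which is what drives the fakes to become more realistic.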
75. 75
Huge variety of GANs — just several examples:
standard GAN (DCGAN), StackGAN, CycleGAN, and conditional GAN (conditioned on the class)
https://www.slideshare.net/YunjeyChoi/generative-adversarial-networks-75916964
79. 79
SSD (Single Shot MultiBox Detector)
A fully-convolutional net that outputs bounding boxes
[Liu+, ECCV2016]
80. 80
Application to DAR:
EAST: An Efficient and Accurate Scene Text Detector [Zhou+, CVPR2017]
Evaluating bounding box shape
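Evaluating a predicted box against a ground-truth box is commonly scored by intersection-over-union; this is a generic sketch of that score, not EAST’s exact geometry loss.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2),
    the standard score for judging whether a predicted text box
    matches a ground-truth box."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih                                  # overlap area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # half-overlapping boxes: 1/3
```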
82. 82
LSTM (Long short-term memory):
A recurrent neural network
Recurrent structure → information from all the past
Gate structure → active selection of information
[Figure: input vectors enter a recurrent layer that produces output vectors at every time step.]
Also very effective for solving the vanishing gradient problem in the t-direction
[Graves+, TPAMI2009]
83. 83
Recurrent NN: a recurrent structure carries information from all the past
LSTM NN: adds a gate structure (input gate, forget gate, output gate) for active selection of information
[Figure: both nets map an input to an output; the LSTM cell inserts the three gates on the recurrent path.]
[Graves+, TPAMI2009]
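One time step of the gate structure can be sketched for a scalar cell; the weights below are arbitrary, untrained values for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def lstm_step(x, h, c, W):
    """One step of a scalar LSTM cell. The three gates decide what to
    write (input gate), what to keep (forget gate) and what to expose
    (output gate)."""
    i = sigmoid(W["wi"] * x + W["ui"] * h + W["bi"])    # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h + W["bf"])    # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h + W["bo"])    # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h + W["bg"])  # candidate value
    c = f * c + i * g     # additive memory path eases vanishing gradients
    h = o * math.tanh(c)  # actively selected output
    return h, c

W = {k: 0.5 for k in ["wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg"]}
h = c = 0.0
for x in [1.0, -1.0, 1.0]:   # the cell state carries info from all the past
    h, c = lstm_step(x, h, c, W)
print(round(h, 3))
```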
87. 87
Application to DAR:
Convolutional Recurrent Neural Network (CRNN)
[Shi+, “An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition”, IEEE TPAMI, 2017]
99. 99
How can we get it? Minimize the slope under constraints
[Figure: samples of classes A and B on the x-axis and a linear function w^T x + b; for us, its value should be more than 1 on one class and less than −1 on the other.]
100. 100
How can we get it? Minimize the slope under constraints
[Figure: candidate slopes — the ones violating the constraints are NG, the one satisfying them is OK.]
101. 101
How can we get it? Minimize the slope under constraints
[Figure: the constraints act like “nails” that the function must clear.]
102. 102
How can we get it? Minimize the slope under constraints
[Figure: the minimum slope satisfying the constraints.]
103. 103
How can we get it? Minimize the slope under constraints
It also gives the maximum-margin classification!
104. 104
Support vectors
[Figure: the samples lying exactly on w^T x + b = ±1 are the support vectors (SVs).]
Only those SVs contribute to determining the discriminant function
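“Minimize the slope under constraints” can be sketched as subgradient descent on the soft-margin SVM objective; the 1-D data and Pegasos-style decreasing step sizes below are illustrative choices, not the slides’ algorithm.

```python
# Soft-margin objective for 1-D data:
#   (lam / 2) * w^2 + mean_i max(0, 1 - y_i * (w * x_i + b))
data = [(-2.0, -1), (-1.5, -1), (1.5, 1), (2.0, 1)]  # (x, label)
lam = 0.1
w = b = 0.0
for t in range(1, 5001):
    eta = 1.0 / (lam * t)               # decreasing step size
    gw, gb = lam * w, 0.0               # gradient of the slope penalty
    for x, y in data:
        if y * (w * x + b) < 1.0:       # sample violates the margin
            gw -= y * x / len(data)
            gb -= y / len(data)
    w, b = w - eta * gw, b - eta * gb

# Samples with y * (w * x + b) close to 1 sit on the margin: the support
# vectors, which alone determine the discriminant function.
svs = [x for x, y in data if abs(y * (w * x + b) - 1.0) < 0.05]
print(svs)  # the inner points: [-1.5, 1.5]
```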
109. 109
Mapping the feature vector space to a higher-dimensional space
[Figure: XOR-type data on (x_1, x_2) with values 0 and 1 is not linearly separable, but after the mapping φ: (x_1, x_2) → (x_1, x_2, x_1 x_2) it becomes linearly separable!]
111. 111
What happens in the original space
In the mapped space (y_1, y_2, y_3), the boundary a y_1 + b y_2 + c y_3 + d = 0 is a plane in 3D space. Rewrite it with y_1 = x_1, y_2 = x_2, y_3 = x_1 x_2:
112. 112
What happens in the original space
a x_1 + b x_2 + c x_1 x_2 + d = 0 — what is this? Revert to the (x_1, x_2) plane:
113. 113
What happens in the original space
Classification boundary: a x_1 + b x_2 + c x_1 x_2 + d = 0, i.e., x_2 = −(a x_1 + d) / (b + c x_1)
[Figure: on the (x_1, x_2) plane with values 0 and 1, this curve separates the XOR-type data.]
Linear classification in the higher-dimensional space corresponds to a non-linear classification in the original space
115. 115
What happens in the original space
[Figure: another example using the squared feature x_1^2; the linear boundary in the mapped space becomes a quadratic curve in the original (x_1, x_2) plane, separating the interleaved A and B regions.]
116. 116
Notes about the φ-machine
Combination with SVM is popular: the φ-function leads to the “kernel”
In the SVM dual, the data appear only through the inner product in Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j; replacing it with the mapped product gives Σ_i Σ_j α_i α_j y_i y_j φ(x_i)^T φ(x_j) = Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j)
Choosing a good mapping φ is not trivial
In the past, the choice was done by trial and error
Recently…
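The φ-to-kernel step can be sketched with the mapping from the earlier XOR slides: the kernel evaluates φ(x)^T φ(z) without ever constructing φ(x).

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit mapping from the earlier slides: (x1, x2) -> (x1, x2, x1 x2)."""
    return [x[0], x[1], x[0] * x[1]]

def kernel(x, z):
    """The same inner product, evaluated WITHOUT constructing phi:
    k(x, z) = phi(x) . phi(z)."""
    return x[0] * z[0] + x[1] * z[1] + x[0] * x[1] * z[0] * z[1]

x, z = [2.0, 3.0], [-1.0, 0.5]
print(dot(phi(x), phi(z)), kernel(x, z))  # identical values: -3.5 -3.5
```

For this small φ the saving is negligible, but for mappings into very high-dimensional (even infinite-dimensional) spaces the kernel is the only tractable route — which is why the dual form above matters.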
117. 117
Deep neural networks can find a good mapping automatically
The feature extraction layers form a mapping φ
The mapping is specified by the weights
The weights (i.e., φ) are optimized via training
This is the so-called “representation learning”
121. 121
AdaBoost:
A set of complementary classifiers
1. Train a weak classifier g_1 on the training patterns
2. Compute its reliability (e.g., 0.7)
Final decision: if the weighted sum > 0 then A; else B
122. 122
AdaBoost:
A set of complementary classifiers
3. Give a large (small) weight to each sample that is misrecognized (correctly recognized) by g_1 (reliability 0.7)
Final decision: if the weighted sum > 0 then A; else B
124. 124
AdaBoost:
A set of complementary classifiers
6. Give a large (small) weight to each sample that is misrecognized (correctly recognized) by the newly trained classifier (reliabilities so far: 0.7, 0.43)
Repeat until convergence of the training accuracy
Final decision: if the weighted sum > 0 then A; else B
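The three steps on these slides (train, weight by reliability, re-weight the samples) can be sketched with threshold “stumps” on toy 1-D data; the data and classifier family are illustrative.

```python
import math

X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1, 1, -1, -1, 1, 1]            # not separable by any single threshold

def stump(theta, sign):
    return lambda x: sign if x < theta else -sign

candidates = [stump(t + 0.5, s) for t in range(6) for s in (1, -1)]

weights = [1.0 / len(X)] * len(X)   # uniform sample weights at the start
ensemble = []                       # (reliability alpha, classifier g)
for _ in range(3):
    # 1. Train: pick the stump with the lowest weighted error
    g = min(candidates,
            key=lambda g: sum(w for w, x, y in zip(weights, X, Y) if g(x) != y))
    err = sum(w for w, x, y in zip(weights, X, Y) if g(x) != y)
    # 2. Reliability of this classifier
    alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-12))
    ensemble.append((alpha, g))
    # 3. Enlarge the weights of misrecognized samples, shrink the others
    weights = [w * math.exp(-alpha * y * g(x))
               for w, x, y in zip(weights, X, Y)]
    z = sum(weights)
    weights = [w / z for w in weights]

def predict(x):                     # if weighted sum > 0 then A (+1) else B (-1)
    return 1 if sum(a * g(x) for a, g in ensemble) > 0 else -1

print([predict(x) for x in X])  # matches Y: [1, 1, -1, -1, 1, 1]
```

Each round’s stump is weak on its own; the complementary re-weighting forces later stumps to fix the earlier ones’ mistakes.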
127. 127
Near-human performance has been achieved by big data and neural networks
Character recognition [Uchida+, ICFHR2016]: accuracies of 95.49%–99.99% across machine-printed, handwritten, and designed-font characters
Scene text detection [Zhou+, CVPR2017]: F-value = 0.8 on the ICDAR2015 Incidental Scene Text benchmark
Scene text recognition, CRNN [Shi+, TPAMI, 2017]: 89.6% word recognition rate on ICDAR2013
129. 129
Beyond 100% = the computer can detect, read, and collect all text information perfectly:
texts on notebooks, object labels, digital displays, book pages, signboards, and posters / ads
So, what do you want to do with the perfect recognition results?
130. 130
In fact, our real goal should NOT be perfect recognition results
Poor recognition results → perfect recognition results are only a tentative goal
The real goals: the ultimate application using perfect recognition results, and scientific discovery by analyzing perfect recognition results
131. 131
What will you do in the world beyond 100%?
Ultimate applications:
Education — “total recall” for perfect information search
Welfare — alarms, translation, information complement
“Life-log” apps — summary, log compression, captioning, question answering, behavior prediction, reminders
Scientific discovery:
With social science — interaction between scene text and humans; text statistics
With design science — font shape and impression; discovering typographic knowledge
With humanities — historical knowledge; semiology
132. 132
Another direction:
Use characters to understand ML
Simple binary, stroke-structured patterns
Less background clutter
Small size (e.g., 32x32)
Big data (e.g., 80,000 samples / class)
Predefined classes (e.g., 10 classes for digits)
ML has achieved near-human performance on them
→ a very good “testbed” for not only evaluating but also understanding ML