deep-learning-and-what's-next-with-Chinese-annotation

Copyright © SAS Institute Inc. All rights reserved.
Deep Learning and What’s Next?
深度学习的前世今生与未来
Tao Wang 王涛
CAST-NC, 2/11/2019
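农历大年初八，恭祝大家猪年大吉，诸事发发发！ (Eighth day of the Lunar New Year: wishing everyone a lucky Year of the Pig and prosperity in all things!)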
This presentation is based on information in the public domain
Opinions expressed are solely my own and therefore may not represent the views of my employer

Part 1:
Introduction
简介

AI (人工智能) vs. Machine Learning (机器学习)
Take 1
Source: https://towardsdatascience.com/cousins-of-artificial-intelligence-dda4edc27b55

Take 2
Machine learning is a “field of study that gives computers the
ability to learn without being explicitly programmed.”
– Arthur Samuel, 1959
“Artificial intelligence (AI), sometimes called machine
intelligence, is intelligence demonstrated by machines.”
– Wikipedia, retrieved 2018

Take 3

Take 4 – my own version
AI: goal
Analytics: business
Machine Learning: means

Styles of Machine Learning (机器学习的种类)
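机器学习和人类学习一样：要学得好，就要多做题 (Machine learning is like human learning: to learn well, you have to practice on lots of problems.)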
Other styles from different perspectives (active learning, transfer learning, multi-task learning,
adversarial learning, …)
Supervised
• Training data has a labelled target
• Predict the label for unseen data
• Examples: face detection, fraud detection, patient identification
Semi-Supervised
• Labels are known for a subset of the data
• A blend of supervised and unsupervised learning
• Example: pre-processing for supervised learning to reduce labelling cost and enhance accuracy
Unsupervised
• Labels are unknown
• Find patterns and gain insights from the data
• Examples: customer clustering, association rule mining
Reinforcement
• An agent selects actions to maximize reward in an environment
• Examples: game AI, robotics
Source: [5] X. Hunt, et al.
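As a toy illustration of the supervised vs. unsupervised styles (our own example, not from [5]): the classifier below can only predict because it is given labelled pairs, while the clustering routine groups unlabelled points on its own. The data and function names are hypothetical.

```python
def nearest_neighbor_predict(train, x):
    """Supervised: return the label of the closest labelled training point (1-NN)."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(xs, iters=10):
    """Unsupervised: split unlabelled 1-D points into 2 clusters (k-means with k=2).

    Assumes the points are not all identical (otherwise one cluster would be empty).
    """
    c0, c1 = min(xs), max(xs)  # initialise the two centroids at the extremes
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)  # move centroids to cluster means
    return sorted(g0), sorted(g1)


labelled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
print(nearest_neighbor_predict(labelled, 7.2))    # large
print(two_means([1.0, 1.5, 2.0, 8.0, 9.0, 9.5]))  # ([1.0, 1.5, 2.0], [8.0, 9.0, 9.5])
```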

What Can Machine Learning Do (机器学习的用途) ?
Prediction
• Tasks: classification, regression
• Styles: supervised learning, semi-supervised learning
Decision and Policy-Making
• Tasks: learning through trial and error to identify the best action; game playing; control problems
• Style: reinforcement learning
Data Exploration
• Tasks: clustering, dimension reduction, anomaly detection, feature engineering
• Styles: often unsupervised learning; also supervised and semi-supervised learning
Rule Learning
• Tasks: identifying relational rules within data, association rule mining
• Styles: supervised, unsupervised, and semi-supervised learning
And so many other things!

ML and Deep Learning (机器学习和深度学习)
Machine Learning ⊃ Deep Learning, e.g., AlphaGo, BERT (Bidirectional Encoder Representations from Transformers), AlphaFold
Image sources: Siliconangle, googleblog, profacgen

Part 2:
Deep Learning = Deep Neural Networks (DL = DNN)?
深度学习=深度神经网络?

DL: Model with Depth (深度学习: 模型和深度)
Shallow learning
• Model with one or a few layers
• Data → Model → Output
Deep learning
• Multiple layers, layer-by-layer processing
• Feature extraction/transformation
• Learns complex structures
• Data → Layer → Layer → Layer → Output (the stacked layers form the model)
Deep Learning (DL) = Deep Neural Networks (DNN), ignoring subtleties

Pros and cons (优点和缺点)
• Advantages
1. Requires minimal feature engineering (end-to-end ML)
2. Flexible structures
3. Learning often improves with more data
4. Proven track records in speech/text processing and image/video recognition
• Disadvantages
1. Difficult to interpret – often treated as a “black-box” model
2. Long training time; prone to over-fitting
3. Hard to train, non-repeatable results, numerous architectures/hyper-parameters
4. Requires a large amount of training data to get good models

Why so popular (为何如此流行)?
1. End-to-end/distributed feature learning
2. Advances in algorithms/optimization (mini-batch, dropout, batch normalization (BN), SGD, etc.)
3. Cloud computing and GPUs made it possible to train very deep models
4. Proven track records in speech/text processing and image/video recognition
Source: [6] D. Silver

More about DNN (更多关于深度神经网络)
• When should I use DNN?
• You are dealing with image/video/text/speech data
• Works for small-to-medium data, but performs best with big data
• The underlying model is complex and non-linear
• You can accept a non-interpretable model, and/or you have cloud/GPU resources
• Common DNN architectures
• Deep feedforward nets
• Convolutional neural networks (CNN)
• Recurrent neural networks (RNN)
• Stacked auto-encoders

Deep Feedforward Net (深度前馈网络)
• A flat architecture
• Regression (回归) and classification (分类)
(DNN architectures: 1 of 4)
Source: [4] W. Thompson
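A deep feedforward net is a composition of fully connected layers, each applying an activation to a weighted sum of the previous layer's outputs. A minimal pure-Python forward pass for classification, with hand-picked (hypothetical) weights instead of trained ones:

```python
import math


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b), row by row."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Layer-by-layer processing: each layer's output is the next layer's input."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical 2-3-1 network (weights chosen by hand, not learned).
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0], relu),  # hidden layer
    ([[1.0, 1.0, 1.0]], [-1.0], sigmoid),                             # output: class probability
]
print(forward([2.0, 1.0], layers))  # hidden = [1.0, 1.5, 0.0], output = sigmoid(1.5)
```

Training would adjust the weights by back-propagation; only the forward computation is shown here.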

Convolutional neural network (CNN)
卷积神经网络
• A feedforward neural net with convolutional (conv) layers
• 3D volumes of neurons
• Feature extraction （特征提取）
• Applications: image/video recognition, NLP
(DNN architectures: 2 of 4)
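A convolutional layer slides a small kernel over its input and reuses the same weights at every position, producing a feature map. A single-channel, unpadded sketch (like most DL libraries, it computes cross-correlation); real CNNs apply many kernels across 3D volumes of channels. The kernel and image below are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """'Valid' (unpadded) 2-D convolution of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [[sum(kernel[i][j] * image[r * stride + i][c * stride + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# A vertical-edge detector applied to a 4x4 image with a dark-to-bright edge.
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1], [-1, 1]]
print(conv2d(image, edge_kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map responds only where the edge is, which is the kind of local feature extraction the bullets above refer to.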

Recurrent neural network (RNN)
循环神经网络
• Contains at least one feedback connection (昨日重现, “Yesterday Once More”)
• Time-series forecasting, speech recognition
Diagram: the previous hidden state h1(t-1) is fed back through a delay unit to compute h1(t)
(DNN architectures: 3 of 4)
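The feedback connection amounts to a recurrence: the previous hidden state h(t-1), supplied by the delay unit, is combined with the current input x(t). A scalar Elman-style sketch, assuming a tanh activation and hypothetical weights w_x, w_h and bias b:

```python
import math


def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Scalar Elman RNN: h(t) = tanh(w_x * x(t) + w_h * h(t-1) + b)."""
    h, states = h0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)  # the previous state re-enters here
        states.append(h)
    return states


states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)
```

Even though the second and third inputs are zero, the hidden state stays non-zero: the network remembers the first input through the feedback loop, with the memory shrinking each step because |w_h| < 1.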

Auto-encoder （自动编码器）
• An unsupervised network trained to reconstruct its input; probabilistic variants are generative models
• Feature coding, dimension reduction and compression (压缩)
(DNN architectures: 4 of 4)

DNN supported by SAS
Source: [7] White paper: How to Do Deep Learning With SAS?

SAS platform for Deep Learning

SAS® Visual Data Mining and Machine Learning (VDMML)
Visual “drag & drop” GUI

Applications of SAS Deep Learning
Source: [7] White paper: How to Do Deep Learning With SAS?

Applications (应用)
Input → DNN, applied to:
• Military surveillance
• Speech recognition
• Fraud detection
• Image classification
• Autonomous vehicles
• Patient identification

Autonomous vehicles (自动驾驶)
An application of DNN
The tipping point: level 3 Partial Autonomy
Source: https://iq.intel.com/autonomous-cars-road-ahead/
Expected Timeline for Full Autonomy?
Source: https://thelastdriverlicenseholder.com/2016/12/29/expected-timeline-for-full-autonomy/
Focus on Level 3 and deliver!

Navigant Research Leaderboard (排行榜)
Automated Driving Vehicles
Source: https://www.navigantresearch.com/research/navigant-research-leaderboard-automated-driving-vehicles

End to End Learning for Self-Driving Cars
自动驾驶汽车的端到端学习
• arXiv:1604.07316, Apr 2016, from NVIDIA
• Basic idea: behavioral cloning, train the car to drive like you do
• Uses CNN to map images from cameras to steering commands
• The CNN is never explicitly trained to detect or follow lanes, plan paths, etc.
Figures: high-level view of the data collection system; training the CNN; self-driving
Source: [1] M. Bojarski, et al.

CNN architecture & core source code (架构和代码)
Read the figure from the bottom up: an input layer, a normalization layer, and 5 Conv2D layers for feature extraction, followed by 3 fully connected layers that act as the controller and produce the output.
About 27M connections and 250K parameters, 3 MB in size. Source: arXiv:1604.07316
Source: GitHub, the NVIDIA 2016 paper implementation
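The quoted layer and parameter sizes can be checked with a little arithmetic. This sketch assumes, per our reading of arXiv:1604.07316, a 66×200×3 input, 5×5 kernels with stride 2 in the first three conv layers and 3×3 kernels with stride 1 in the last two (24, 36, 48, 64, 64 filters), no padding, a normalization layer with no trainable parameters, and fully connected layers of 100, 50, 10 and 1 units applied directly to the flattened conv output. Implementations differ on that last detail, so treat the total as approximate; the function name is hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


# (filters, kernel size, stride) of the five conv layers (assumed, see lead-in).
CONV_LAYERS = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]
# Fully connected layers on the flattened features; the last unit is the steering output.
FC_SIZES = [100, 50, 10, 1]


def end_to_end_cnn_summary(h=66, w=200, channels=3):
    """Return the conv feature-map shapes (channels, height, width) and the parameter count."""
    shapes, params = [], 0
    for filters, k, s in CONV_LAYERS:
        params += k * k * channels * filters + filters  # kernel weights + biases
        h, w, channels = conv_out(h, k, s), conv_out(w, k, s), filters
        shapes.append((channels, h, w))
    units = channels * h * w  # flatten the last feature map
    for out in FC_SIZES:
        params += units * out + out  # dense weights + biases
        units = out
    return shapes, params


shapes, params = end_to_end_cnn_summary()
print(shapes)  # [(24, 31, 98), (36, 14, 47), (48, 5, 22), (64, 3, 20), (64, 1, 18)]
print(params)  # 252219
```

Under these assumptions the count comes out at about 252K parameters, consistent with the ~250K quoted above.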

Part 3:
What’s next? 展望未来
AI with THE POWER OF THE PACK 群体的威力
AI with THE POWER OF DIVERSITY 多样性的威力
AI with THE POWER OF TRUST 信任的威力

Rediscover Deep Learning 重新发现深度学习
1. End to end (端到端)
2. Distributed feature learning (分布式特征学习)
3. Big data, big model (大数据, 大模型)

Capsule Net: power of the pack 胶囊网络: 群体的威力
Sources: Yoshua Bengio; Pablo Picasso; CB Insights, State of AI; Forbes

Capsule Network paper
胶囊网络的论文
• S. Sabour, N. Frosst, G. Hinton, Dynamic Routing Between Capsules,
Google Brain, NIPS 2017, https://arxiv.org/abs/1710.09829
• The idea was introduced years earlier by Hinton but did not work well until this paper
• Widely considered the beginning of a new chapter of deep learning
• Some follow-up papers, such as Matrix Capsules With EM Routing
• https://openreview.net/pdf?id=HJWLfGWRb, ICLR 2018
• Introduced a capsule convolution layer and a more sophisticated routing algorithm (EM routing)
Source: http://www.cs.toronto.edu/~hinton

Dynamic Routing Between Capsules 胶囊之间的动态路由
• Idea #1: a capsule is an encapsulated vector/matrix in the network
• A capsule is a group of neurons that represents the parameters of a specific feature.
• It extends a neuron's scalar output to a vector or matrix
• The length of the capsule's output represents the probability that a feature or object is present
• Each dimension within the capsule encodes detailed properties such as location, size, and orientation
• Idea #2: routing by agreement
• A lower-level capsule (near the input) prefers to send its output to the higher-level capsules (near the output) whose outputs agree with its prediction
• Agreement is measured by the scalar product between the prediction and the higher-level capsule's output (similar in spirit to cosine similarity)
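Both ideas fit in a short pure-Python sketch, following our reading of the routing procedure in [2]: squash keeps a vector's direction but maps its length into [0, 1) so it can act as a probability, and routing by agreement iteratively raises the coupling between a lower capsule and the higher capsule whose output agrees with its prediction. In [2] agreement is the scalar product û·v, which behaves like an unnormalized cosine similarity. Simplifying assumptions: the prediction vectors u_hat are given directly (in the paper they come from learned transformation matrices), there is no batching, and the example values are hypothetical.

```python
import math


def squash(s):
    """Capsule non-linearity: keep the direction of s, map its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement between two capsule layers.

    u_hat[i][j] is lower capsule i's prediction vector for higher capsule j.
    Returns the higher-level outputs v[j] and the final coupling coefficients c[i][j].
    """
    n_lower, n_higher, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_higher for _ in range(n_lower)]  # routing logits, start uniform
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # each lower capsule distributes its output
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)]
             for j in range(n_higher)]
        v = [squash(s_j) for s_j in s]
        for i in range(n_lower):
            for j in range(n_higher):
                b[i][j] += dot(u_hat[i][j], v[j])  # agreement = scalar product
    return v, c


# Two lower capsules agree about higher capsule 0 but contradict each other about capsule 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]
v, c = dynamic_routing(u_hat)
print([round(cij, 3) for cij in c[0]])  # more of capsule 0's output is routed to capsule 0
```

The contradictory predictions cancel out, so higher capsule 1 ends up with a zero-length output (feature absent), while capsule 0 is active and attracts the coupling.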

CapsNet Architecture 胶囊网络系统架构
▪ Input: MNIST dataset
▪ ReLU conv1: extracts local features
▪ PrimaryCaps: forms the new neural units (capsules)
▪ DigitCaps: contains 10 capsules (one for each digit, 0 to 9)
▪ Dynamic routing (routing by agreement) is applied between PrimaryCaps and DigitCaps
▪ Reconstruction: a regularization method to encourage the capsules to encode the input digit
Figure 1: A simple CapsNet with 3 layers Figure 2: Reconstruct a digit from the DigitCaps layer representation
Source: https://arxiv.org/abs/1710.09829

Core source code 核心源代码
Source: GitHub, the NIPS 2017 paper implementation

Numerical results of the NIPS paper 数值结果
Source: https://arxiv.org/abs/1710.09829

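Deep Forest: power of diversity 深度森林: 多样性的威力
$E = \bar{E} - \bar{D}$ (ensemble error = average error of the individual learners minus their average diversity)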
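Reading this slide's equation as the error-ambiguity decomposition (Krogh and Vedelsby) for a uniformly averaged regression ensemble under squared error, the identity can be checked numerically: the ensemble error equals the average member error minus the average squared deviation of the members from the ensemble output (the diversity term). The function name and example predictions are hypothetical.

```python
def decomposition(predictions, y):
    """Error-ambiguity decomposition for a uniformly averaged ensemble (squared error).

    Returns (E, E_bar, D_bar), where E == E_bar - D_bar up to rounding.
    """
    n = len(predictions)
    f_bar = sum(predictions) / n                            # ensemble output (plain average)
    E = (f_bar - y) ** 2                                    # ensemble error
    E_bar = sum((f - y) ** 2 for f in predictions) / n      # average error of the members
    D_bar = sum((f - f_bar) ** 2 for f in predictions) / n  # average diversity (ambiguity)
    return E, E_bar, D_bar


E, E_bar, D_bar = decomposition([2.0, 3.0, 7.0], y=3.0)
print(E, E_bar - D_bar)  # equal up to floating-point rounding
```

With predictions 2, 3, 7 and target 3, the members' average error is 17/3 and their diversity is 14/3, so averaging cuts the error to 1: for a fixed average member error, more diversity means a lower ensemble error.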

Deep Forest paper series 深度森林论文系列
• Deep Forest [10]: uses random forests (RF) to do DL with the “3 key ingredients”:
• In-model feature extraction and transformation, end-to-end machine learning
• Layer-by-layer processing, distributed representation learning
• Model complexity
• AutoEncoder by Forest [11]
• The first tree-ensemble-based auto-encoder
• Multi-Layered Gradient Boosting Decision Trees [12]
• A variant of target propagation: pseudo-mapping F, pseudo-inverse mapping G, pseudo-label Z (F-G-Z framework)
• More to come?

Deep Forest paper
深度森林论文
• IJCAI 2017 paper [10], by Zhi-Hua Zhou and his student Ji Feng
• Deep Forest = forest ensemble, a “double happiness” (ensemble of ensembles)
1. Multi-grained scanning: sliding windows to extract features
2. Cascade of random forest layers, for prediction
• Very few hyper-parameters (how nice!) and as good as DNNs
• Default settings work well for many applications
• Non-differentiable model, no back-propagation
Source: https://en.wikipedia.org/wiki/Zhi-Hua_Zhou

Deep Forest paper 深度森林论文
Problems of DNN 深度神经网络的问题
• Too many hyper-parameters (tuning is more art than science)
• Does not work well for small data
• Model architecture/complexity is determined in advance (via tuning)
• Often overly complicated
• Shortcut connections, pruning, binarization, etc. are often applied

Deep Forest paper 深度森林论文
Why deep forest? Motivations? 动机
• Decision trees
• Architecture learning (grow/split until done)
• Data driven
• Almost unbeatable on tabular data in Kaggle competitions
• Motivations
• DL = DNN?
• Can we do DL with non-differentiable models (no back-propagation)?
• Maybe repeatable results (unlike SGD-trained DNNs)?

Inspiration from DNN 来自深度神经网络的灵感
• Distributed representation learning (end to end, in-model feature transformation)
• Layer-by-layer processing
• Model complexity
Source: [10] Z. Zhou and J. Feng

Multi-Grained Scanning for Feature Engineering 基于多粒度扫描的特征抓取
• Sequential relationships are important
• Spatial relationships are important
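Multi-grained scanning is a sliding-window transformation: each window position becomes a new instance, which [10] then feeds to forests whose class vectors are concatenated as the transformed features. A minimal sketch of the window extraction alone (stride 1); the sizes mirror the illustration in [10] (a 400-dimensional sequence with a window of 100, and a 20×20 image with a 10×10 window), and the function names are hypothetical.

```python
def scan_sequence(seq, window):
    """Slide a window of length `window` (stride 1) over a 1-D feature vector."""
    return [seq[i:i + window] for i in range(len(seq) - window + 1)]


def scan_image(img, window):
    """Slide a window x window patch (stride 1) over a 2-D feature map."""
    n_rows, n_cols = len(img), len(img[0])
    return [[row[c:c + window] for row in img[r:r + window]]
            for r in range(n_rows - window + 1)
            for c in range(n_cols - window + 1)]


seq_instances = scan_sequence(list(range(400)), 100)        # sequential relationships
img_instances = scan_image([[0] * 20 for _ in range(20)], 10)  # spatial relationships
print(len(seq_instances), len(img_instances))  # 301 121
```

Using several window sizes ("multiple grains") yields several such instance sets, each capturing relationships at a different scale.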

Cascade Forest Structure for Prediction 用于预测的梯级森林结构
• Ensemble of ensembles
• K-fold cross-validation
• Architecture learning (stop growing when satisfied)
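The "stop growing when satisfied" rule can be sketched as a loop that keeps adding levels while the whole cascade's validation score improves. The callbacks fit_level and validation_score are hypothetical placeholders for training one level of forests and scoring the cascade on held-out data, and tol is an assumed threshold for a "significant" gain.

```python
def grow_cascade(fit_level, validation_score, max_levels=10, tol=1e-4):
    """Add cascade levels until the validation score stops improving by more than tol.

    fit_level(levels) -> a new level trained on top of the existing levels (hypothetical)
    validation_score(levels) -> score of the whole cascade on held-out data (hypothetical)
    """
    levels, best = [], float("-inf")
    while len(levels) < max_levels:
        candidate = fit_level(levels)
        score = validation_score(levels + [candidate])
        if score <= best + tol:
            break  # no significant gain: stop growing, the depth is learned from data
        levels.append(candidate)
        best = score
    return levels, best


# Toy usage: pretend validation accuracy saturates after three levels.
scores = [0.80, 0.85, 0.86, 0.86005, 0.85]
levels, best = grow_cascade(lambda lv: len(lv), lambda lv: scores[len(lv) - 1])
print(len(levels), best)  # 3 0.86
```

Unlike a DNN, whose depth is fixed before training, the number of levels here is an output of training.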

Class Vector Generation 分类矢量的生成
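In [10], a forest's class vector for an instance is the fraction of training examples of each class in the leaf the instance reaches, averaged over all trees in the forest; the class vectors of all forests in a level are then concatenated with the original features as input to the next level. A minimal sketch of that bookkeeping, assuming the leaf class counts are already available (no tree training is shown, and the names are hypothetical).

```python
def forest_class_vector(leaf_class_counts):
    """Average, over a forest's trees, the class distribution of the leaf the instance falls into.

    leaf_class_counts[t] holds the per-class training counts in the leaf reached in tree t.
    """
    n_trees = len(leaf_class_counts)
    n_classes = len(leaf_class_counts[0])
    vector = [0.0] * n_classes
    for counts in leaf_class_counts:
        total = sum(counts)
        for k in range(n_classes):
            vector[k] += counts[k] / total / n_trees  # leaf class fraction, averaged over trees
    return vector


def augment(features, forests_leaf_counts):
    """Input to the next cascade level: original features + one class vector per forest."""
    out = list(features)
    for leaf_counts in forests_leaf_counts:
        out.extend(forest_class_vector(leaf_counts))
    return out


# One hypothetical forest of 3 trees, 3 classes.
vec = forest_class_vector([[8, 1, 1], [5, 5, 0], [10, 0, 0]])
print([round(p, 3) for p in vec])  # tree distributions averaged: about [0.767, 0.2, 0.033]
```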

Overall Architecture 整体架构

Hyper-parameters and default settings 参数和默认设置

Experimental results 实验结果
Panels: image categorization; face recognition; music classification; hand movement recognition

More experimental results 更多实验结果
Panels: sentiment classification; low-dimensional data; high-dimensional data
(It is hard to beat a successful method at its killer app with a brand-new algorithm.)

Running time 运行时间
• PC with 2 Intel E5 2695 v4 CPUs (18 cores each)
• IMDB dataset (25,000 examples, with 5,000 features)
• Deep Forest: 40 minutes
• DNN: can take over 60 minutes

Hyper-parameter sensitivity 参数的敏感度

Blockchain: power of trust 区块链: 信任的威力
Diagram: AI, analytics, and machine learning
Source: pixabay

A Unified Analytical Framework for Trustable Machine Learning and Automation Running with Blockchain 利用区块链运行可信机器学习和自动化的统一分析学框架
Source: [14] T. Wang

Further reading list 阅读清单
• Reinforcement learning (play, explore, control, interact) 强化学习
• An agent selects actions to maximize reward in an environment
• AI = Deep RL (D. Silver, 2016) vs. RL does not really work (I. Goodfellow, 2018)
• Generative adversarial networks (GAN) [9] 生成性对抗性神经网络
• Unsupervised learning using supervised learning as sampling model
• Infers models in a competing game with Generator (G) and Discriminator (D)
• Provides an attractive alternative to maximum likelihood techniques.
• Y. LeCun: “…There are many interesting development in deep learning…The most important one, …, is adversarial training….”
• Adaptive Neural Trees (ANT), https://arxiv.org/abs/1807.06699, 自适应神经树
- NN: end2end/distributed representation learning with pre-specified architecture, image/sequence
- DT: architecture learning with pre-specified features, tabular data
• BERT (Bidirectional Encoder Representations from Transformers) 双向编码器模型: the biggest AI highlight of 2018 (2018年人工智能的最大亮点)?
• AlphaFold: predicting protein structure (预测蛋白质结构)
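For the GAN entry in the list above, the "competing game" between G and D in [9] is the two-player minimax objective below, where D(x) is the probability that x is real and G(z) maps noise z to a sample: D maximizes V by telling real samples from generated ones, while G minimizes it by fooling D.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```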

Very fast iterations in research 研究的快速迭代升级
Source: https://pythonawesome.com/a-paper-list-of-object-detection-using-deep-learning/

Super-human level performance 超越人类的能力
Source: https://towardsdatascience.com/the-science-behind-alphastar-714bd7824d4b

AI winter is coming 人工智能的寒冬将至?
Source: https://blog.piekniewski.info/2018/05/28/ai-winter-is-well-on-its-way/
Source: Google Trends

Closing Remarks 结束语
AI and machine learning are very hard – just keep trying!

Closing Remarks 结束语
Humans’ advantage and text mining’s nightmare? 人类的优势和文本挖掘的噩梦？

Selected References 部分参考文献
• [1] M. Bojarski, et al., End to End Learning for Self-Driving Cars, arXiv:1604.07316, 2016.
• [2] S. Sabour, N. Frosst, G. Hinton, Dynamic Routing Between Capsules, Google Brain, NIPS 2017, https://arxiv.org/abs/1710.09829
• [3] D. Silver, A. Huang, et al., Mastering the game of Go with deep neural networks and tree search, Nature 529 (7587): 484–489, 2016.
• [4] W. Thompson, et al., Introduction to Deep learning, SAS, 2016.
• [5] X. Hunt, et al., Machine Learning Landscape, SAS, 2017.
• [6] D. Silver, Tutorial: Deep Reinforcement Learning, 2017.
• [7] White paper: How to Do Deep Learning With SAS? 2018.
• [8] Y. LeCun, et al., Deep learning, Nature, 2015.
• [9] I. Goodfellow, et al., Generative Adversarial Nets, NIPS 2014, https://arxiv.org/abs/1406.2661
• [10] Z. Zhou and J. Feng, Deep Forest, IJCAI 2017.
• [11] J. Feng and Z. Zhou, AutoEncoder by Forest, AAAI 2018.
• [12] J. Feng, Y. Yu, Z. Zhou, Multi-Layered Gradient Boosting Decision Trees, https://arxiv.org/abs/1806.00007, 2018.
• [13] R. Tanno, et al., Adaptive Neural Trees, https://arxiv.org/abs/1807.06699, 2018.
• [14] T. Wang, A Unified Analytical Framework for Trustable Machine Learning and Automation Running with Blockchain, IEEE Big Data Workshops, 2018.

Upcoming Events 接下来的一些活动
Shameless ads 广告时间
• Running for 2019 ACM SIGAI Vice-Chair
• Vote for Tao Wang
• ACM local chapter on AI & Machine Learning
• AutoML 2019 workshop, recruiting program committee (PC) members
• https://sites.google.com/view/automl2019-workshop
• 3/13, AI-Now meetup, Blockchain and Machine Learning
• https://www.meetup.com/AI-Now/
