Deep Learning Made Easy with Deep Features

Piotr Teterwak
Dato, Machine Learning Engineer

Hello, my name is…
Piotr Teterwak
Machine Learning Engineer, Dato

GraphLab Create: Production ML Pipeline
[Diagram: DATA → Data cleaning & feature eng → ML Algorithm → Offline eval & parameter search → Deploy model → Your Web Service or Intelligent App]
Stages: Data engineering, Data intelligence, Deployment
Goal: Platform to help implement, manage, and optimize the entire pipeline

Today’s talk
• Features in ML
• Deep Neural Networks learn features
• Using those learned features for new tasks
• Productionizing

Features are key to machine learning

Simple example: Spam filtering
• A user opens an email…
- Will she think it’s spam?
• What’s the probability the email is spam?
Input: x (text of email, user info, source info) → MODEL → Output: probability of y (Yes! / No)

Feature engineering:
the painful black art of transforming raw inputs into useful inputs for an ML algorithm
• E.g., important words, complex transformations of the input, …
Input: x (text of email, user info, source info) → Feature extraction → Features: Φ(x) → MODEL → Output: probability of y (Yes! / No)
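
A minimal sketch of the picture above, assuming a hypothetical feature extractor Φ(x) that counts a few hand-picked words and a logistic model on top. The word list, weights, and bias are invented for illustration and are not from the talk.

```python
import math

def extract_features(email_text, important_words):
    """Phi(x): count how often each hand-picked word occurs in the email."""
    tokens = email_text.lower().split()
    return [tokens.count(word) for word in important_words]

def spam_probability(features, weights, bias):
    """Logistic model: squash a weighted sum of the features into (0, 1)."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical vocabulary and hand-set weights (a real model would learn these).
important_words = ["free", "winner", "meeting"]
weights = [1.5, 2.0, -2.0]
bias = -1.0

spam_features = extract_features("Free prize for the winner", important_words)
ham_features = extract_features("meeting moved to monday", important_words)
p_spam = spam_probability(spam_features, weights, bias)
p_ham = spam_probability(ham_features, weights, bias)
```

The painful part is choosing `important_words` by hand; the rest of the talk is about letting a network learn Φ(x) instead.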

Deep Learning for Learning Features

Linear classifiers
• Most common classifier
- Logistic regression
- SVMs
- …
• Decision boundary corresponds to a hyperplane:
- The generalization of a line to a high-dimensional space
- One class where w0 + w1 x1 + w2 x2 > 0, the other where w0 + w1 x1 + w2 x2 < 0

What can a simple linear classifier represent?
AND
[Plot: both axes run from 0 to 1]

What can a simple linear classifier represent?
OR
[Plot: both axes run from 0 to 1]

What can’t a simple linear classifier represent?
XOR
[Plot: both axes run from 0 to 1]
Need non-linear features

Non-linear feature embedding
[Plot: both axes run from 0 to 1]

Graph representation of classifier:
Useful for defining neural networks
[Diagram: input nodes 1, x1, x2, …, xd connect to the output node y through edges weighted w0, w1, w2, …, wd]
Score: w0 + w1 x1 + w2 x2 + … + wd xd
> 0, output 1
< 0, output 0

What can a linear classifier represent?
x1 OR x2: output 1 if -0.5 + 1 x1 + 1 x2 > 0
x1 AND x2: output 1 if -1.5 + 1 x1 + 1 x2 > 0
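
The two classifiers can be checked in a few lines of Python. This sketch uses the weights from the slide (bias -0.5 for OR, bias -1.5 for AND, weight 1 on each input); the function names are illustrative.

```python
def linear_unit(weights, bias, inputs):
    """Return 1 if bias + sum(w_i * x_i) > 0, else 0."""
    score = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if score > 0 else 0

def logical_or(x1, x2):
    return linear_unit([1, 1], -0.5, [x1, x2])

def logical_and(x1, x2):
    return linear_unit([1, 1], -1.5, [x1, x2])

# Rows of (x1, x2, x1 OR x2, x1 AND x2) for all four binary inputs.
truth_table = [(a, b, logical_or(a, b), logical_and(a, b))
               for a in (0, 1) for b in (0, 1)]
```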

Solving the XOR problem: Adding a layer
XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
z1 = x1 AND NOT x2: -0.5 + 1 x1 - 1 x2
z2 = NOT x1 AND x2: -0.5 - 1 x1 + 1 x2
y = z1 OR z2: -0.5 + 1 z1 + 1 z2
Each unit is thresholded to 0 or 1
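
A short sketch of the two-layer network, using the weights from the slide. Each hidden unit is a thresholded linear classifier, and the output unit ORs them together; the function names are illustrative.

```python
def step(score):
    """Threshold a score to 0 or 1."""
    return 1 if score > 0 else 0

def xor_network(x1, x2):
    z1 = step(-0.5 + 1 * x1 - 1 * x2)    # x1 AND NOT x2
    z2 = step(-0.5 - 1 * x1 + 1 * x2)    # NOT x1 AND x2
    return step(-0.5 + 1 * z1 + 1 * z2)  # z1 OR z2

xor_outputs = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
```

The hidden units z1 and z2 are exactly the non-linear features a single linear classifier was missing.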

http://deeplearning.stanford.edu/wiki/images/4/40/Network3322.png
Deep Neural Networks
P(cat|x)
P(dog|x)

Deep Neural Networks
• Can approximate any continuous function given enough hidden units.
• This is tremendously powerful: given enough units, a neural network can in principle be trained to solve very difficult problems.
• But they are also very difficult to train: too many parameters means too much memory and computation time.

Neural Nets and GPUs
• Many operations in neural net training can happen in parallel
• Training reduces to matrix operations, many of which can be easily parallelized on a GPU.
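
To make the matrix-operation point concrete, here is a sketch of one fully connected layer applied to a batch of inputs, written as an explicit matrix product in plain Python. The function name and the shapes are assumptions for illustration; a GPU library would compute the same product with the inner sums running in parallel.

```python
def dense_layer(batch, weights, biases):
    """Compute batch @ weights + biases.

    batch:   one row per example, one column per input
    weights: one row per input, one column per hidden unit
    biases:  one value per hidden unit
    """
    outputs = []
    for row in batch:
        out_row = []
        for j in range(len(biases)):
            # Each (example, unit) pair is independent of every other pair,
            # which is what makes this easy to parallelize.
            total = biases[j] + sum(row[i] * weights[i][j] for i in range(len(row)))
            out_row.append(total)
        outputs.append(out_row)
    return outputs

activations = dense_layer(
    batch=[[1, 2], [3, 4]],
    weights=[[1, 0], [0, 1]],
    biases=[0.5, -0.5],
)
```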

Convolutional Neural Nets
• Strategic removal of edges
Input Layer
Hidden Layer

http://ufldl.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif
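
A sketch of what a convolutional layer computes, assuming a single-channel image, no padding, and stride 1 (none of which the slides specify). Each output value depends only on a small window of the input, and every window shares the same kernel weights, which is one way to read "strategic removal of edges".

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(kernel_h) for j in range(kernel_w))
            row.append(total)
        output.append(row)
    return output

feature_map = convolve2d(
    image=[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    kernel=[[1, 0], [0, 1]],
)
```

The 2x2 kernel has 4 weights no matter how large the image is, compared with one weight per input pixel per hidden unit in a fully connected layer.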

Pooling layer
Ranzato, LSVR tutorial @ CVPR, 2014. www.cs.toronto.edu/~ranzato
http://ufldl.stanford.edu/wiki/images/6/6c/Pooling_schematic.gif
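
A sketch of a pooling layer, assuming max pooling over non-overlapping square windows. The slides do not say which pooling function is meant; mean pooling is another common choice.

```python
def max_pool2d(feature_map, size):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[r + i][c + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled

pooled = max_pool2d(
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]],
    size=2,
)
```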

Final Network
Krizhevsky et al. ’12

Applications to computer vision

Image features
• Features = local detectors
- Combined to make prediction
- (in reality, features are more low-level)
[Diagram: Eye, Eye, Nose, and Mouth detectors combine into “Face!”]

Standard image classification approach
Input → Extract features → Use simple classifier (e.g., logistic regression, SVMs) → Face
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
Slide credit: Honglak Lee

Many hand crafted features exist…
SIFT, Spin image, HoG, RIFT, Textons, GLOH
… but very painful to design

Change image classification approach?
Input → Extract features → Use simple classifier → Face
SIFT, Spin image, HoG, RIFT, Textons, GLOH
Can we learn features from data?

Use neural network to learn features
Input
Learned hierarchy
Output
Lee et al. ‘Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations’ ICML 2009

Sample results
• Traffic sign recognition (GTSRB)
- 99.2% accuracy
• House number recognition (Google)
- 94.3% accuracy

Krizhevsky et al. ’12:
60M parameters, won 2012 ImageNet competition

ImageNet 2012 competition: 1.2M images, 1000 categories

Application to scene parsing
©Carlos Guestrin 2005-2014
Y LeCun
MA Ranzato
Semantic Labeling:
Labeling every pixel with the object it belongs to
[ Farabet et al. ICML 2012, PAMI 2013]
Would help identify obstacles, targets, landing sites, dangerous areas
Would help line up depth map with edge maps

Deep learning workflow
Lots of labeled data → Training set (80%) / Validation set (20%) → Learn deep neural net model → Validate
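
The 80% / 20% split in this workflow can be sketched with the standard library alone. The function name, the shuffle, and the fixed seed are assumptions for illustration; the slide only gives the proportions.

```python
import random

def train_validation_split(examples, train_fraction=0.8, seed=0):
    """Shuffle the labeled examples, then cut them into training and validation sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Ten placeholder examples standing in for "lots of labeled data".
training_set, validation_set = train_validation_split(range(10))
```

The model is fit on `training_set` only; `validation_set` is held back to estimate how well it generalizes.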

Deep learning score card
Pros
• Enables learning of features rather
than hand tuning
• Impressive performance gains on
- Computer vision
- Speech recognition
- Some text analysis
• Potential for much more impact
Cons
• Computationally really expensive
• Requires a lot of data for high
accuracy
• Extremely hard to tune
- Choice of architecture
- Parameter types
- Hyperparameters
- Learning algorithm
- …
• Computational cost + so many choices = incredibly hard to tune

Can we do better?
Input
Learned hierarchy
Output
Lee et al. ‘Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations’ ICML 2009

Deep features: Deep learning + Transfer learning

Transfer learning:
Use data from one domain to help learn on another
Lots of data: Learn neural net → Great accuracy
Some data: Neural net as feature extractor + Simple classifier → Great accuracy on new problem
Old idea, explored for deep learning by Donahue et al. ’14

What’s learned in a neural net
Neural net trained for Task 1
More generic (earlier layers): can be used as feature extractor
vs.
Very specific to Task 1 (end part)

Transfer learning in more detail…
Neural net trained for Task 1
More generic: can be used as feature extractor. Keep weights fixed!
Very specific to Task 1: for Task 2, learn only the end part
Use simple classifier → Class?
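
A toy sketch of the recipe: a frozen feature extractor followed by a simple classifier trained on the new task. Everything here is a stand-in. The "pretrained" layer is a hypothetical two-unit network with hand-set weights rather than a real ImageNet model, and the simple classifier is logistic regression trained with plain stochastic gradient descent.

```python
import math

# Hypothetical stand-in for a network trained on Task 1. These weights stay fixed.
FROZEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]
FROZEN_BIASES = [-0.5, -0.5]

def extract_features(x):
    """Run the frozen layer and return its thresholded activations."""
    features = []
    for unit_weights, bias in zip(FROZEN_WEIGHTS, FROZEN_BIASES):
        score = bias + sum(w * xi for w, xi in zip(unit_weights, x))
        features.append(1.0 if score > 0 else 0.0)
    return features

def train_simple_classifier(features, labels, epochs=200, learning_rate=0.5):
    """Logistic regression on the extracted features; only these weights are learned."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            score = bias + sum(w * v for w, v in zip(weights, f))
            error = y - 1.0 / (1.0 + math.exp(-score))
            bias += learning_rate * error
            weights = [w + learning_rate * error * v for w, v in zip(weights, f)]
    return weights, bias

def predict(weights, bias, f):
    return 1 if bias + sum(w * v for w, v in zip(weights, f)) > 0 else 0

# Task 2: a small labeled set that is not linearly separable in the raw inputs.
task2_inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
task2_labels = [0, 1, 1, 0]
task2_features = [extract_features(x) for x in task2_inputs]
weights, bias = train_simple_classifier(task2_features, task2_labels)
predictions = [predict(weights, bias, f) for f in task2_features]
```

The linear classifier succeeds only because the frozen layer already supplies useful features; `FROZEN_WEIGHTS` is never touched during training.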

Using ImageNet-trained network as extractor for general features
• Using the classic AlexNet architecture pioneered by Alex Krizhevsky et al. in ImageNet Classification with Deep Convolutional Neural Networks
• It turns out that a neural network trained on ~1 million images of about 1000 classes makes a surprisingly general feature extractor
• First illustrated by Donahue et al. in DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

Transfer learning with deep features
Some labeled data → Extract features with neural net trained on different task → Training set (80%) / Validation set (20%) → Learn simple model → Validate → Deploy in production

What else can we do with Deep Features?

Simple text classification with bag of words
aardvark 0
about 2
all 2
Africa 1
apple 0
anxious 0
...
gas 1
...
oil 1
…
Zaire 0
Class?
One “feature” per word
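
A sketch of the bag-of-words representation using the vocabulary shown on the slide (lowercased). The example sentence and the tokenizer are assumptions, chosen so the counts match the slide.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in vocabulary]

vocabulary = ["aardvark", "about", "all", "africa", "apple",
              "anxious", "gas", "oil", "zaire"]

# Hypothetical document, invented for illustration.
document = "All about oil and gas in Africa, and all about trade."
vector = bag_of_words(document, vocabulary)
```

The resulting vector is the input to a simple classifier. Word order is discarded, and words outside the vocabulary ("trade" here) are ignored.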

Word2Vec: Neural network for finding word representation (Mikolov et al. ’13)
Skip-gram Model: From a word, predict nearby words in sentence
Example: “A dog went for a walk”, with “dog” as the input word to the neural net
The learned representation can be viewed as deep features
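
A sketch of how skip-gram training examples can be generated: each word is paired with the words around it. The window size of 2 is an assumption; the slide does not give one. Training the network itself is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center word, nearby word) pairs for every position in the sentence."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "a dog went for a walk".split()
pairs = skipgram_pairs(tokens)
dog_pairs = [pair for pair in pairs if pair[0] == "dog"]
```

Each pair is one training example: given the first word, the network is asked to predict the second.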

Word2Vec: Neural network for finding high dimensional representation per word (Mikolov et al. ’13)
http://www.folgertkarsdorp.nl/word2vec-an-introduction/

Related words are placed nearby in high-dimensional space
Projecting 300 dim space into 2 dim with PCA (Mikolov et al. ’13)

Blog corpus
Haha
Yea
Hahaha
Hahah
Lisxc
Umm
Hehe
laughingoutloud
LOL
Closest words
in 300 dim
Predicts gender of author with 79% accuracy
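
A sketch of a "closest words" lookup, assuming cosine similarity as the distance measure (the slide does not say which measure was used). The 3-dimensional vectors below are invented for illustration; real Word2Vec vectors here would have 300 dimensions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_words(query, embeddings, k=2):
    """Rank every other word by cosine similarity to the query word."""
    scores = [(cosine_similarity(embeddings[query], vector), word)
              for word, vector in embeddings.items() if word != query]
    return [word for _, word in sorted(scores, reverse=True)[:k]]

# Toy embeddings (hypothetical values, not learned from any corpus).
embeddings = {
    "lol": [0.9, 0.1, 0.0],
    "haha": [0.8, 0.2, 0.1],
    "hehe": [0.7, 0.3, 0.0],
    "oil": [0.0, 0.1, 0.9],
}
neighbors = closest_words("lol", embeddings)
```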

ML in production
(Or how this is relevant to data scientists)

2015: Production ML pipeline
[Diagram: DATA → Data cleaning & feature eng → ML Algorithm → Offline eval & parameter search → Deploy model → Your Web Service or Intelligent App]
Stages: Data engineering, Data intelligence, Deployment
Using deep learning
Goal: Platform to help implement, manage, and optimize the entire pipeline

Take Home Message
Deep Features are remarkable!

Dato Office Hours @ Galvanize SF
• Bring your laptop & some data & we’ll help you get started
• When: Thurs (tomorrow) 2:30p-5p followed by beers
• Where: Galvanize – 44 Tehama St. (SOMA) in SF
• Talk to me/email me: piotr@dato.com

Get the software: dato.com/download
Learn: dato.com/learn
Learn more: blog.dato.com
Join us: we’re hiring lots!
Contact me: piotr@dato.com

Go create something! [with Dato]
Data Engineering
• Fast & scalable
• Rich data type support
• Visualization
Data Intelligence
• App-oriented ML
• Supporting utils
• Extensibility
Deployment
• Batch & always-on
• RESTful interface
• Elastic & robust
