© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Danny Bickson, Co-founder DATO
CMP305
Deep Learning on AWS
Made Easy
October 2015
Who is Dato?
Seattle-based Machine Learning Company
45+ and growing fast!
Deep learning example
Image classification
Input: x
Image pixels
Output: y
Predicted object
Neural networks

Learning *very* non-linear features
Linear classifiers (binary)
Score(x) > 0 Score(x) < 0
Score(x) = w0 + w1 x1 + w2 x2 + … + wd xd
Graph representation of classifier:
useful for defining neural networks
[Figure: inputs 1, x1, x2, …, xd feed a single output y through weights w0, w1, w2, …, wd]
Score(x) = w0 + w1 x1 + w2 x2 + … + wd xd
Score(x) > 0, output 1
Score(x) < 0, output 0
What can a linear classifier represent?
x1 OR x2: y = 1 when -0.5 + x1 + x2 > 0
x1 AND x2: y = 1 when -1.5 + x1 + x2 > 0
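A minimal sketch of a thresholded linear classifier computing OR and AND. It assumes unit input weights with a bias of -0.5 for OR and -1.5 for AND; the function names are illustrative and do not come from the slides.

```python
def score(weights, bias, x):
    """Linear score: w0 + w1*x1 + ... + wd*xd, with w0 passed as bias."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def classify(weights, bias, x):
    """Output 1 if the score is positive, otherwise 0."""
    return 1 if score(weights, bias, x) > 0 else 0


def logical_or(x1, x2):
    # Assumed weights: fires as soon as one input is 1.
    return classify([1, 1], -0.5, [x1, x2])


def logical_and(x1, x2):
    # Assumed weights: fires only when both inputs are 1.
    return classify([1, 1], -1.5, [x1, x2])


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_table = [logical_or(a, b) for a, b in inputs]
and_table = [logical_and(a, b) for a, b in inputs]
```

Only the bias differs between the two classifiers: it moves the decision boundary without changing its direction.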
What can’t a simple linear
classifier represent?
XOR
the counterexample
to everything
Need non-linear features
Solving the XOR problem:
Adding a layer
XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
z1 = x1 AND NOT x2: score -0.5 + x1 - x2
z2 = NOT x1 AND x2: score -0.5 - x1 + x2
y = z1 OR z2: score -0.5 + z1 + z2
Each score is thresholded to 0 or 1
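The added layer can be sketched in a few lines. This is an illustration, assuming hidden units z1 and z2 with bias -0.5 and input weights (1, -1) and (-1, 1), followed by an OR unit with bias -0.5; the function names are hypothetical.

```python
def step(score):
    """Threshold a score to 0 or 1."""
    return 1 if score > 0 else 0


def xor_network(x1, x2):
    z1 = step(-0.5 + x1 - x2)    # hidden unit: x1 AND NOT x2
    z2 = step(-0.5 - x1 + x2)    # hidden unit: NOT x1 AND x2
    return step(-0.5 + z1 + z2)  # output unit: z1 OR z2


xor_table = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

No single linear unit can produce this truth table; the hidden units supply the non-linear features that the output unit then combines linearly.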
A neural network
• Layers and layers and layers of
linear models and non-linear transformations
• Around for about 50 years
• In last few years, big resurgence
- Impressive accuracy on several benchmark problems
- Advances in hardware make the computation feasible (e.g., AWS G2 instances)
[Figure: network with inputs x1, x2, hidden units z1, z2, and output y]
Application of deep learning
to computer vision
Feature detection – traditional approach
• Features = local detectors
- Combined to make prediction
- (in reality, features are more low-level)
Face!
Eye
Eye
Nose
Mouth
Many hand-created features exist for finding interest points…
• SIFT [Lowe ’99]
• Spin Images [Johnson & Hebert ’99]
• Textons [Malik et al. ’99]
• RIFT [Lazebnik ’04]
• GLOH [Mikolajczyk & Schmid ’05]
• HoG [Dalal & Triggs ’05]
• …
Standard image classification approach
Input → Extract features (hand-created features) → Use simple classifier (e.g., logistic regression, SVMs) → Face?
Hand-created
features
… but very painful to design
Deep learning:
implicitly learns features
Layer 1 Layer 2 Layer 3 Prediction
Example
detectors
learned
Example
interest points
detected
[Zeiler & Fergus ‘13]
Deep learning performance
Deep learning accuracy
• German traffic sign
recognition benchmark
- 99.5% accuracy (IDSIA
team)
• House number recognition
- 97.8% accuracy per character
[Goodfellow et al. ’13]
ImageNet 2012 competition:
1.2M training images, 1000 categories
[Bar chart: error (best of 5 guesses), axis 0 to 0.3, for the top 3 teams: SuperVision, ISI, OXFORD_VGG. SuperVision shows a huge gain; the other entries exploited hand-coded features like SIFT.]
Winning entry: SuperVision
8 layers, 60M parameters [Krizhevsky et al. ’12]
Achieving these amazing results required:
• New learning algorithms
• GPU implementation
Deep learning performance
• ImageNet: 1.2M images
[Bar chart: running time (hours), axis 0 to 60, on g2.xlarge vs. g2.8xlarge]
Deep learning in computer vision
Scene parsing with deep learning
[Farabet et al. ‘13]
Retrieving similar images
Input Image Nearest neighbors
Deep learning usability
Designed a simple user interface
import graphlab

# Train the model
model = graphlab.neuralnet.create(train_images)

# Predict classes for new images
outcome = model.predict(test_images)
Deep learning demo
Challenges of deep learning
Deep learning score card
Pros
• Enables learning of features
rather than hand tuning
• Impressive performance
gains
- Computer vision
- Speech recognition
- Some text analysis
• Potential for more impact
Deep learning workflow
Lots of labeled data → Training set + Validation set
Training set → Learn deep neural net
Validation set → Validate
Adjust parameters, network architecture,…
Many tricks needed to work well…
Different types of layers, connections,…
needed for high accuracy
[Krizhevsky et al. ’12]
Deep learning score card
Cons
• Requires a lot of data for
high accuracy
• Computationally
really expensive
• Extremely hard to tune
- Choice of architecture
- Parameter types
- Hyperparameters
- Learning algorithm
- …
Computational cost + so many choices = incredibly hard to tune
Deep features:
Deep learning
+
Transfer learning
Can we learn features
from data, even when
we don’t have data or
time?
What’s learned in a neural net
Neural net trained for Task 1: cat vs. dog
• Earlier layers: more generic, can be used as feature extractor
• Last layers: very specific to Task 1, should be ignored for other tasks
Transfer learning in more detail…
Neural net trained for Task 1: cat vs. dog
• Earlier layers: more generic, can be used as feature extractor. Keep weights fixed!
• Last layers: very specific to Task 1, should be ignored for other tasks
For Task 2, predicting 101 categories, learn only end part of neural net:
use simple classifier (e.g., logistic regression, SVMs, nearest neighbor,…) to predict the class
Careful where you cut:
latter layers may be too task specific
[Figure: Layer 1 → Layer 2 → Layer 3 → Prediction, with example detectors learned and example interest points detected; Zeiler & Fergus ’13]
• Earlier layers: use these!
• Last layers: too specific for new task
Transfer learning with deep features workflow
Some labeled data → Training set + Validation set
Extract features with neural net trained on different task
Training set → Learn simple classifier
Validation set → Validate
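The workflow can be sketched end to end on toy data. This is a sketch under stated assumptions, not GraphLab Create's API: `FROZEN_WEIGHTS` is a hypothetical stand-in for layers trained on a different task, the simple classifier is a 1-nearest-neighbor lookup, and the data points are made up for illustration.

```python
# Hypothetical frozen weights standing in for layers trained on a different task.
FROZEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def extract_features(x):
    """Apply the fixed layer with a ReLU; these weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_nearest_neighbor(training_set):
    """Learning a 1-nearest-neighbor classifier only stores the deep features."""
    return [(extract_features(x), label) for x, label in training_set]


def predict(model, x):
    features = extract_features(x)
    return min(model, key=lambda item: squared_distance(item[0], features))[1]


def validate(model, validation_set):
    correct = sum(predict(model, x) == label for x, label in validation_set)
    return correct / len(validation_set)


# Toy labeled data (illustrative only), split into training and validation sets.
training_set = [([0.1, 0.2], "a"), ([0.2, 0.1], "a"),
                ([0.9, 1.0], "b"), ([1.0, 0.8], "b")]
validation_set = [([0.0, 0.3], "a"), ([0.8, 0.9], "b")]

model = fit_nearest_neighbor(training_set)
accuracy = validate(model, validation_set)
```

Only the classifier on top is fitted, which is why this workflow needs far less labeled data and compute than training the full network.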
How general are deep features?
Barcelona Buildings
Architectural transition
Deep learning in production on
AWS
How to use deep learning in
production?
• Predictive: understands input & takes actions or makes decisions
• Interactive: responds in real time
• Learning: improves its performance with experience
Intelligent service at the core…
Your intelligent application
Intelligent backend service: Real-time data → Predictions & decisions
Machine learning model: Historical data → Predictions & decisions
Most ML research here…
But ML research useless without great solution here…
Essential ingredients of intelligent service
• Responsive: Intelligent applications are interactive → Need low latency, high throughput & high availability
• Adaptive: ML models out-of-date the moment learning is done → Need to constantly understand & improve end-to-end performance
• Manageable: Many thousands of models, created by hundreds of people → Need versioning, attribution, provenance & reproducibility
Responsive: Now and Always
Addressing latency
Challenge: Scoring Latency
Compute predictions in < 20ms for complex models, queries, and features, all while under heavy query load
Example queries: TopK; SELECT * FROM users JOIN items, click_logs, pages WHERE …
The Common Solutions to Latency
• Faster Online Model Scoring: “Execute Predict(query) in real-time as queries arrive”
• Pre-Materialization and Lookup: “Pre-compute Predict(query) for all queries and lookup answer at query time”
Dato Predictive Services does both
Faster Online Model Scoring:
Highly optimized machine learning
• SFrame: Native code, optimized data frame
- Available open-source (BSD)
• Model querying acceleration with native code,
e.g.,
- TopK and Nearest Neighbor eval:
• LSH, Ball Trees,…
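To illustrate why LSH speeds up nearest-neighbor queries, here is a small sketch of the random-hyperplane variant. The slide does not say which LSH scheme Dato uses, so the variant, the function names, and the toy vectors are all assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes through the origin (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(items, hyperplanes):
    """Group item names into buckets keyed by their signature."""
    index = defaultdict(list)
    for name, vector in items.items():
        index[signature(vector, hyperplanes)].append(name)
    return index


def candidates(query, index, hyperplanes):
    """Only items in the query's bucket get scored, instead of every item."""
    return index.get(signature(query, hyperplanes), [])


planes = make_hyperplanes(num_planes=4, dim=3)
items = {"a": [1.0, 0.0, 0.0], "b": [2.0, 0.0, 0.0], "c": [-1.0, 0.0, 0.0]}
index = build_index(items, planes)
near_a = candidates([1.0, 0.0, 0.0], index, planes)
```

Vectors pointing in the same direction share a bucket, while opposite vectors land in different ones, so a query touches only a fraction of the data. The cost is that a true neighbor can occasionally fall in another bucket.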
Smart Materialization: Caching
[Plot: query frequency vs. unique queries]
Example: top 10% of all unique queries cover
90% of all queries performed.
Caching a small number of unique
queries has a very large impact.
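The effect of caching frequent queries can be sketched with a small in-process LRU cache placed in front of a scoring function. This is an illustration only: `slow_model`, the cache capacity, and the query stream are hypothetical, and a production service would typically use a shared cache rather than a per-process one.

```python
from collections import OrderedDict


class PredictionCache:
    """Small LRU cache in front of a model scoring function."""

    def __init__(self, predict_fn, capacity):
        self.predict_fn = predict_fn
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def predict(self, query):
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)   # mark as most recently used
            return self.entries[query]
        self.misses += 1
        result = self.predict_fn(query)       # the expensive path
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return result


def slow_model(query):
    """Hypothetical stand-in for an expensive Predict(query) call."""
    return "recommendations for " + query


cache = PredictionCache(slow_model, capacity=2)
traffic = ["shoes", "bikes", "shoes", "shoes", "tents",
           "shoes", "bikes", "shoes", "kayak", "shoes"]
results = [cache.predict(query) for query in traffic]
hit_rate = cache.hits / len(traffic)
```

Even with room for only two entries, the most popular query stays resident and half of this toy traffic is served without calling the model.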
Distributed shared caching
Distributed Shared Cache (Redis)
Cache:
Model query results
Common features (e.g., product info)
Scale-out improves
throughput and latency
Dato Latency by the numbers
Easy Case: cache hit ~2ms
Hard Case: cache miss
• Simple Linear Models: 5-6ms
• Complex Random Forests: 7-8ms
- P99: ~ 15ms
[using aws m3.xlarge instance]
Challenge: Availability
Heavy load → substantial delays
Frequent model updates → cache misses
Machine failures
Scale-Out availability under load
Heavy Load
Elastic Load Balancing load balancer
Adaptive:
Accounting for Constant Change
Change at Different Scales and Rates
Rate of change: from months to minutes
Granularity of change: from population to session
Example: shopping for Mom vs. shopping for me
Individual and session level change:
• Small data
• Online learning
• Bandits to assess models
The Dangerous Feedback Loop
I once looked at cameras on
Amazon …
Bags
Similar cameras
and
accessories
If this is all they showed how would they
learn that I also like bikes, and shoes?
Exploration / Exploitation Tradeoff
Systems that can take actions can
adversely affect future data
• Exploration: random action. Learn more about what is good and bad.
• Exploitation: best action. Make the best use of what we believe is good.
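One common way to balance the two is an epsilon-greedy policy, sketched below. The slides mention bandits but do not name a specific algorithm, so epsilon-greedy is an assumption here, as are the class name, action names, and rewards.

```python
import random


class EpsilonGreedy:
    """Pick a random action with probability epsilon, else the best known one."""

    def __init__(self, actions, epsilon, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {action: 0 for action in actions}
        self.mean_reward = {action: 0.0 for action in actions}

    def choose(self):
        if self.rng.random() < self.epsilon:
            # Exploration: random action, to learn what is good and bad.
            return self.rng.choice(list(self.counts))
        # Exploitation: best action according to current estimates.
        return max(self.mean_reward, key=self.mean_reward.get)

    def update(self, action, reward):
        """Fold one observed reward into the running mean for that action."""
        self.counts[action] += 1
        n = self.counts[action]
        self.mean_reward[action] += (reward - self.mean_reward[action]) / n


# Hypothetical feedback: 1.0 for a click, 0.0 for no click.
policy = EpsilonGreedy(["cameras", "bikes", "shoes"], epsilon=0.1)
policy.update("cameras", 1.0)
policy.update("cameras", 0.0)
policy.update("bikes", 1.0)
next_action = policy.choose()
```

With epsilon set to 0 the policy only ever shows what it already believes is best, which is the feedback loop described in the camera example; a small positive epsilon keeps some traffic for learning about other items.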
Dato Solution to Adaptivity
Rapid offline learning with GraphLab Create
Online bandit adaptation in Predictive Services
• Demo
Manageable:
Unification and simplification
Ecosystem of Intelligent Services
[Figure: data infrastructure (MySQL: TableA, TableB), data science (ModelA, ModelB), and serving (Service A, Service B)]
Complicated!
Many systems, with overlapping roles,
no single source of truth for Intelligent Service.
Dato Predictive
Services
Responsive Adaptive Manageable
Model management: like code management, but for life cycle of intelligent applications
Provenance & Reproducibility
• Track changes & rollback
• Cover code, model type, parameters, data…
Collaboration
• Review, blame
• Share
• Common feature engineering pipelines
Continuous Integration
• Deploy & update
• Measure & improve
• Avoid down time and impact on end-users
Dato Predictive
Services
Serving Models and Managing the
Machine Learning Lifecycle
GraphLab
Create
Accurate, Robust, and Scalable
Model Training
GraphLab Create:
Sophisticated machine learning made easy
• High-level ML toolkits
• AutoML: tune params, model selection,… → so you can focus on creative parts
• Reusable features: transferable feature engineering → accuracy with less data & less effort
High-level ML toolkits
get started with 4 lines of code,
then modify, blend, add yours…
Recommender, Image search, Sentiment analysis, Data matching, Auto tagging, Churn predictor, Object detector, Product sentiment, Click prediction, Fraud detection, User segmentation, Data completion, Anomaly detection, Document clustering, Forecasting, Search ranking, Summarization, …
import graphlab as gl
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data,
                              user_id='user',
                              item_id='movie',
                              target='rating')
recommendations = model.recommend(k=5)
SFrame ❤️ all ML tools SGraph
SFrame:
Sophisticated machine learning made
scalable
Opportunity for Out-of-Core ML
Capacity   Throughput
0.1 TB     1 GB/s      Fast, but significantly limits data size
1 TB       0.5 GB/s
10 TB      0.1 GB/s
Opportunity for big data on 1 machine
For sequential reads only! Random access very slow
Out-of-core ML
opportunity is huge
Usual design → Lots of
random access →
Slow
Design to maximize
sequential access for
ML algo patterns
GraphChi early example
SFrame data frame for ML
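The sequential-access idea can be illustrated with a single streaming pass over a file on disk. This is a generic sketch, not how SFrame is implemented; the file name, column names, and values are made up.

```python
import csv
import os
import tempfile


def streaming_mean(path, column):
    """Single sequential pass; only one row is held in memory at a time."""
    count = 0
    mean = 0.0
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            count += 1
            mean += (float(row[column]) - mean) / count  # running mean update
    return mean


# Write a tiny example file, then read it back sequentially.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "ratings.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["user", "rating"])
        writer.writerows([["u1", 4], ["u2", 2], ["u3", 3]])
    average_rating = streaming_mean(path, "rating")
```

Because the rows are consumed in storage order and never revisited, memory use stays constant regardless of file size, and the disk is read at its sequential throughput rather than its much slower random-access rate.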
Performance of SFrame/SGraph
Connected components in Twitter graph (41 million nodes, 1.4 billion edges)
System                     Machines   Time
GraphLab Create (SGraph)   1          70 sec
GraphX                     16         251 sec
Giraph                     16         200 sec
Spark                      16         2,128 sec
Source(s): Gonzalez et al. (OSDI 2014)
SFrame & SGraph
Optimized
out-of-core
computation for ML
High Performance
1 machine can handle:
• TBs of data
• 100s of billions of edges
Optimized for ML
• Columnar transformation
• Create features
• Iterators
• Filter, join, group-by, aggregate
• User-defined functions
• Easily extended through SDK
Tables, graphs, text, images
Open-source ❤️ BSD license
The Dato Machine Learning Platform
Predictive
Services
Serve Models and Manage the
Machine Learning Lifecycle
GraphLab Create
Train Accurate, Robust,
and Scalable models
Our customers
(CMP305) Deep Learning on AWS Made EasyCmp305

More Related Content

What's hot

Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
Lukas Masuch
 

What's hot (20)

Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
 
Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)
AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)
AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)
 
Using Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical ApplicationsUsing Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical Applications
 
Deep learning
Deep learningDeep learning
Deep learning
 
Synthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningSynthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep Learning
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNet
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptx
 
Introduction of Deep Learning
Introduction of Deep LearningIntroduction of Deep Learning
Introduction of Deep Learning
 
Deep Learning and Reinforcement Learning
Deep Learning and Reinforcement LearningDeep Learning and Reinforcement Learning
Deep Learning and Reinforcement Learning
 
Deep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles ApproachDeep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles Approach
 
Machine Learning and Deep Learning with R
Machine Learning and Deep Learning with RMachine Learning and Deep Learning with R
Machine Learning and Deep Learning with R
 

Viewers also liked

Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Spark Summit
 

Viewers also liked (6)

AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...
AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverabi...
 
AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M...
AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M...AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M...
AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M...
 
AWS re:Invent 2016: Zillow Group: Developing Classification and Recommendatio...
AWS re:Invent 2016: Zillow Group: Developing Classification and Recommendatio...AWS re:Invent 2016: Zillow Group: Developing Classification and Recommendatio...
AWS re:Invent 2016: Zillow Group: Developing Classification and Recommendatio...
 
AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
AWS re:Invent 2016: Deep Learning in Alexa (MAC202)AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
 
[系列活動] 機器學習速遊
[系列活動] 機器學習速遊[系列活動] 機器學習速遊
[系列活動] 機器學習速遊
 

Similar to (CMP305) Deep Learning on AWS Made EasyCmp305

Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
Justin Basilico
 

Similar to (CMP305) Deep Learning on AWS Made EasyCmp305 (20)

Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015
 
MXNet Workshop
MXNet WorkshopMXNet Workshop
MXNet Workshop
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
 
Android Malware 2020 (CCCS-CIC-AndMal-2020)
Android Malware 2020 (CCCS-CIC-AndMal-2020)Android Malware 2020 (CCCS-CIC-AndMal-2020)
Android Malware 2020 (CCCS-CIC-AndMal-2020)
 
Deep Turnover Forecast - meetup Lille
Deep Turnover Forecast - meetup LilleDeep Turnover Forecast - meetup Lille
Deep Turnover Forecast - meetup Lille
 
PhD Thesis Proposal
PhD Thesis Proposal PhD Thesis Proposal
PhD Thesis Proposal
 
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
"Practical Machine Learning With Ruby" by Iqbal Farabi (ID Ruby Community)
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Introduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and Tensorflow
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Keras on tensorflow in R & Python
Keras on tensorflow in R & PythonKeras on tensorflow in R & Python
Keras on tensorflow in R & Python
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
Distributed Deep Learning with Docker at Salesforce
Distributed Deep Learning with Docker at SalesforceDistributed Deep Learning with Docker at Salesforce
Distributed Deep Learning with Docker at Salesforce
 
Artificial Intelligence and Deep Learning in Azure, CNTK and Tensorflow
Artificial Intelligence and Deep Learning in Azure, CNTK and TensorflowArtificial Intelligence and Deep Learning in Azure, CNTK and Tensorflow
Artificial Intelligence and Deep Learning in Azure, CNTK and Tensorflow
 
Deep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdfDeep-learning-for-computer-vision-applications-using-matlab.pdf
Deep-learning-for-computer-vision-applications-using-matlab.pdf
 
Introduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural Networks
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
Google Big Data Expo
Google Big Data ExpoGoogle Big Data Expo
Google Big Data Expo
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 

More from Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

(CMP305) Deep Learning on AWS Made Easy

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Danny Bickson, Co-founder DATO CMP305 Deep Learning on AWS Made Easy October 2015
  • 2. Who is Dato? Seattle-based Machine Learning Company 45+ and growing fast!
  • 4. Image classification Input: x (image pixels) Output: y (predicted object)
  • 6. Linear classifiers (binary) Score(x) > 0 Score(x) < 0 Score(x) = w0 + w1 x1 + w2 x2 + … + wd xd
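  • 7. Graph representation of classifier: useful for defining neural networks. Input: 1, x1, x2, …, xd (weighted by w0, w1, w2, …, wd). Output: y. Score(x) = w0 + w1 x1 + w2 x2 + … + wd xd; if > 0, output 1; if < 0, output 0
  • 8. What can a linear classifier represent? x1 OR x2: weight 1 on x1, weight 1 on x2, weight -0.5 on the constant input 1. x1 AND x2: weight 1 on x1, weight 1 on x2, weight -1.5 on the constant input 1.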
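
The OR and AND classifiers on slide 8 can be checked directly against the score formula from slides 6 and 7. The following is a minimal sketch, not code from the talk; the function names `score` and `classify` are chosen here for illustration, and the weights are the ones read off the slide (1, 1, and a constant term of -0.5 for OR or -1.5 for AND).

```python
def score(weights, bias, x):
    """Score(x) = w0 + w1*x1 + ... + wd*xd, with w0 passed in as `bias`."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def classify(weights, bias, x):
    """Output 1 when the score is positive, otherwise 0."""
    return 1 if score(weights, bias, x) > 0 else 0


# All four combinations of two binary inputs.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Weights taken from slide 8: same input weights, different constant term.
or_outputs = [classify((1, 1), -0.5, x) for x in inputs]
and_outputs = [classify((1, 1), -1.5, x) for x in inputs]

print(or_outputs)   # [0, 1, 1, 1]
print(and_outputs)  # [0, 0, 0, 1]
```

Only the constant term differs between the two classifiers: it shifts the decision boundary so that one active input is enough (OR) or both are required (AND).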
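  • 9. What can’t a simple linear classifier represent? XOR: the counterexample to everything. Need non-linear features
  • 10. Solving the XOR problem: Adding a layer. XOR = (x1 AND NOT x2) OR (NOT x1 AND x2). Hidden units, thresholded to 0 or 1: z1 = x1 AND NOT x2 (weight 1 on x1, -1 on x2, -0.5 on the constant input 1); z2 = NOT x1 AND x2 (weight -1 on x1, 1 on x2, -0.5 on the constant input 1). Output: y = z1 OR z2 (weight 1 on z1, 1 on z2, -0.5 on the constant input 1)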
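
The two-layer construction on slide 10 can be sketched in a few lines. This is an illustrative reading of the slide's diagram rather than code from the talk: the helper names `step`, `unit`, and `xor_net` are assumptions, and the weights are the ones shown on the slide.

```python
def step(score):
    """Threshold a score to 0 or 1, as on the slide."""
    return 1 if score > 0 else 0


def unit(bias, weights, inputs):
    """One thresholded linear unit: step(bias + sum of w_i * x_i)."""
    return step(bias + sum(w * x for w, x in zip(weights, inputs)))


def xor_net(x1, x2):
    """Two hidden units feeding one output unit."""
    z1 = unit(-0.5, (1, -1), (x1, x2))   # x1 AND NOT x2
    z2 = unit(-0.5, (-1, 1), (x1, x2))   # NOT x1 AND x2
    return unit(-0.5, (1, 1), (z1, z2))  # z1 OR z2


truth_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
for pair, y in truth_table.items():
    print(pair, "->", y)
```

Each unit on its own is still a linear classifier; it is the thresholding between the layers that makes the combined function non-linear.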
  • 11. A neural network • Layers and layers and layers of linear models and non-linear transformations • Around for about 50 years • In last few years, big resurgence - Impressive accuracy on several benchmark problems - Advances in hardware allow the computation (e.g., AWS g2 instances) [Diagram: inputs x1, x2, 1; hidden units z1, z2, 1; output y]
  • 12. Application of deep learning to computer vision
  • 13. Feature detection: traditional approach • Features = local detectors - Combined to make prediction - (in reality, features are more low-level) [Diagram: Eye, Eye, Nose, and Mouth detectors combine into "Face!"]
  • 14. Many hand-created features exist for finding interest points… • SIFT [Lowe ’99] • Spin Images [Johnson & Hebert ’99] • Textons [Malik et al. ’99] • RIFT [Lazebnik ’04] • GLOH [Mikolajczyk & Schmid ’05] • HoG [Dalal & Triggs ’05] • …
  • 15. Standard image classification approach: Input → Extract features (hand-created features) → Use simple classifier (e.g., logistic regression, SVMs) → Face?
  • 16. Many hand-created features exist for finding interest points… • SIFT [Lowe ’99] • Spin Images [Johnson & Hebert ’99] • Textons [Malik et al. ’99] • RIFT [Lazebnik ’04] • GLOH [Mikolajczyk & Schmid ’05] • HoG [Dalal & Triggs ’05] • … Hand-created features … but very painful to design
  • 17. Deep learning: implicitly learns features. Layer 1 → Layer 2 → Layer 3 → Prediction. [Figure: example detectors learned and example interest points detected at each layer, Zeiler & Fergus ’13]
  • 19. Deep learning accuracy • German traffic sign recognition benchmark - 99.5% accuracy (IDSIA team) • House number recognition - 97.8% accuracy per character [Goodfellow et al. ’13]
  • 20. ImageNet 2012 competition: 1.2M training images, 1000 categories. [Bar chart: error (best of 5 guesses), axis 0 to 0.3, for the top 3 teams: SuperVision, ISI, OXFORD_VGG. Annotations: "Huge gain"; "Exploited hand-coded features like SIFT"]
  • 21. ImageNet 2012 competition: 1.2M training images, 1000 categories Winning entry: SuperVision 8 layers, 60M parameters [Krizhevsky et al. ’12] Achieving these amazing results required: • New learning algorithms • GPU implementation
  • 22. Deep learning performance • ImageNet: 1.2M images [Bar chart: running time (hours), axis 0 to 60, for g2.xlarge vs. g2.8xlarge]
  • 23. Deep learning in computer vision
  • 24. Scene parsing with deep learning [Farabet et al. ‘13]
  • 25. Retrieving similar images Input Image Nearest neighbors
  • 27. Designed a simple user interface
      # training the model
      model = graphlab.neuralnet.create(train_images)
      # predicting classes for new images
      outcome = model.predict(test_images)
  • 29. Challenges of deep learning
  • 30. Deep learning score card Pros • Enables learning of features rather than hand tuning • Impressive performance gains - Computer vision - Speech recognition - Some text analysis • Potential for more impact
  • 31. Deep learning workflow Lots of labeled data Training set Validation set Learn deep neural net Validate Adjust parameters, network architecture,…
  • 32. Many tricks needed to work well… Different types of layers, connections, … needed for high accuracy [Krizhevsky et al. ’12]
  • 33. Deep learning score card Pros • Enables learning of features rather than hand tuning • Impressive performance gains - Computer vision - Speech recognition - Some text analysis • Potential for more impact Cons • Requires a lot of data for high accuracy • Computationally really expensive • Extremely hard to tune - Choice of architecture - Parameter types - Hyperparameters - Learning algorithm - … Computational cost + so many choices = incredibly hard to tune
  • 35. Standard image classification approach: Input → Extract features (hand-created features) → Use simple classifier (e.g., logistic regression, SVMs) → Face? Can we learn features from data, even when we don’t have data or time?
  • 36. What’s learned in a neural net. Neural net trained for Task 1: cat vs. dog. Earlier layers: more generic, can be used as feature extractor. End of the net: very specific to Task 1, should be ignored for other tasks.
  • 37. Transfer learning in more detail… Neural net trained for Task 1: cat vs. dog. Earlier layers: more generic, can be used as feature extractor. Keep weights fixed! End of the net: very specific to Task 1, should be ignored for other tasks. For Task 2, predicting 101 categories, learn only end part of neural net: use simple classifier (e.g., logistic regression, SVMs, nearest neighbor, …) → Class?
  • 38. Careful where you cut: later layers may be too task specific. Layer 1 → Layer 2 → Layer 3 → Prediction. [Figure: example detectors learned and example interest points detected, Zeiler & Fergus ’13. Annotations: "Use these!"; "Too specific for new task"]
  • 39. Transfer learning with deep features workflow Some labeled data Extract features with neural net trained on different task Learn simple classifier Validate Training set Validation set
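
The workflow on slide 39 (extract features with a network trained on a different task, learn a simple classifier, validate) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `frozen_features` is a hypothetical stand-in for the fixed early layers of a pretrained network (it is not a neural net), the data is made up, and nearest neighbor is used only because slide 37 lists it as one possible simple classifier.

```python
import math


def frozen_features(pixels):
    """Hypothetical stand-in for the early layers of a net trained on Task 1.

    Its "weights" are fixed: nothing in here is updated while learning Task 2.
    """
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return (mean, spread)


def nearest_neighbor(train, query_features):
    """Simple classifier: label of the closest training example in feature space."""
    return min(train, key=lambda item: math.dist(item[0], query_features))[1]


# "Some labeled data" for Task 2 (toy 4-pixel images, made up for this sketch).
training_set = [
    ([0.1, 0.1, 0.2, 0.1], "dark"),
    ([0.9, 0.8, 0.9, 1.0], "bright"),
    ([0.0, 1.0, 0.0, 1.0], "striped"),
]
validation_set = [
    ([0.2, 0.1, 0.1, 0.2], "dark"),
    ([1.0, 0.0, 1.0, 0.1], "striped"),
]

# Step 1: extract features with the frozen extractor.
train = [(frozen_features(pixels), label) for pixels, label in training_set]

# Step 2: "learning" the simple classifier here is just storing the features.
prediction = nearest_neighbor(train, frozen_features([0.85, 0.9, 0.95, 0.9]))

# Step 3: validate on held-out examples.
correct = sum(
    nearest_neighbor(train, frozen_features(pixels)) == label
    for pixels, label in validation_set
)
accuracy = correct / len(validation_set)
print(prediction, accuracy)
```

The point of the design is that only the small classifier at the end is fitted to the new task, which is why the approach needs far less labeled data and compute than training a deep network from scratch.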
  • 40. How general are deep features?
  • 43. Deep learning in production on AWS
  • 44. How to use deep learning in production? Predictive: Understands input & takes actions or makes decisions. Interactive: Responds in real time. Learning: Improves its performance with experience
  • 45. Intelligent service at the core…
  • 47. Essential ingredients of intelligent service. Responsive: Intelligent applications are interactive → Need low latency, high throughput & high availability. Adaptive: ML models are out-of-date the moment learning is done → Need to constantly understand & improve end-to-end performance. Manageable: Many thousands of models, created by hundreds of people → Need versioning, attribution, provenance & reproducibility
  • 48. Responsive: Now and Always. Responsive: Intelligent applications are interactive → Need low latency, high throughput & high availability. Adaptive: ML models are out-of-date the moment learning is done → Need to constantly understand & improve end-to-end performance. Manageable: Many thousands of models, created by hundreds of people → Need versioning, attribution, provenance & reproducibility
  • 50. Challenge: Scoring Latency. Compute predictions in < 20 ms for complex models, queries (SELECT * FROM users JOIN items, click_logs, pages WHERE …), top-K, and features, all while under heavy query load
  • 51. The Common Solutions to Latency. Faster Online Model Scoring: “Execute Predict(query) in real-time as queries arrive”. Pre-Materialization and Lookup: “Pre-compute Predict(query) for all queries and look up the answer at query time”. Dato Predictive Services does both
  • 52. Faster Online Model Scoring: Highly optimized machine learning • SFrame: Native code, optimized data frame - Available open-source (BSD) • Model querying acceleration with native code, e.g., - TopK and Nearest Neighbor eval: • LSH, Ball Trees, …
  • 53. The Common Solutions to Latency. Faster Online Model Scoring: “Execute Predict(query) in real-time as queries arrive”. Pre-Materialization and Lookup: “Pre-compute Predict(query) for all queries and look up the answer at query time”. Dato Predictive Services does both
  • 54. Smart Materialization → Caching. [Plot: query frequency vs. unique queries] Example: top 10% of all unique queries cover 90% of all queries performed. Caching a small number of unique queries has a very large impact.
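
The caching idea on slide 54 can be sketched as a small least-recently-used cache wrapped around a prediction function. This is a generic illustration, not how Dato Predictive Services is implemented: `QueryCache` and `slow_predict` are hypothetical names, the query stream is made up, and the in-process `OrderedDict` stands in for a shared cache.

```python
from collections import OrderedDict


class QueryCache:
    """Least-recently-used cache in front of a prediction function."""

    def __init__(self, predict, capacity):
        self._predict = predict
        self._capacity = capacity
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __call__(self, query):
        if query in self._store:
            self.hits += 1
            self._store.move_to_end(query)  # mark as most recently used
            return self._store[query]
        self.misses += 1
        result = self._predict(query)
        self._store[query] = result
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict least recently used
        return result


def slow_predict(query):
    """Hypothetical stand-in for an expensive model evaluation."""
    return len(query)


cached_predict = QueryCache(slow_predict, capacity=2)

# A skewed stream: a few popular queries account for most of the traffic.
stream = ["shoes", "camera", "shoes", "shoes", "bike",
          "shoes", "camera", "shoes", "shoes", "camera"]
for q in stream:
    cached_predict(q)

print(cached_predict.hits, cached_predict.misses)
```

With a skewed stream like this one, even a cache that holds only two entries answers most requests without calling the model, which is the effect the slide describes.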
  • 55. Distributed shared caching. Distributed Shared Cache (Redis). Cache: model query results, common features (e.g., product info). Scale-out improves throughput and latency
  • 56. Dato Latency by the numbers. Easy case: cache hit ~2 ms. Hard case: cache miss • Simple linear models: 5-6 ms • Complex random forests: 7-8 ms - P99: ~15 ms [using AWS m3.xlarge instance]
  • 57. Challenge: Availability. Heavy load → substantial delays. Frequent model updates → cache misses. Machine failures
  • 58. Scale-out availability under load. Heavy load → Elastic Load Balancing load balancer
  • 59. Adaptive: Accounting for Constant Change. Responsive: Intelligent applications are interactive → Need low latency, high throughput & high availability. Adaptive: ML models are out-of-date the moment learning is done → Need to constantly understand & improve end-to-end performance. Manageable: Many thousands of models, created by hundreds of people → Need versioning, attribution, provenance & reproducibility
  • 60. Change at Different Scales and Rates. [Chart: rate of change (months to minutes) vs. granularity of change (population to session). Example: "Shopping for Mom" vs. "Shopping for Me"]
  • 61. Change at Different Scales and Rates. Individual and session-level change: small data, online learning, bandits to assess models. [Chart: rate of change (months to minutes) vs. granularity of change (population to session). Example: "Shopping for Mom" vs. "Shopping for Me"]
  • 62. The Dangerous Feedback Loop. I once looked at cameras on Amazon … [Recommendations shown: bags, similar cameras and accessories] If this is all they showed, how would they learn that I also like bikes and shoes?
  • 63. Exploration / Exploitation Tradeoff. Systems that can take actions can adversely affect future data. Exploration (random action): learn more about what is good and bad. Exploitation (best action): make the best use of what we believe is good.
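
One simple way to balance the two behaviors on slide 63 is an epsilon-greedy bandit: take a random action with a small probability and the best-known action otherwise. The sketch below is a generic illustration and makes no claim about the bandit method used in Dato Predictive Services; the click rates, the value of epsilon, and the function names are assumptions.

```python
import random


def choose_action(estimates, epsilon, rng):
    """Explore (random action) with probability epsilon, else exploit (best action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def update(estimates, counts, action, reward):
    """Update the running average reward of the chosen action."""
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]


rng = random.Random(0)                 # fixed seed so runs are repeatable
true_click_rates = [0.2, 0.5, 0.8]     # hypothetical; unknown to the learner
estimates = [0.0] * len(true_click_rates)
counts = [0] * len(true_click_rates)

for _ in range(2000):
    action = choose_action(estimates, epsilon=0.1, rng=rng)
    reward = 1.0 if rng.random() < true_click_rates[action] else 0.0
    update(estimates, counts, action, reward)

print(counts)
print([round(e, 2) for e in estimates])
```

Setting epsilon to 0 reproduces the feedback loop from slide 62: the system only ever shows what it already believes is best and never learns about the alternatives.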
  • 64. Dato Solution to Adaptivity. Rapid offline learning with GraphLab Create. Online bandit adaptation in Predictive Services • Demo
  • 65. Manageable: Unification and simplification. Responsive: Intelligent applications are interactive → Need low latency, high throughput & high availability. Adaptive: ML models are out-of-date the moment learning is done → Need to constantly understand & improve end-to-end performance. Manageable: Many thousands of models, created by hundreds of people → Need versioning, attribution, provenance & reproducibility
  • 66. Ecosystem of Intelligent Services. [Diagram: Data Infrastructure (MySQL, MySQL; TableA, TableB), Data Science (ModelA, ModelB), Serving (Service A, Service B)] Complicated! Many systems, with overlapping roles, no single source of truth for Intelligent Service.
  • 68. Model Management: like code management, but for life cycle of intelligent applications. Provenance & Reproducibility • Track changes & rollback • Cover code, model type, parameters, data… Collaboration • Review, blame • Share • Common feature engineering pipelines Continuous Integration • Deploy & update • Measure & improve • Avoid down time and impact on end-users
  • 69. Dato Predictive Services: Responsive, Adaptive, Manageable. Dato Predictive Services: Serving Models and Managing the Machine Learning Lifecycle. GraphLab Create: Accurate, Robust, and Scalable Model Training
  • 70. GraphLab Create: Sophisticated machine learning made easy. High-level ML toolkits. AutoML: tune params, model selection, … → so you can focus on creative parts. Reusable features: transferable feature engineering → accuracy with less data & less effort
  • 71. High-level ML toolkits: get started with 4 lines of code, then modify, blend, add yours… Recommender, Image search, Sentiment analysis, Data matching, Auto tagging, Churn predictor, Object detector, Product sentiment, Click prediction, Fraud detection, User segmentation, Data completion, Anomaly detection, Document clustering, Forecasting, Search ranking, Summarization, …
      import graphlab as gl
      data = gl.SFrame.read_csv('my_data.csv')
      model = gl.recommender.create(data, user_id='user', item_id='movie', target='rating')
      recommendations = model.recommend(k=5)
  • 72. SFrame ❤️ all ML tools SGraph SFrame: Sophisticated machine learning made scalable
  • 73. Opportunity for Out-of-Core ML. Capacity and throughput: 1 TB at 0.5 GB/s; 10 TB at 0.1 GB/s; 0.1 TB at 1 GB/s. Fast, but significantly limits data size. Opportunity for big data on 1 machine. For sequential reads only! Random access very slow. Out-of-core ML opportunity is huge. Usual design → Lots of random access → Slow. Design to maximize sequential access for ML algo patterns. GraphChi: early example. SFrame: data frame for ML
  • 74. Performance of SFrame/SGraph. Connected components in Twitter graph (41 million nodes, 1.4 billion edges). [Bar chart, axis 0 to 2250 sec: GraphLab Create (SGraph, 1 machine): 70 sec; GraphX: 251 sec; Giraph: 200 sec; Spark: 2,128 sec; GraphX, Giraph, and Spark on 16 machines] Source(s): Gonzalez et al. (OSDI 2014)
  • 75. SFrame & SGraph: Optimized out-of-core computation for ML. High performance: 1 machine can handle TBs of data, 100s of billions of edges. Optimized for ML: columnar transformation; create features; iterators; filter, join, group-by, aggregate; user-defined functions; easily extended through SDK. Tables, graphs, text, images. Open-source ❤️ BSD license
  • 76. The Dato Machine Learning Platform. Predictive Services: Serve Models and Manage the Machine Learning Lifecycle. GraphLab Create: Train Accurate, Robust, and Scalable models