© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Julien Simon
Principal AI/ML Evangelist, Amazon Web Services
Speed up your Machine Learning workflows with built-in algorithms
@julsimon
Amazon SageMaker
Build
• Pre-built notebook instances
• Highly-optimized machine learning algorithms
Train
• One-click training for ML, DL, and custom algorithms
• Easier training with hyperparameter optimization
Deploy
• Deployment without engineering effort
• Fully-managed hosting at scale
Amazon SageMaker: model options
Training code:
Amazon-provided algorithms
• Matrix Factorization
• Regression
• Principal Component Analysis
• K-Means Clustering
• Gradient Boosted Trees
• And more!
Bring Your Own Script
ML Estimators in Apache Spark
Bring Your Own Container
Amazon SageMaker: 10x better algorithms
• Streaming datasets, for cheaper training
• Train faster, in a single pass
• Greater reliability on extremely large datasets
• Choice of several ML algorithms
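To make the streaming, single-pass idea concrete, here is a minimal sketch (not SageMaker's implementation): a linear model trained with stochastic gradient descent that sees each record exactly once and keeps only its two parameters in memory. The names `synthetic_stream` and `train_single_pass`, the learning rate, and the synthetic data are all assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, Tuple


def synthetic_stream(n: int, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Yield n (x, y) records one at a time, with y = 2x + 1 (hypothetical data)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0


def train_single_pass(
    stream: Iterable[Tuple[float, float]], lr: float = 0.1
) -> Tuple[float, float]:
    """Fit y ~ w*x + b with one SGD step per record; memory use stays constant."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y  # prediction error on this record
        w -= lr * err * x      # gradient step for the weight
        b -= lr * err          # gradient step for the bias
    return w, b


# The dataset is never materialized: records are consumed as they arrive.
w, b = train_single_pass(synthetic_stream(5000))
```

Because the model only ever holds `w` and `b`, memory does not grow with the size of the dataset, which is the property the slide attributes to streaming training.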
Infinitely scalable algorithms
Streaming
[Diagram: GPU, State]
Streaming
[Charts: Memory vs. Data Size; Time/Cost vs. Data Size]
Distributed
[Diagram: three GPUs, each with its own State]
Shared State
[Diagram: three GPUs, each with Local State, plus a Shared State]
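One way to read the local state / shared state picture, sketched under the assumption that each worker's state can be merged: every worker summarizes its own data shard into a small local state, and the shared state is the combination of all local states. The running-mean example and the names `LocalState` and `merge` are hypothetical, chosen only to illustrate the pattern; the slides do not describe the actual mechanism.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LocalState:
    """Sufficient statistics a single worker keeps for its shard."""
    count: int = 0
    total: float = 0.0

    def update(self, values: Iterable[float]) -> "LocalState":
        for v in values:
            self.count += 1
            self.total += v
        return self


def merge(states: List[LocalState]) -> LocalState:
    """Combine local states into the shared state (order does not matter)."""
    shared = LocalState()
    for s in states:
        shared.count += s.count
        shared.total += s.total
    return shared


shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]  # one shard per worker
local_states = [LocalState().update(shard) for shard in shards]
shared_state = merge(local_states)
mean = shared_state.total / shared_state.count
```

The merged result matches what a single machine would compute over the full dataset, while each worker only ever touches its own shard.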
Cost vs. Time
[Chart: cost ($ to $$$$) vs. time (minutes to months); series: Best Alternative, Amazon SageMaker]
Linear Learner
Regression (mean squared error)
SageMaker Other
1.02 1.06
1.09 1.02
0.332 0.183
0.086 0.129
83.3 84.5
Classification (F1 Score)
SageMaker Other
0.980 0.981
0.870 0.930
0.997 0.997
0.978 0.964
0.914 0.859
0.470 0.472
0.903 0.908
0.508 0.508
30 GB datasets for web-spam and web-url classification
[Chart: cost in dollars vs. billable time in minutes; series: sagemaker-url, sagemaker-spam, other-url, other-spam]
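For readers less familiar with the two metrics used in the tables above, here is a small sketch of how mean squared error and the F1 score are usually computed. The function names and the toy inputs are assumptions for illustration; they are not taken from the benchmark.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences; lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])
f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Note the opposite directions: for the regression table a lower value is better, while for the classification table a higher value is better.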
Factorization Machines
                 Log loss  F1 score  Seconds
SageMaker        0.494     0.277     820
Other (10 Iter)  0.516     0.190     650
Other (20 Iter)  0.507     0.254     1300
Other (50 Iter)  0.481     0.313     3250
Click Prediction 1 TB advertising dataset,
m4.4xlarge machines, perfect scaling.
[Chart: cost in dollars vs. billable time in hours; points for 10, 20, 30, 40, and 50 machines]
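For context, a second-order factorization machine scores a feature vector x as w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, and the pairwise term can be computed in time linear in the number of features. The sketch below follows that standard formulation (Rendle, 2010) and is not SageMaker's implementation; all parameter values are made up for illustration.

```python
from typing import List, Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               v: List[List[float]]) -> float:
    """Score x using the linear-time form of the pairwise term.

    v[i] is the k-dimensional factor vector of feature i.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     v: List[List[float]]) -> float:
    """Same model, with the pairwise term written as an explicit double loop."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise


# Hypothetical parameters: 3 features, 2 latent factors.
x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.2, -0.1, 0.3]
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
fast = fm_predict(x, w0, w, v)
slow = fm_predict_naive(x, w0, w, v)
```

Both functions return the same score; the first avoids the quadratic loop over feature pairs, which matters on sparse, high-dimensional data such as click logs.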
Demo: building a movie recommender with
Factorization Machines
https://medium.com/@julsimon/building-a-movie-recommender-with-factorization-machines-on-amazon-sagemaker-cedbfc8c93d8
K-Means Clustering
Dataset      Size    k    SageMaker  Other
Text         1.2GB   10   1.18E3     1.18E3
Text         1.2GB   100  1.00E3     9.77E2
Text         1.2GB   500  9.18E2     9.03E2
Images       9GB     10   3.29E2     3.28E2
Images       9GB     100  2.72E2     2.71E2
Images       9GB     500  2.17E2     Failed
Videos       27GB    10   2.19E2     2.18E2
Videos       27GB    100  2.03E2     2.02E2
Videos       27GB    500  1.86E2     1.85E2
Advertising  127GB   10   1.72E7     Failed
Advertising  127GB   100  1.30E7     Failed
Advertising  127GB   500  1.03E7     Failed
Synthetic    1100GB  10   3.81E7     Failed
Synthetic    1100GB  100  3.51E7     Failed
Synthetic    1100GB  500  2.81E7     Failed
Running Time vs. Number of Clusters: ~10x Faster!
[Chart: billable time in minutes vs. number of clusters (10, 100, 500); series: sagemaker, other]
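As a reminder of what k-means computes, here is a minimal sketch of the classic Lloyd iteration on 2-D points. It is a textbook version, not the algorithm SageMaker runs, and the naive initialization (first k points) and the toy data are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, iterations: int = 10) -> List[Point]:
    """Alternate between assigning points to the nearest center and recentering."""
    centers = list(points[:k])  # naive init: first k points (assumption)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]  # keep the old center if a cluster is empty
            for i, cl in enumerate(clusters)
        ]
    return centers


def kmeans_cost(points: List[Point], centers: List[Point]) -> float:
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = kmeans(data, k=2)
cost = kmeans_cost(data, centers)
```

Every iteration rescans the whole dataset, which is one reason a plain implementation like this struggles as data size and k grow.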
Principal Component Analysis (PCA)
More than 10x faster, at a fraction of the cost!
[Chart: Cost vs. Time; cost in dollars vs. billable time in minutes; series: other, sagemaker-deterministic, sagemaker-randomized]
[Chart: Throughput and Scalability; Mb/sec/machine vs. number of machines (8, 10, 20); series: other, sagemaker-deterministic, sagemaker-randomized]
Neural Topic Modeling
[Diagram: input term counts vector -> encoder (feedforward net) -> document posterior -> sampled document representation -> decoder (softmax) -> output term counts vector]
[Chart: Perplexity vs. Number of Topics; series: NTM, Other]
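Perplexity, the metric on this chart, is commonly defined as the exponential of the average negative log-likelihood per token, so lower values mean the model is less "surprised" by held-out text. A minimal sketch, assuming the per-token probabilities assigned by the model are already available (the function name and inputs are hypothetical):

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the average negative log-likelihood over all tokens."""
    nll = -sum(math.log(p) for p in token_probs)
    return math.exp(nll / len(token_probs))


# A model that spreads probability uniformly over 4 words has perplexity 4.
uniform_ppl = perplexity([0.25] * 8)
```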
DeepAR: Time Series Forecasting
Dataset           Description                                     MAPE (DeepAR)  MAPE (R)  P90 Loss (DeepAR)  P90 Loss (R)
traffic           Hourly occupancy rate of 963 Bay Area freeways  0.14           0.27      0.13               0.24
electricity       Electricity use of 370 homes over time          0.07           0.11      0.08               0.09
pageviews (10k)   Page view hits of websites                      0.32           0.32      0.44               0.31
pageviews (180k)  Page view hits of websites                      0.32           0.34      0.29               NA
MAPE: mean absolute percentage error.
One hour on p2.xlarge, $1
[Diagram: Input, Network]
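The two forecast metrics in this table can be sketched as follows. MAPE is standard; for the P90 loss, the slide does not give a formula, so the code assumes one common definition, a normalized pinball (quantile) loss at q = 0.9. Function names and sample values are hypothetical.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error (actual values must be non-zero)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def quantile_loss(actual: Sequence[float], forecast_q: Sequence[float],
                  q: float = 0.9) -> float:
    """Normalized pinball loss for the q-quantile forecast (assumed definition)."""
    total = 0.0
    for a, f in zip(actual, forecast_q):
        diff = a - f
        # Under-forecasts are weighted by q, over-forecasts by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return 2.0 * total / sum(abs(a) for a in actual)


point_error = mape([100.0, 200.0], [110.0, 180.0])
p90_error = quantile_loss([10.0, 10.0], [8.0, 12.0], q=0.9)
```

At q = 0.9 an under-forecast costs nine times more than an over-forecast of the same size, which is why a P90 forecast sits above the median.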
DeepAR
https://arxiv.org/abs/1704.04110
Demo: predicting world temperature
with DeepAR
https://medium.com/@julsimon/predicting-world-temperature-with-time-series-and-deepar-on-amazon-sagemaker-e371cf94ddb5
More built-in algorithms
Spectral LDA
Training Time vs. Number of Topics
[Chart: training time in minutes vs. number of topics; series: lda-data-a, lda-data-b, other-data-a, other-data-b]
Boosted Decision Trees
Throughput vs. Number of Machines
XGBoost is one of the most
commonly used classifiers.
[Chart: throughput in MB/sec vs. number of machines (C4.8xLarge)]
https://github.com/dmlc/xgboost
Sequence to Sequence
English-German Translation
[Chart: BLEU score vs. billable time in hours; series: P2.16x, P2.8x, P2.x; annotation: Best known result!]
• Based on Sockeye and Apache MXNet.
• Multi-GPU.
• Can be used for Neural Machine Translation.
• Supports both RNN/CNN as encoder/decoder.
https://arxiv.org/abs/1712.05690
https://github.com/awslabs/sockeye
Sockeye
Image Classification
• ResNet implementation with Apache MXNet.
• More networks to come.
• Transfer learning: begin with a model already trained on ImageNet!
[Chart: Linear Speedup with Horizontal Scaling; speedup vs. number of machines (P2)]
Demo: fine-tuning an image classification
model
https://medium.com/@julsimon/image-classification-on-amazon-sagemaker-9b66193c8b54
Latest addition: BlazingText
https://dl.acm.org/citation.cfm?id=3146354
Resources
https://aws.amazon.com/machine-learning
https://aws.amazon.com/blogs/ai
https://aws.amazon.com/sagemaker (free tier available)
https://github.com/awslabs/amazon-sagemaker-examples
An overview of Amazon SageMaker https://www.youtube.com/watch?v=ym7NEYEx9x4
https://medium.com/@julsimon
Thank you!
Julien Simon
Principal AI/ML Evangelist, Amazon Web Services
@julsimon
