8. Supervised Learning
• Training data provides “examples” and “outcomes”
• The machine learns to predict the outcome of new data based on the past examples
9. Supervised Learning
• Training data has one feature that is the “outcome”, sometimes referred to as the “label” or “objective”
• Goal is to build a model which can predict the outcome
• If the outcome is categorical, the model is a “classification”; if numerical, a “regression”
• Because the data has a known value, the model can be evaluated:
• Split the data into a training and test set
• Model the training set / predict the test set
• Compare the predictions to the known values
• Algorithms: single models or ensembles (e.g. Logistic Regression, Time Series)
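The workflow above (split, fit on the training set, predict the test set, compare to known values) can be sketched briefly; this assumes scikit-learn and its bundled iris dataset, which are not part of the original slides:

```python
# A minimal sketch of the supervised-learning workflow, assuming scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Features ("examples") and outcome ("label")
X, y = load_iris(return_X_y=True)

# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model the training set
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the test set and compare the predictions to the known values
accuracy = accuracy_score(y_test, model.predict(X_test))
```

Because the test labels are known, the comparison yields a single evaluation number (here, accuracy).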
29. Sequence to Sequence problems
[Chart: BLEU score vs. billable time in hours on P2.16x, P2.8x, and P2.x instances. Best known result!]
Based on Sockeye and Apache-incubated MxNet; multi-GPU; can be used for Neural Machine Translation.
Supports both RNN and CNN as encoder/decoder.
31. Sequence to Sequence problems
(1) Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification).
(2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
(3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment).
(4) Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French).
(5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video).
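Mode (4), sequence in / sequence out, can be sketched as a toy encoder-decoder; the dimensions, random weights, and fixed output length here are illustrative assumptions, not the Sockeye implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 8, 16, 8  # assumed toy dimensions

# Random weights stand in for a trained model.
W_enc = rng.normal(size=(d_in + d_hid, d_hid)) * 0.1
W_dec = rng.normal(size=(d_hid, d_hid)) * 0.1
W_out = rng.normal(size=(d_hid, d_out)) * 0.1

def encode(source_seq):
    """RNN encoder: fold a variable-length input into one fixed-size vector."""
    h = np.zeros(d_hid)
    for x in source_seq:
        h = np.tanh(np.concatenate([x, h]) @ W_enc)
    return h

def decode(h, out_len):
    """RNN decoder: unroll the fixed-size vector into an output sequence."""
    outputs = []
    for _ in range(out_len):
        h = np.tanh(h @ W_dec)
        outputs.append(h @ W_out)
    return np.stack(outputs)

source = rng.normal(size=(5, d_in))         # 5-step input sequence
target = decode(encode(source), out_len=7)  # 7-step output sequence
```

The key point is that input and output lengths are decoupled: the encoder compresses any-length input into one state, and the decoder emits as many steps as needed.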
32. Image Classification
Implementation in MxNet of ResNet.
Other networks such as DenseNet and Inception will be added in the future.
Transfer learning: begin with a model already trained on ImageNet!
[Chart: Speedup with Horizontal Scaling: speedup (up to ~3.5x) vs. number of machines (P2, 1-5)]
36. DeepAR
1. The DeepAR model effectively learns a global model from related time series
2. It is able to learn complex patterns, such as seasonality and uncertainty growth over time, from the data
3. Interestingly, the method works with little or no hyperparameter tuning on a wide variety of datasets, and is applicable to medium-size datasets containing only a few hundred time series
4. It scales up to datasets comprising 100,000+ time series
37. Time series global model for related time series
Related time series, e.g. demand for various products sold by you
38. Time Series Forecasting
Dataset          | Description                                    | MAPE (DeepAR) | MAPE (R) | P90 Loss (DeepAR) | P90 Loss (R)
traffic          | Hourly occupancy rate of 963 bay area freeways | 0.14          | 0.27     | 0.13              | 0.24
electricity      | Electricity use of 370 homes over time         | 0.07          | 0.11     | 0.08              | 0.09
pageviews (10k)  | Page view hits of websites                     | 0.32          | 0.32     | 0.44              | 0.31
pageviews (180k) | Page view hits of websites                     | 0.32          | 0.34     | 0.29              | NA
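The P90 loss in the table is a quantile (pinball) loss at the 0.9 quantile. A minimal sketch of how such a metric is computed follows; the exact normalization DeepAR reports may differ:

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau=0.9):
    """Mean pinball loss at quantile tau.

    Under-prediction is penalized by tau and over-prediction by (1 - tau),
    so a forecaster minimizing it targets the tau-th quantile rather than
    the mean.
    """
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

p90 = quantile_loss(np.array([10.0, 20.0]), np.array([8.0, 25.0]))
```

The asymmetric penalty is what makes the metric suitable for probabilistic forecasts, where DeepAR predicts full distributions instead of point estimates.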
One hour on p2.xlarge, $1
54. Spectral LDA
[Chart: Training Time vs. Number of Topics: training time in minutes (0-250) vs. number of topics (0-100) for datasets lda-data-a, lda-data-b, other-data-a, other-data-b]
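Spectral LDA fits topic models by method-of-moments tensor decomposition rather than the usual variational inference. As an illustrative stand-in for what a topic model does, here is scikit-learn's variational LDA (an assumed substitute, not the spectral algorithm itself):

```python
# Topic modeling with scikit-learn's variational LDA, as a stand-in
# for Spectral LDA (the fitting algorithm differs; the output is the same
# kind of object: per-document topic mixtures and per-topic word weights).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "machine learning model training data",
    "training data for the learning model",
    "traffic on bay area freeways",
    "hourly freeway traffic occupancy",
]

# Bag-of-words counts, then a 2-topic LDA fit.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic distribution per document
```

Each row of `doc_topics` is a probability distribution over the two topics; `lda.components_` holds the per-topic word weights.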
66. Amazon ML Lab
Lots of companies doing Machine Learning are unable to unlock business potential because they lack ML expertise.
Amazon ML Lab provides the missing ML expertise: brainstorming, modeling, and teaching.
Leverage Amazon experts with decades of ML experience with technologies like Amazon Echo, Amazon Alexa, Prime Air, and Amazon Go.