8. Supervised Learning
• Training data provides “examples” and “outcomes”
• The machine learns to predict the outcome of new data based on the past examples
9. Supervised Learning
• Training data has one feature that is the “outcome”, sometimes referred to as the “label” or “objective”
• Goal is to build a model which can predict the outcome
• If the outcome is categorical, the model is a “classification”; if numerical, a “regression”
• Because the data has a known value, the model can be evaluated:
• Split the data into a training and test set
• Model the training set / predict the test set
• Compare the predictions to the known values
• Algorithms: single models or ensembles (e.g. Logistic Regression, Time Series)
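The workflow above (split, fit on the training set, predict the test set, compare to known values) can be sketched briefly; this assumes scikit-learn and its bundled iris dataset, which are not part of the original slides:

```python
# A minimal sketch of the supervised-learning workflow, assuming scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Features ("examples") and outcome ("label")
X, y = load_iris(return_X_y=True)

# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model the training set
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the test set and compare the predictions to the known values
accuracy = accuracy_score(y_test, model.predict(X_test))
```

Because the test labels are known, the comparison yields a single evaluation number (here, accuracy).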
29. Sequence to Sequence problems
[Chart: BLEU score vs. billable time in hours on P2.16x, P2.8x, and P2.x instances. Best known result!]
Based on Sockeye and Apache-incubated MxNet; multi-GPU; can be used for Neural Machine Translation.
Supports both RNN and CNN as encoder/decoder.
31. Sequence to Sequence problems
(1) Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification).
(2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
(3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment).
(4) Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French).
(5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video).
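Mode (4), sequence in / sequence out, can be sketched as a toy encoder-decoder; the dimensions, random weights, and fixed output length here are illustrative assumptions, not the Sockeye implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 8, 16, 8  # assumed toy dimensions

# Random weights stand in for a trained model.
W_enc = rng.normal(size=(d_in + d_hid, d_hid)) * 0.1
W_dec = rng.normal(size=(d_hid, d_hid)) * 0.1
W_out = rng.normal(size=(d_hid, d_out)) * 0.1

def encode(source_seq):
    """RNN encoder: fold a variable-length input into one fixed-size vector."""
    h = np.zeros(d_hid)
    for x in source_seq:
        h = np.tanh(np.concatenate([x, h]) @ W_enc)
    return h

def decode(h, out_len):
    """RNN decoder: unroll the fixed-size vector into an output sequence."""
    outputs = []
    for _ in range(out_len):
        h = np.tanh(h @ W_dec)
        outputs.append(h @ W_out)
    return np.stack(outputs)

source = rng.normal(size=(5, d_in))         # 5-step input sequence
target = decode(encode(source), out_len=7)  # 7-step output sequence
```

The key point is that input and output lengths are decoupled: the encoder compresses any-length input into one state, and the decoder emits as many steps as needed.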
32. Image Classification
Implementation in MxNet of ResNet.
Other networks such as DenseNet and Inception will be added in the future.
Transfer learning: begin with a model already trained on ImageNet!
[Chart: Speedup with Horizontal Scaling: speedup (up to ~3.5x) vs. number of machines (P2, 1-5)]
36. DeepAR
1. The DeepAR model effectively learns a global model from related time series
2. It is able to learn complex patterns, such as seasonality and uncertainty growth over time, from the data
3. Interestingly, the method works with little or no hyperparameter tuning on a wide variety of datasets, and is applicable to medium-size datasets containing only a few hundred time series
4. It scales up to datasets comprising 100,000+ time series
37. Time series global model for related time series
Related time series, e.g. demand for various products sold by you
38. Time Series Forecasting
Dataset          | Description                                    | MAPE (DeepAR) | MAPE (R) | P90 Loss (DeepAR) | P90 Loss (R)
traffic          | Hourly occupancy rate of 963 bay area freeways | 0.14          | 0.27     | 0.13              | 0.24
electricity      | Electricity use of 370 homes over time         | 0.07          | 0.11     | 0.08              | 0.09
pageviews (10k)  | Page view hits of websites                     | 0.32          | 0.32     | 0.44              | 0.31
pageviews (180k) | Page view hits of websites                     | 0.32          | 0.34     | 0.29              | NA
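The P90 loss in the table is a quantile (pinball) loss at the 0.9 quantile. A minimal sketch of how such a metric is computed follows; the exact normalization DeepAR reports may differ:

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau=0.9):
    """Mean pinball loss at quantile tau.

    Under-prediction is penalized by tau and over-prediction by (1 - tau),
    so a forecaster minimizing it targets the tau-th quantile rather than
    the mean.
    """
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

p90 = quantile_loss(np.array([10.0, 20.0]), np.array([8.0, 25.0]))
```

The asymmetric penalty is what makes the metric suitable for probabilistic forecasts, where DeepAR predicts full distributions instead of point estimates.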
One hour on p2.xlarge, $1
54. Spectral LDA
[Chart: Training Time vs. Number of Topics: training time in minutes (0-250) vs. number of topics (0-100) for datasets lda-data-a, lda-data-b, other-data-a, other-data-b]
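Spectral LDA fits topic models by method-of-moments tensor decomposition rather than the usual variational inference. As an illustrative stand-in for what a topic model does, here is scikit-learn's variational LDA (an assumed substitute, not the spectral algorithm itself):

```python
# Topic modeling with scikit-learn's variational LDA, as a stand-in
# for Spectral LDA (the fitting algorithm differs; the output is the same
# kind of object: per-document topic mixtures and per-topic word weights).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "machine learning model training data",
    "training data for the learning model",
    "traffic on bay area freeways",
    "hourly freeway traffic occupancy",
]

# Bag-of-words counts, then a 2-topic LDA fit.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic distribution per document
```

Each row of `doc_topics` is a probability distribution over the two topics; `lda.components_` holds the per-topic word weights.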
66. Amazon ML Lab
Lots of companies doing Machine Learning are unable to unlock business potential because they lack ML expertise.
Amazon ML Lab provides the missing ML expertise: brainstorming, modeling, and teaching.
Leverage Amazon experts with decades of ML experience with technologies like Amazon Echo, Amazon Alexa, Prime Air, and Amazon Go.