Amazon SageMaker is a fully-managed platform that lets developers and data scientists build and scale machine learning solutions. First, we'll show you how SageMaker Ground Truth helps you label large training datasets. Then, using Jupyter notebooks, we'll show you how to build, train and deploy models using built-in algorithms and frameworks (TensorFlow, Apache MXNet, etc). Finally, we'll show you how to use 3rd-party models from the AWS marketplace.
1. ยฉ 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build, train and deploy Machine
Learning models at scale
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
2. Put Machine Learning in the hands
of every developer and data scientist
Our mission
3. M L F R A M E W O R K S &
I N F R A S T R U C T U R E
The Amazon ML Stack: Broadest & Deepest Set of Capabilities
A I S E R V I C E S
R E K O G N I T I O N
I M A G E
P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D
C O M P R E H E N D
M E D I C A L
L E XR E K O G N I T I O N
V I D E O
Vision Speech Chatbots
A M A Z O N S A G E M A K E R
B U I L D T R A I N
F O R E C A S TT E X T R A C T P E R S O N A L I Z E
D E P L O Y
Pre-built algorithms & notebooks
Data labeling (G R O U N D T R U T H )
One-click model training & tuning
Optimization ( N E O )
One-click deployment & hosting
M L S E R V I C E S
F r a m e w o r k s I n t e r f a c e s I n f r a s t r u c t u r e
E C 2 P 3
& P 3 d n
E C 2 C 5 F P G A s G R E E N G R A S S E L A S T I C
I N F E R E N C E
Models without training data (REINFORCEMENT LEARNING)Algorithms & models ( A W S M A R K E T P L A C E )
Language Forecasting Recommendations
NEW NEWNEW
NEW
NEW
NEWNEW
NEW
NEW
4. Amazon SageMaker:
Build, Train, and Deploy ML Models at Scale
Collect and prepare
training data
Choose and optimize
your
ML algorithm
Train and
Tune ML Models
Set up and
manage
environments
for training
Deploy models
in production
Scale and manage
the production
environment
1
2
3
5. Machine learning cycle
Business
Problem
ML problem
framing
Data collection
Data integration
Data preparation
and cleaning
Data visualization
and analysis
Feature engineering
Model training and
parameter tuning
Model evaluation
Monitoring and
debugging
Model deployment
Predictions
Are business
goals
met?
YESNO
Dataaugmentation
Feature
augmentation
Re-training
6. Manage data on AWS
Business
Problem
ML problem
framing
Data collection
Data integration
Data preparation
and cleaning
Data visualization
and analysis
Feature engineering
Model training and
parameter tuning
Model evaluation
Monitoring and
debugging
Model deployment
Predictions
Are business
goals
met?
YESNO
Dataaugmentation
Feature
augmentation
Re-training
7. Build and train models using SageMaker
Business
Problem
ML problem
framing
Data collection
Data integration
Data preparation
and cleaning
Data visualization
and analysis
Feature engineering
Model training and
parameter tuning
Model evaluation
Monitoring and
debugging
Model deployment
Predictions
Are business
goals
met?
YESNO
Dataaugmentation
Feature
augmentation
Re-training
8. Deploy models using SageMaker
Business
Problem
ML problem
framing
Data collection
Data integration
Data preparation
and cleaning
Data visualization
and analysis
Feature engineering
Model training and
parameter tuning
Model evaluation
Monitoring and
debugging
Model deployment
Predictions
Are business
goals
met?
YESNO
Dataaugmentation
Feature
augmentation
Re-training
9. Amazon SageMaker
Fully managed
hosting with auto-
scaling
One-click
deployment
Pre-built
notebooks for
common
problems
Built-in, high-
performance
algorithms
and frameworks
One-click
training
Hyperparameter
optimization
DeployTrainBuild
Model compilation
Elastic inference
Inference pipelines
P3DN, C5N
TensorFlow on 256 GPUs
Resume HPO tuning job
New built-in algorithms
scikit-learn environment
Model marketplace
Search
Git integration
Elastic inference
11. ยฉ 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
12. The Amazon SageMaker API
โข Python SDK orchestrating all Amazon SageMaker activity
โข High-level objects for algorithm selection, training, deploying,
automatic model tuning, etc.
โข Spark SDK (Python & Scala)
โข AWS CLI: โaws sagemakerโ
โข AWS SDK: boto3, etc.
13. Model Training (on EC2)
Model Hosting (on EC2)
Trainingdata
Modelartifacts
Training code Helper code
Helper codeInference code
GroundTruth
Client application
Inference code
Training code
Inference requestInference response
Inference Endpoint
14. Training code
Factorization Machines
Linear Learner
Principal Component Analysis
K-Means Clustering
XGBoost
And more
Built-in Algorithms Bring Your Own ContainerBring Your Own Script
Model options
15. ยฉ 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
16. Built-in algorithms
orange: supervised, yellow: unsupervised
Linear Learner: regression, classification Image Classification: Deep Learning (ResNet)
Factorization Machines: regression, classification,
recommendation
Object Detection (SSD): Deep Learning
(VGG or ResNet)
K-Nearest Neighbors: non-parametric regression and
classification
Neural Topic Model: topic modeling
XGBoost: regression, classification, ranking
https://github.com/dmlc/xgboost
Latent Dirichlet Allocation: topic modeling (mostly)
K-Means: clustering Blazing Text: GPU-based Word2Vec,
and text classification
Principal Component Analysis: dimensionality
reduction
Sequence to Sequence: machine translation, speech
to text and more
Random Cut Forest: anomaly detection DeepAR: time-series forecasting (RNN)
Object2Vec: general-purpose embedding IP Insights: usage patterns for IP addresses
Semantic Segmentation: Deep Learning
18. ยฉ 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Blazing Text
https://dl.acm.org/citation.cfm?id=3146354
19. Demo:
Text Classification with BlazingText
https://github.com/awslabs/amazon-sagemaker-
examples/tree/master/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia
20. XGBoost
โข Open Source project
โข Popular tree-based algorithm
for regression, classification
and ranking
โข Builds a collection of trees.
โข Handles missing values
and sparse data
โข Supports distributed training
โข Can work with data sets larger
than RAM
https://github.com/dmlc/xgboost
https://xgboost.readthedocs.io/en/latest/
https://arxiv.org/abs/1603.02754
22. ยฉ 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
23. Demo:
Keras/TensorFlow CNN on CIFAR-10
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-
sdk/tensorflow_keras_cifar10/tensorflow_keras_CIFAR10.ipynb
24. Demo:
Sentiment analysis with Apache MXNet
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-
sdk/mxnet_sentiment_analysis_with_gluon.ipynb
25. Amazon SageMaker
Fully managed
hosting with auto-
scaling
One-click
deployment
Pre-built
notebooks for
common
problems
Built-in, high-
performance
algorithms
and frameworks
One-click
training
Hyperparameter
optimization
Build Train Deploy
26. ยฉ 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Getting started
http://aws.amazon.com/free
https://ml.aws
https://aws.amazon.com/sagemaker
https://github.com/aws/sagemaker-python-sdk
https://github.com/aws/sagemaker-spark
https://github.com/awslabs/amazon-sagemaker-examples
https://gitlab.com/juliensimon/ent321
https://medium.com/@julsimon
https://gitlab.com/juliensimon/dlnotebooks
27. Thank you!
ยฉ 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
https://medium.com/@julsimon