An Introduction
to Amazon SageMaker
Julien Simon
Principal Technical Evangelist, AI and Machine Learning, AWS
@julsimon
October 2018
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Put Machine Learning in the hands
of every developer and data scientist
Our mission
AWS ML Stack
Application Services: API-driven services: Vision, Language & Speech Services, Chatbots
Platform Services: Deploy machine learning models with high-performance machine learning algorithms, broad framework support, and one-click training, tuning, and inference.
Frameworks & Infrastructure: Develop sophisticated models with any framework, create managed, auto-scaling clusters of GPUs for large-scale training, or run prediction on trained models.
The Machine Learning Process
Business Problem → ML problem framing → Data Collection → Data Integration → Data Preparation & Cleaning → Data Visualization & Analysis → Feature Engineering → Model Training & Parameter Tuning → Model Evaluation → Are Business Goals met?
• No: loop back through Data Augmentation and Feature Augmentation
• Yes: Model Deployment → Predictions, with Monitoring & Debugging feeding Re-training
ML is still too complicated for everyday developers
• Collect and prepare training data
• Choose and optimize your ML algorithm
• Set up and manage environments for training
• Train and tune model (trial and error)
• Deploy model in production
• Scale and manage the production environment
Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
ALGORITHMS: K-Means Clustering, Principal Component Analysis, Neural Topic Modelling, Factorization Machines, Linear Learner, XGBoost, Latent Dirichlet Allocation, Image Classification, Seq2Seq, and more!
FRAMEWORKS: Apache MXNet, Chainer, TensorFlow, PyTorch, Caffe2, CNTK, Torch
Still up to you:
• Set up and manage environments for training
• Train and tune model (trial and error)
• Deploy model in production
• Scale and manage the production environment
Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
Train
• One-click training
• Hyperparameter optimization
Still up to you:
• Deploy model in production
• Scale and manage the production environment
Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
Train
• One-click training
• Hyperparameter optimization
Deploy
• One-click deployment
• Fully managed hosting with auto-scaling
Selected Amazon SageMaker customers
Working with Amazon SageMaker
The Amazon SageMaker API
• Python SDK orchestrating all Amazon SageMaker activity
• High-level objects for algorithm selection, training, deploying, automatic
model tuning, etc.
• There’s also a Spark SDK (Python & Scala) which we won’t cover today.
• AWS CLI: ‘aws sagemaker’
• AWS SDK: boto3, etc.
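As a rough sketch of how the Python SDK's high-level objects fit together (the IAM role ARN, S3 paths, and instance types below are placeholders, not real resources, and the calls require AWS credentials to actually run):

```python
# Sketch of the Amazon SageMaker Python SDK workflow: estimator -> fit -> deploy.
# The S3 paths and IAM role below are placeholders, not real resources.

# Hyperparameters for the built-in XGBoost algorithm (binary classification).
hyperparameters = {
    "objective": "binary:logistic",
    "max_depth": 5,
    "eta": 0.2,
    "num_round": 100,
}

def train_and_deploy():
    # Requires AWS credentials and the `sagemaker` package; illustration only.
    import sagemaker
    from sagemaker.amazon.amazon_estimator import get_image_uri

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder IAM role
    container = get_image_uri(session.boto_region_name, "xgboost")  # built-in algorithm image

    estimator = sagemaker.estimator.Estimator(
        container,
        role,
        train_instance_count=1,
        train_instance_type="ml.c5.xlarge",
        output_path="s3://my-bucket/output",   # placeholder bucket
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(**hyperparameters)
    estimator.fit({"train": "s3://my-bucket/train"})  # one call launches training
    predictor = estimator.deploy(initial_instance_count=1,
                                 instance_type="ml.t2.medium")
    return predictor
```

The same `fit`/`deploy` pattern applies whether you use a built-in algorithm, a framework script, or your own container.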
Model Training (on EC2)
• Inputs: training data and ground truth
• Runs training code + helper code, producing model artifacts
Model Hosting (on EC2)
• Loads the model artifacts and runs inference code + helper code
• Exposes an Inference Endpoint: the client application sends an inference request and receives an inference response
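On the client side, the request/response half of this picture is just an HTTPS call. A minimal sketch (the endpoint name is a placeholder; the `invoke` helper requires AWS credentials, while the CSV formatting works anywhere):

```python
# A hosted model is called through its HTTPS endpoint with a serialized payload.
# The endpoint name passed to invoke() is a placeholder.

def to_csv_payload(features):
    """Serialize one feature vector to the text/csv format most built-in algorithms accept."""
    return ",".join(str(x) for x in features)

def invoke(endpoint_name, features):
    # Requires AWS credentials and boto3; illustration only.
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(features),
    )
    return response["Body"].read().decode("utf-8")

# Example payload for a 4-feature sample:
payload = to_csv_payload([5.1, 3.5, 1.4, 0.2])
# → "5.1,3.5,1.4,0.2"
```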
Model options for training code
• Built-in Algorithms: Factorization Machines, Linear Learner, Principal Component Analysis, K-Means Clustering, XGBoost, and more
• Bring Your Own Script
• Bring Your Own Container
Amazon EC2 C5 / C5d instances
C5: next-generation compute-optimized instances with Intel® Xeon® Scalable processors
• C5: 72 vCPUs (“Skylake”), 144 GiB memory, AVX-512, 12 Gbps to EBS
• C4: 36 vCPUs (“Haswell”), 60 GiB memory, 4 Gbps to EBS
• C5 vs. C4: 2x vCPUs, 2x performance, 3x throughput, 2.4x memory
AWS compute-optimized instances support the new Intel® AVX-512 advanced instruction set, enabling you to more efficiently run vector processing workloads with single and double floating-point precision, such as AI/machine learning or video processing.
25% improvement in price/performance over C4
Faster TensorFlow training on C5
https://aws.amazon.com/blogs/machine-learning/faster-training-with-optimized-tensorflow-1-6-on-amazon-ec2-c5-and-p3-instances/
Amazon EC2 P3 Instances
The fastest, most powerful GPU instances in the cloud
• P3.2xlarge, P3.8xlarge, P3.16xlarge
• Up to eight NVIDIA Tesla V100 GPUs in a single instance
• 40,960 CUDA cores, 5,120 Tensor cores
• 128 GB of GPU memory
• 1 PetaFLOP of computational performance – 14x better than P2
• 300 GB/s GPU-to-GPU communication (NVLink) – 9x better than P2
Built-in Algorithms
Scalable algorithms implemented by Amazon (supervised and unsupervised)
• Linear Learner: regression, classification
• Factorization Machines: regression, classification, recommendation
• K-Nearest Neighbors: non-parametric regression and classification
• XGBoost: regression, classification, ranking – https://github.com/dmlc/xgboost
• Image Classification: Deep Learning (ResNet)
• Object Detection: Deep Learning (VGG or ResNet)
• Neural Topic Model: topic modeling
• Latent Dirichlet Allocation: topic modeling
• K-Means: clustering
• Principal Component Analysis: dimensionality reduction
• BlazingText: GPU-based Word2Vec and text classification
• Sequence to Sequence: machine translation, speech to text, and more
• Random Cut Forest: anomaly detection
• DeepAR: time-series forecasting (RNN)
XGBoost
• Very popular tree-based algorithm for regression, classification, and ranking
• Handles missing values automatically
• Works well on sparse data
• Trains fast
• “Easy” to inspect and control
https://github.com/dmlc/xgboost
https://xgboost.readthedocs.io/en/latest/
https://arxiv.org/abs/1603.02754
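The core idea behind tree boosting, which XGBoost implements with many refinements (regularization, second-order gradients, sparsity-aware splits), can be sketched in a few lines of plain Python: each new weak learner fits the residual errors left by the ensemble so far. A toy version with depth-1 "stumps" on 1-D data:

```python
# Toy illustration of gradient boosting for regression: each weak learner
# (a depth-1 "stump") fits the residuals left by the ensemble so far.
# XGBoost adds regularization, second-order gradients, and sparsity handling.

def fit_stump(xs, residuals):
    """Find the 1-D split that best reduces squared error on the residuals."""
    best = None
    for threshold in xs:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def boost(xs, ys, n_rounds=20, eta=0.5):
    """Build an ensemble predictor by repeatedly fitting stumps to residuals."""
    stumps = []
    predict = lambda x: sum(eta * s(x) for s in stumps)
    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 7.8, 8.1, 8.0]   # step-shaped target
model = boost(xs, ys)
```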
Demo:
Customer classification with XGBoost
BlazingText
https://dl.acm.org/citation.cfm?id=3146354
Demo:
Text Classification with BlazingText
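For its supervised (text classification) mode, BlazingText expects one example per line, with the label carried as a `__label__` prefix before the tokenized text, following the fastText convention. A minimal formatter (the lower-casing and whitespace split stand in for a real tokenizer):

```python
# Format labeled examples for BlazingText supervised (text classification)
# training: one example per line, label carried as a "__label__" prefix
# (the fastText convention).

def to_blazingtext_line(label, text):
    """Format one labeled example for BlazingText supervised training."""
    tokens = text.lower().split()   # real pipelines use a proper tokenizer
    return "__label__{} {}".format(label, " ".join(tokens))

line = to_blazingtext_line("positive", "Great product and fast delivery")
# → "__label__positive great product and fast delivery"
```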
Automatic Model Tuning
Automatic Model Tuning
Finding the optimal set of hyperparameters
1. Manual Search (“I know what I’m doing”)
2. Random Search (“Spray and pray”)
3. Grid Search (“X marks the spot”)
• Typically training hundreds of models
• Slow and expensive
4. HPO: use Machine Learning
• Training fewer models
• Gaussian Process Regression and Bayesian Optimization:
https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html
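To make the cost contrast concrete, here is a toy pure-Python comparison of grid and random search on a made-up objective. This is not SageMaker's implementation (which uses Gaussian Process Regression and Bayesian Optimization); it only illustrates why exhaustive grids get expensive as the search space grows:

```python
# Toy comparison of grid search vs. random search on a made-up objective.
# Not SageMaker's algorithm (which is Bayesian); it just shows why
# exhaustively enumerating a grid trains far more models.
import itertools
import random

def objective(eta, max_depth):
    """Pretend validation score: peaks at eta=0.3, max_depth=6."""
    return -((eta - 0.3) ** 2) - 0.01 * (max_depth - 6) ** 2

etas = [i / 10 for i in range(1, 10)]   # 9 candidate learning rates
depths = list(range(2, 11))             # 9 candidate tree depths

# Grid search: every combination, i.e. 81 models trained.
grid_trials = list(itertools.product(etas, depths))
grid_best = max(grid_trials, key=lambda p: objective(*p))

# Random search: 20 models trained, usually landing near the optimum.
random.seed(0)
random_trials = [(random.choice(etas), random.choice(depths)) for _ in range(20)]
random_best = max(random_trials, key=lambda p: objective(*p))
```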
Demo:
Customer classification with XGBoost
Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
Train
• One-click training
• Hyperparameter optimization
Deploy
• One-click deployment
• Fully managed hosting with auto-scaling
Resources
http://aws.amazon.com/free
https://ml.aws
https://aws.amazon.com/sagemaker
https://github.com/aws/sagemaker-python-sdk
https://github.com/aws/sagemaker-spark
https://github.com/awslabs/amazon-sagemaker-examples
https://medium.com/@julsimon
https://gitlab.com/juliensimon/dlnotebooks
Thank you!
Julien Simon
Principal Technical Evangelist, AI and Machine Learning
@julsimon
