This session introduces the features of Amazon SageMaker, including a one-click training environment, highly optimized machine learning algorithms with built-in model tuning, and deployment without engineering effort. With zero setup required, Amazon SageMaker significantly decreases your training time and the overall cost of building production machine learning systems. You'll also hear how and why Intuit is using Amazon SageMaker on AWS for real-time fraud detection.
2. Agenda
• Why did we build Amazon SageMaker?
• What is Amazon SageMaker?
• How do I get started using Amazon SageMaker?
• Amazon SageMaker customer use cases
• How Amazon SageMaker works with other AWS AI services
• Q&A
4. Data is part of the fabric of applications
• Frontend and UX
• Mobile
• Backend and operations
• Data and analytics
5. Three types of data-driven development
• Retrospective analysis and reporting
• Here-and-now real-time processing and dashboards
• Inferences to enable smart applications
Amazon Kinesis
Amazon EC2
AWS Lambda
Amazon Redshift
Amazon RDS
Amazon S3
Amazon EMR
Amazon Deep Learning AMI
Amazon Machine Learning
Amazon SageMaker
6. Machine Learning Process is Hard…
Steps: Fetch data; Clean & format data; Prepare & transform data; Train model; Evaluate model; Integrate with prod; Monitor / debug / refresh
Data wrangling
• Set up and manage notebook environments
• Get data to notebooks securely
Experimentation
• Set up and manage clusters
• Scale/distribute ML algorithms
Deployment
• Set up and manage inference clusters
• Manage and auto scale inference APIs
• Testing, versioning, and monitoring
9. … and time consuming…
Steps: Fetch data; Clean & format data; Prepare & transform data; Train model; Evaluate model; Integrate with prod; Monitor / debug / refresh
6-18 months
10. … but full of potential
“Machine learning and AI is a horizontal enabling layer. It will empower
and improve every business, every government organization, every
philanthropy — basically there’s no institution in the world that cannot
be improved with machine learning…
We’re in a great position, because of the success of Amazon Web
Services, to be able to put energy into making those techniques easy
and accessible.”
- Jeff Bezos
12. Amazon SageMaker
A managed service that provides the quickest and easiest way for your data scientists and developers to get ML models from idea to production.
15. 1) Zero setup for data exploration
“Just add data”
• Resizable as you need
• Common tools pre-installed
• Easy access to your data sources
• No servers to manage
16. 2) Algorithms designed for huge datasets
• Streaming datasets, for cheaper training
• Train faster, in a single pass
• Greater reliability on extremely large datasets
• Choice of several ML algorithms
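To make the "streaming, single pass" idea concrete, here is a minimal sketch of one-pass training with stochastic gradient descent on a linear model. It is not how the SageMaker built-in algorithms are implemented; the data source `stream_examples` and the learning rate are hypothetical, chosen only for illustration. The point is that each record is read once and then discarded, so memory use does not grow with dataset size.

```python
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[float], float]  # (features, target)


def stream_examples(n: int) -> Iterator[Example]:
    """Hypothetical data source: yields records of y = 2*x + 1 one at a time."""
    for i in range(n):
        x = (i % 100) / 100.0
        yield [x], 2.0 * x + 1.0


def train_single_pass(
    stream: Iterable[Example], n_features: int, lr: float = 0.1
) -> Tuple[List[float], float]:
    """Fit a linear model with SGD, touching each record exactly once."""
    weights = [0.0] * n_features
    bias = 0.0
    for features, target in stream:
        prediction = bias + sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        # Gradient step for squared error on this single record.
        for j, x in enumerate(features):
            weights[j] -= lr * error * x
        bias -= lr * error
    return weights, bias


weights, bias = train_single_pass(stream_examples(20000), n_features=1)
print(weights, bias)
```

Because the model state is only a weight vector and a bias, the same loop works whether the stream holds thousands or billions of records.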
17. More than just general-purpose algorithms
• XGBoost, FM, and Linear for classification and regression
• K-means and PCA for clustering and dimensionality reduction
• Image classification with convolutional neural networks
• LDA and NTM for topic modeling, seq2seq for translation
18. New: Time Series Forecasting with DeepAR
Input
Network
MAPE = mean absolute percentage error
Dataset          Description                                     MAPE: DeepAR  MAPE: R  P90 loss: DeepAR  P90 loss: R
traffic          Hourly occupancy rate of 963 Bay Area freeways  0.14          0.27     0.13              0.24
electricity      Electricity use of 370 homes over time          0.07          0.11     0.08              0.09
pageviews, 10k   Page view hits of websites                      0.32          0.32     0.44              0.31
pageviews, 180k  Page view hits of websites                      0.32          0.34     0.29              NA
One hour on p2.xlarge, $1
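The slide does not define the two metrics in the DeepAR comparison above, so the following is a sketch under assumptions: MAPE is taken as the mean of |actual - forecast| / |actual|, and P90 loss as the pinball (quantile) loss at q = 0.9, normalized by the sum of absolute actuals, which is one common convention for probabilistic forecasts. The function names and sample numbers are made up for illustration.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error; assumes every actual value is nonzero."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def quantile_loss(
    actual: Sequence[float], quantile_forecast: Sequence[float], q: float = 0.9
) -> float:
    """Normalized pinball loss for a forecast of the q-th quantile."""
    total = 0.0
    for a, f in zip(actual, quantile_forecast):
        diff = a - f
        # Under-forecasts are weighted by q, over-forecasts by (1 - q).
        total += max(q * diff, (q - 1.0) * diff)
    return 2.0 * total / sum(abs(a) for a in actual)


actual = [100.0, 200.0]
point_forecast = [110.0, 180.0]
print(mape(actual, point_forecast))
print(quantile_loss(actual, point_forecast, q=0.9))
```

For both metrics, lower is better, which is how the table's columns should be read.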
19. 3) Distributed training that works with you
• Amazon-optimized algorithms using the AWS SDK…
• … or Apache Spark SageMaker Estimators
• Bring your own deep learning script…
• … or your custom algorithm Docker image
20. 4) Quickly deploy in production
• One-step deployment
• Low latency, high throughput, and high reliability
• A/B testing
• Use your own model
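A/B testing of deployed models comes down to splitting request traffic between model variants by weight. The sketch below illustrates that idea only; it does not use or reproduce the SageMaker hosting API, and the variant names and the 90/10 split are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def pick_variant(weights: Dict[str, float], rng: random.Random) -> str:
    """Choose a model variant with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical split: 90% of requests to the current model, 10% to a challenger.
variant_weights = {"model-a": 0.9, "model-b": 0.1}
rng = random.Random(0)  # seeded so the simulation is repeatable
counts = Counter(pick_variant(variant_weights, rng) for _ in range(10000))
print(counts)
```

Shifting traffic toward the challenger is then a matter of changing the weights, with no change to the calling application.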
21. Modular architecture so you can use what you need
[Diagram labels: Past data, Training algorithm, Model artifacts, Inference code, Client application, Model, Data, Inference, Ground truth, Amazon SageMaker]
22. Pay as you go and inexpensive
• ML compute by the second, starting at $0.0464/hr
• ML storage by the second, at $0.14 per GB-month
• Data processed in notebooks and hosting, at $0.016 per GB
• Free trial to get started quickly
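As a worked example of per-second billing, the sketch below estimates a small bill from the three rates quoted on this slide. The usage figures (90 minutes of compute, 5 GB stored for a month, 10 GB processed) are hypothetical, and the compute rate is the slide's "starting at" price, so larger instance types would cost more.

```python
# Rates as quoted on the slide.
COMPUTE_USD_PER_HOUR = 0.0464      # lowest-priced ML compute, billed by the second
STORAGE_USD_PER_GB_MONTH = 0.14    # ML storage
DATA_USD_PER_GB = 0.016            # data processed in notebooks and hosting


def compute_cost(seconds: float, rate_per_hour: float = COMPUTE_USD_PER_HOUR) -> float:
    """Per-second billing: pay only for the seconds actually used."""
    return seconds * rate_per_hour / 3600.0


def estimate_bill(seconds: float, storage_gb_months: float, data_gb: float) -> float:
    """Sum of compute, storage, and data-processing charges in USD."""
    return (
        compute_cost(seconds)
        + storage_gb_months * STORAGE_USD_PER_GB_MONTH
        + data_gb * DATA_USD_PER_GB
    )


# Hypothetical usage: 90 minutes of compute, 5 GB-months of storage, 10 GB processed.
print(round(estimate_bill(90 * 60, storage_gb_months=5, data_gb=10), 4))
```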
23. How do you get started using Amazon SageMaker?
31. Some AI/ML use cases at Intuit:
• Customer Care and Expert Advice
• Fraud Detection and Prevention
• Smart Products
32. Designed to keep fraudsters out of systems and data
Strive to stay several moves ahead of them by leveraging machine learning-generated insights from data
Near real-time fraud detection in TurboTax:
• Account takeover detection
• Identity theft detection
33. Near real-time fraud detection in AWS using Amazon SageMaker
[Diagram labels: Client Service, Calculate Features, Reader, Cleanser, Processor, Data Lookup, Training, Feature Store, Model Training (Amazon SageMaker), Model, Model Hosting (Amazon SageMaker)]
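The client-service-to-model-hosting step in this architecture can be sketched as a single endpoint call. This is not Intuit's code. It assumes a client with the same `invoke_endpoint` method as boto3's `sagemaker-runtime` client, a hypothetical endpoint named `fraud-detector`, CSV input, and a model that returns one numeric score as text. A stub client stands in for AWS so the example runs offline.

```python
import io
from typing import Any, Sequence


def score_transaction(
    runtime_client: Any, endpoint_name: str, features: Sequence[float]
) -> float:
    """Send one feature vector to a hosted model and return its score."""
    payload = ",".join(str(v) for v in features).encode("utf-8")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    # Assumption: the model replies with a single number as plain text.
    return float(response["Body"].read().decode("utf-8").strip())


class FakeRuntimeClient:
    """Offline stand-in; in real use pass boto3.client("sagemaker-runtime")."""

    def invoke_endpoint(self, EndpointName: str, ContentType: str, Body: bytes) -> dict:
        return {"Body": io.BytesIO(b"0.87")}


score = score_transaction(FakeRuntimeClient(), "fraud-detector", [120.5, 3.0, 1.0])
print(score)
```

Passing the client in as an argument keeps the scoring function testable without network access or AWS credentials.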
34. Key benefits of Amazon SageMaker @ Intuit
From
• Ad hoc setup and management of notebook environments
• Limited choices for model deployment
• Competing compute resources across teams
To
• Easy data exploration in Amazon SageMaker notebooks
• Building around virtualization for flexibility
• Auto-scalable model hosting environment
35. Amazon SageMaker: Launch Customer
“As the world’s leading provider of high-resolution Earth
imagery, data and analysis, DigitalGlobe works with enormous
amounts of data every day. DigitalGlobe is making it easier for
people to find, access, and run compute against our entire
100PB image library, which is stored in AWS’s cloud, to apply
deep learning to satellite imagery. We plan to use Amazon
SageMaker to train models against petabytes of Earth
observation imagery datasets using hosted Jupyter
notebooks, so DigitalGlobe's Geospatial Big Data Platform
(GBDX) users can just push a button, create a model, and
deploy it all within one scalable distributed environment at
scale.”
- Dr. Walter Scott, CTO of Maxar Technologies and founder of
DigitalGlobe
36. Amazon SageMaker: Launch Customer
“We’re focused on making it faster and easier than ever to hire
and get hired, training our machine learning algorithms against
hundreds of millions of historical transactional activities in order
to deliver highly relevant job matches as quickly as possible.
Amazon SageMaker provided us with an answer to problems we
had with ML workflow management, allowing us to train,
evaluate and deploy models in a flexible way. In addition,
Amazon SageMaker's modularity provides the ability to build and
create models independently, which is a compelling feature for
ZipRecruiter.”
- Avi Golan, VP of Engineering, ZipRecruiter
38. The Amazon machine learning stack
PLATFORM SERVICES
APPLICATION SERVICES
FRAMEWORKS & INTERFACES
Caffe2, CNTK, Apache MXNet, PyTorch, TensorFlow, Torch, Keras, Gluon
AWS Deep Learning AMIs
Amazon SageMaker AWS DeepLens
Rekognition Transcribe Translate Polly Comprehend Lex
Amazon Mechanical Turk Amazon ML
39. Amazon EC2 P3 Instances
The fastest, most powerful GPU instances in the cloud
• Up to eight NVIDIA Tesla V100 GPUs
• 1 PetaFLOP of computational performance – 14x better than P2
• 300 GB/s GPU-to-GPU communication (NVLink) – 9x better than P2
• 16 GB GPU memory with 900 GB/s peak GPU memory bandwidth