Introducing Amazon SageMaker

Introduction to Amazon SageMaker
Brent Rabowsky, Solutions Architect
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda
• Why did we build Amazon SageMaker?
• What is Amazon SageMaker?
• How do I get started using Amazon SageMaker?
• Amazon SageMaker customer use cases
• How Amazon SageMaker works with other AWS AI Services
• Q&A

Why did we build Amazon SageMaker?

Data is part of the fabric of applications
• Frontend and UX
• Mobile
• Backend and operations
• Data and analytics

Three types of data-driven development
• Retrospective analysis and reporting
• Here-and-now real-time processing and dashboards
• Inferences to enable smart applications
Amazon Kinesis
Amazon EC2
AWS Lambda
Amazon Redshift
Amazon RDS
Amazon S3
Amazon EMR
Amazon Deep Learning AMI
Amazon Machine Learning
Amazon SageMaker

Machine Learning Process is Hard…
Fetch data
Clean & format data
Prepare & transform data
Train model
Evaluate model
Integrate with prod
Monitor / debug / refresh
Data wrangling
• Set up and manage notebook environments
• Get data to notebooks securely
Experimentation
• Set up and manage clusters
• Scale/distribute ML algorithms
Deployment
• Set up and manage inference clusters
• Manage and auto-scale inference APIs
• Testing, versioning, and monitoring

… and time consuming…
Fetch data
Clean & format data
Prepare & transform data
Train model
Evaluate model
Integrate with prod
Monitor / debug / refresh
6-18 months

… but full of potential
“Machine learning and AI is a horizontal enabling layer. It will empower
and improve every business, every government organization, every
philanthropy — basically there’s no institution in the world that cannot
be improved with machine learning…
We’re in a great position, because of the success of Amazon Web
Services, to be able to put energy into making those techniques easy
and accessible.”
- Jeff Bezos

Amazon SageMaker
A managed service that provides the quickest and easiest way for your data scientists and developers to get ML models from idea to production.

Introducing Amazon SageMaker
• End-to-end machine learning platform
• Zero setup
• Flexible model training
• Pay by the second

Amazon SageMaker’s Four Components:
1) Notebook Instances
2) Algorithms
3) ML Training Service
4) ML Hosting Service

1) Zero setup for data exploration
“Just add data”
• Resizable as you need
• Common tools pre-installed
• Easy access to your data sources
• No servers to manage

2) Algorithms designed for huge datasets
• Streaming datasets, for cheaper training
• Train faster, in a single pass
• Greater reliability on extremely large datasets
• Choice of several ML algorithms
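
To make "streaming datasets" and "train in a single pass" concrete, here is a minimal sketch of one-pass stochastic gradient descent for a linear model. It only illustrates the general idea and is not SageMaker's implementation; the function names, the synthetic data, and the learning rate are all assumptions made for this example.

```python
import random
from typing import Iterable, Iterator, Tuple


def stream_records(n: int, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) pairs one at a time, as if read from a stream.

    Hypothetical synthetic data that follows y = 2x + 1 exactly.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        yield x, 2.0 * x + 1.0


def train_single_pass(records: Iterable[Tuple[float, float]],
                      learning_rate: float = 0.1) -> Tuple[float, float]:
    """Fit y ~ w * x + b with one SGD update per record.

    Each record is used once and then discarded, so memory use does not
    grow with the size of the dataset.
    """
    w, b = 0.0, 0.0
    for x, y in records:
        error = (w * x + b) - y
        w -= learning_rate * error * x
        b -= learning_rate * error
    return w, b


w, b = train_single_pass(stream_records(5000))
print(f"w={w:.3f}, b={b:.3f}")
```

Because the model state is just `w` and `b`, the same loop works whether the stream holds thousands or billions of records.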

More than just general purpose algorithms
• XGBoost, FM, and Linear for classification and regression
• K-means and PCA for clustering and dimensionality reduction
• Image classification with convolutional neural networks
• LDA and NTM for topic modeling, seq2seq for translation

New: Time Series Forecasting with DeepAR
Input / Network                                            MAPE              P90 loss
                                                           DeepAR   R        DeepAR   R
traffic (hourly occupancy rate of 963 Bay Area freeways)   0.14     0.27     0.13     0.24
electricity (electricity use of 370 homes over time)       0.07     0.11     0.08     0.09
pageviews (page view hits of websites), 10k                0.32     0.32     0.44     0.31
pageviews (page view hits of websites), 180k               0.32     0.34     0.29     NA
MAPE = mean absolute percentage error
One hour on p2.xlarge, $1
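
For readers unfamiliar with the two metrics in this comparison, the sketch below shows one common way to compute them. The slide does not state the exact formulas used, so the definitions here (plain MAPE, and a pinball loss at the 0.9 quantile normalized by total absolute actuals) are assumptions, and the sample numbers are made up.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.14 means 14%).

    Assumes every actual value is non-zero.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def quantile_loss(actual: Sequence[float],
                  quantile_forecast: Sequence[float],
                  q: float = 0.9) -> float:
    """Pinball loss at quantile q, normalized by the sum of |actual|.

    Under-forecasts are weighted by q and over-forecasts by (1 - q).
    """
    total = 0.0
    for a, f in zip(actual, quantile_forecast):
        diff = a - f
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return 2.0 * total / sum(abs(a) for a in actual)


# Hypothetical values, for illustration only.
actual = [100.0, 200.0, 400.0]
point_forecast = [110.0, 180.0, 400.0]
p90_forecast = [120.0, 210.0, 380.0]

mape_value = mape(actual, point_forecast)
p90_value = quantile_loss(actual, p90_forecast, q=0.9)
print(f"MAPE={mape_value:.4f}, P90 loss={p90_value:.4f}")
```

Lower is better for both metrics.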

3) Distributed training that works with you
• Amazon-optimized algorithms using the AWS SDK…
• … or Apache Spark SageMaker Estimators
• Bring your own deep learning script…
• … or your custom algorithm Docker image
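
As a rough illustration of the "AWS SDK" and "custom algorithm Docker image" paths, the sketch below assembles the kind of request that the SageMaker `CreateTrainingJob` API accepts. The helper name, image URI, role ARN, bucket paths, and instance settings are hypothetical placeholders; the snippet only builds a dictionary and does not call AWS.

```python
from typing import Any, Dict


def build_training_job_request(job_name: str,
                               image_uri: str,
                               role_arn: str,
                               train_s3_uri: str,
                               output_s3_uri: str,
                               instance_type: str = "ml.p2.xlarge",
                               instance_count: int = 2) -> Dict[str, Any]:
    """Assemble a CreateTrainingJob request (hypothetical helper)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,    # built-in or custom Docker image
            "TrainingInputMode": "Pipe",   # stream from S3; "File" copies first
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                # Give each training instance a different shard of the data.
                "S3DataDistributionType": "ShardedByS3Key",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# All identifiers below are placeholders, not real resources.
request = build_training_job_request(
    job_name="example-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algorithm:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
print(request["ResourceConfig"])
```

With credentials configured, a request like this would typically be submitted with `boto3.client("sagemaker").create_training_job(**request)`. Not every algorithm supports `Pipe` input mode, so check the container you use.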

4) Quickly deploy in production
• One step deployment
• Low latency, high throughput, and high reliability
• A/B testing
• Use your own model
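
One way to picture A/B testing on a hosted endpoint is an endpoint configuration with two weighted production variants. The sketch below builds such a configuration in the shape used by the `CreateEndpointConfig` API and computes the resulting traffic split. The helper names, variant names, model names, and weights are assumptions for illustration, and nothing here calls AWS.

```python
from typing import Any, Dict, List


def build_endpoint_config(config_name: str,
                          variants: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble a CreateEndpointConfig request (hypothetical helper)."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": v["name"],
                "ModelName": v["model"],
                "InitialInstanceCount": v.get("instances", 1),
                "InstanceType": v.get("instance_type", "ml.m4.xlarge"),
                "InitialVariantWeight": v["weight"],
            }
            for v in variants
        ],
    }


def traffic_shares(endpoint_config: Dict[str, Any]) -> Dict[str, float]:
    """Fraction of requests per variant: its weight over the sum of weights."""
    variants = endpoint_config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Placeholder names: send most traffic to the current model, some to a candidate.
config = build_endpoint_config("example-ab-test", [
    {"name": "current", "model": "model-v1", "weight": 9.0},
    {"name": "candidate", "model": "model-v2", "weight": 1.0},
])
shares = traffic_shares(config)
print(shares)
```

Adjusting the weights shifts traffic between the two models without changing the client application.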

Modular architecture so you can use what you need
[Diagram of Amazon SageMaker components: past data, training algorithm, model artifacts, inference code, model, client application, data, inference, ground truth]

Pay as you go and inexpensive
• ML compute by the second, starting at $0.0464/hr
• ML storage by the second, at $0.14 per GB-month
• Data processed in notebooks and hosting, at $0.016 per GB
• Free trial to get started quickly
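
The per-second billing arithmetic can be sketched as follows, using the rates quoted on this slide. These are the figures from the presentation and may not match current pricing; the 30-day month used to prorate storage and the example workload are assumptions.

```python
COMPUTE_USD_PER_HOUR = 0.0464      # "starting at" rate quoted on the slide
STORAGE_USD_PER_GB_MONTH = 0.14    # rate quoted on the slide
DATA_USD_PER_GB = 0.016            # rate quoted on the slide
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day billing month


def compute_cost(seconds: float, usd_per_hour: float = COMPUTE_USD_PER_HOUR) -> float:
    """Compute charge when billed by the second."""
    return seconds * usd_per_hour / 3600.0


def storage_cost(gigabytes: float, seconds: float) -> float:
    """Storage charge prorated by the second."""
    return gigabytes * STORAGE_USD_PER_GB_MONTH * seconds / SECONDS_PER_MONTH


def data_cost(gigabytes: float) -> float:
    """Charge for data processed in notebooks and hosting."""
    return gigabytes * DATA_USD_PER_GB


# Hypothetical workload: 90 minutes of compute, 20 GB stored for 90 minutes,
# and 5 GB of data processed.
total = compute_cost(90 * 60) + storage_cost(20, 90 * 60) + data_cost(5)
print(f"Estimated cost: ${total:.4f}")
```

The point of the example is that a short experiment is charged for the seconds it runs rather than for a full hour or month.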

How do you get started using Amazon SageMaker?

Modify to access your data sources

Amazon SageMaker: Launch Customer

Some AI/ML use cases at Intuit:
• Customer Care and Expert Advice
• Fraud Detection and Prevention
• Smart Products

Designed to keep fraudsters out of systems and data
Strive to stay several moves ahead of them by leveraging machine learning-generated insights from data
Near real-time fraud detection in TurboTax:
• Account take-over detection
• Identity theft detection

Near real-time fraud detection in AWS using Amazon SageMaker
[Architecture diagram: Calculate Features, Reader, Cleanser, Processor, Data Lookup, Training, Feature Store, Model Training (Amazon SageMaker), Model, Model Hosting (Amazon SageMaker), Client Service]
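
The last hop of an architecture like this, where a client service asks a hosted model for a score, might look like the sketch below. It follows the shape of the SageMaker Runtime `invoke_endpoint` call (`EndpointName`, `ContentType`, `Body`), but the endpoint name, the features, the plain-text score response, and the 0.5 threshold are all assumptions. A fake client stands in for AWS so the example runs offline.

```python
import io
from typing import Any, Dict, Sequence


def to_csv_row(features: Sequence[float]) -> bytes:
    """Serialize one feature vector as a single CSV line."""
    return ",".join(str(v) for v in features).encode("utf-8")


def score_event(runtime_client: Any, endpoint_name: str,
                features: Sequence[float]) -> float:
    """Send features to a hosted model and parse a numeric score.

    Assumes the model container returns the score as plain text.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_row(features),
    )
    return float(response["Body"].read().decode("utf-8").strip())


class FakeRuntimeClient:
    """Offline stand-in for a SageMaker Runtime client (hypothetical)."""

    def invoke_endpoint(self, EndpointName: str, ContentType: str,
                        Body: bytes) -> Dict[str, Any]:
        values = [float(v) for v in Body.decode("utf-8").split(",")]
        score = min(1.0, sum(values) / 10.0)  # toy scoring rule, not a real model
        return {"Body": io.BytesIO(str(score).encode("utf-8"))}


score = score_event(FakeRuntimeClient(), "example-fraud-endpoint", [1.0, 2.0, 3.0])
is_suspicious = score >= 0.5  # hypothetical decision threshold
print(score, is_suspicious)
```

Against a real endpoint, `FakeRuntimeClient()` would be replaced by `boto3.client("sagemaker-runtime")`; the request and response formats depend on the deployed model container.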

Key benefits of Amazon SageMaker @ Intuit
From:
• Ad hoc setup and management of notebook environments
• Limited choices for model deployment
• Competing compute resources across teams
To:
• Easy data exploration in Amazon SageMaker notebooks
• Building around virtualization for flexibility
• Auto-scalable model hosting environment

“As the world’s leading provider of high-resolution Earth
imagery, data and analysis, DigitalGlobe works with enormous
amounts of data every day. DigitalGlobe is making it easier for
people to find, access, and run compute against our entire
100PB image library, which is stored in AWS’s cloud, to apply
deep learning to satellite imagery. We plan to use Amazon
SageMaker to train models against petabytes of Earth
observation imagery datasets using hosted Jupyter
notebooks, so DigitalGlobe's Geospatial Big Data Platform
(GBDX) users can just push a button, create a model, and
deploy it all within one scalable distributed environment at
scale.”
- Dr. Walter Scott, CTO of Maxar Technologies and founder of
DigitalGlobe

“We’re focused on making it faster and easier than ever to hire
and get hired, training our machine learning algorithms against
hundreds of millions of historical transactional activities in order
to deliver highly relevant job matches as quickly as possible.
Amazon SageMaker provided us with an answer to problems we
had with ML workflow management, allowing us to train,
evaluate and deploy models in a flexible way. In addition,
Amazon SageMaker's modularity provides the ability to build and
create models independently, which is a compelling feature for
ZipRecruiter.”
- Avi Golan, VP of Engineering, ZipRecruiter

How Amazon SageMaker works with other AWS AI Services

The Amazon machine learning stack
APPLICATION SERVICES: Rekognition, Transcribe, Translate, Polly, Comprehend, Lex
PLATFORM SERVICES: Amazon SageMaker, AWS DeepLens, Amazon Mechanical Turk, Amazon ML
FRAMEWORKS & INTERFACES: AWS Deep Learning AMIs with Caffe2, CNTK, Apache MXNet, PyTorch, TensorFlow, Torch, Keras, Gluon

Amazon EC2 P3 Instances
The fastest, most powerful GPU instances in the cloud
• Up to eight NVIDIA Tesla V100 GPUs
• 1 PetaFLOP of computational performance, 14x better than P2
• 300 GB/s GPU-to-GPU communication (NVLink), 9x better than P2
• 16 GB GPU memory with 900 GB/s peak GPU memory bandwidth

Q&A
(at the Ask an Architect bar)
