© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Developing Large Scale Machine Learning Algorithms on Amazon SageMaker
Amir Sadoughi
Senior Software Engineer
Amazon AI Labs
The Amazon ML stack: Broadest & deepest set of capabilities
AI Services
• Vision: Rekognition Image, Rekognition Video, Textract (new)
• Speech: Polly, Transcribe
• Language: Translate, Comprehend & Comprehend Medical (new)
• Chatbots: Lex
• Forecasting: Forecast (new)
• Recommendations: Personalize (new)
ML Services
• Amazon SageMaker: Ground Truth (new), Notebooks, AWS Marketplace (new), Algorithms, Reinforcement Learning (new), Training, Optimization (Neo) (new), Deployment, Hosting
ML Frameworks & Infrastructure
• Frameworks, Interfaces
• Infrastructure: EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference (new)
Machine Learning
Amazon SageMaker
Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly.
Amazon SageMaker: Algorithms
• Built-in algorithms
  • NLP
  • Computer Vision
  • Supervised
  • Unsupervised
• AWS Marketplace for Machine Learning
• Bring Your Own Algorithm
Algorithm Development Lifecycle
• Interface design
• System design
• Testing
• Communications
Interface Design
System design
• Storage
• Compute
• Network
System design: Storage
• Tiers: Amazon S3, Amazon EBS, GPU mem., CPU mem., CPU cache
• Access patterns
• Capacities: Throughput, Latency
System design: Storage
File mode
• easy to implement
• faster for many epochs
• initial download time
• increased size for data disk space
• maxes out at 16 TB
Pipe mode
• harder to implement
• faster for single pass
• downloads each epoch
• sizing only for model disk space
• no limit on length
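The single-pass access pattern that Pipe mode favors can be sketched as follows. This is a minimal illustration and not SageMaker's implementation: the names `stream_records` and `streaming_mean`, the fixed 8-byte record layout, and the in-memory stream standing in for the training channel are all assumptions made for the example. The point is that the consumer never seeks backwards and never holds the whole dataset, so local disk only needs to be sized for the model.

```python
import io
import struct
from typing import BinaryIO, Iterator


def stream_records(stream: BinaryIO, record_size: int) -> Iterator[bytes]:
    """Yield fixed-size records from a forward-only stream, one at a time.

    Assumes a buffered binary stream whose read(n) returns n bytes unless
    the end of the stream has been reached.
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    while True:
        record = stream.read(record_size)
        if not record:
            return  # clean end of stream
        if len(record) < record_size:
            raise ValueError("stream ended in the middle of a record")
        yield record


def streaming_mean(stream: BinaryIO) -> float:
    """Average little-endian float64 records in one pass, in constant memory."""
    count = 0
    total = 0.0
    for record in stream_records(stream, 8):
        (value,) = struct.unpack("<d", record)
        total += value
        count += 1
    if count == 0:
        raise ValueError("stream contained no records")
    return total / count


# Hypothetical usage: an in-memory buffer stands in for the data channel.
data = struct.pack("<4d", 1.0, 2.0, 3.0, 6.0)
mean = streaming_mean(io.BytesIO(data))
```

An algorithm that needs several epochs would have to reopen the stream for each pass, which matches the "downloads each epoch" tradeoff listed above.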
System design: Compute
• CPU
• GPU
• Multi-GPU
• Elastic inference
• Mobile
• IoT
System design: Network
• Training: Single machine or distributed across many machines
• Inference: number of concurrent requests, size of payload
• Throughput
• Latency
• Jitter
Testing: traditional testing
• Unit tests
• Functional tests
• Integration tests
• Load testing
Testing: benchmarking
• Measure time, cost, accuracy
• DAWNBench
• Training: end-to-end throughput
• Inference: end-to-end latency
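A minimal sketch of measuring end-to-end throughput and latency for any callable is shown below. The function name `benchmark`, the choice of percentiles, and the use of the latency standard deviation as a simple jitter proxy are assumptions for illustration; this is not DAWNBench's methodology, and cost and accuracy would have to be tracked separately.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark(fn: Callable[[object], object], inputs: Sequence[object]) -> Dict[str, float]:
    """Call fn once per input and report throughput and latency statistics."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    latencies = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        fn(item)
        latencies.append(time.perf_counter() - t0)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-12)

    ordered = sorted(latencies)
    n = len(ordered)
    # Nearest-rank 99th percentile, computed with integer arithmetic.
    p99_index = (99 * n + 99) // 100 - 1
    return {
        "throughput_per_s": n / elapsed,
        "latency_p50_s": statistics.median(ordered),
        "latency_p99_s": ordered[p99_index],
        "jitter_s": statistics.pstdev(ordered),
    }


# Hypothetical usage with a stand-in for a model's predict function.
report = benchmark(lambda x: x * x, list(range(50)))
```

Reporting a tail percentile alongside the median matters for inference, where a small fraction of slow requests can dominate the user-visible latency.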
System design: performance optimizations
• Beware of the tradeoffs
• Training
  • Low or mixed precision
  • Increase batch size
  • Optimize communication between workers
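One of the tradeoffs of low precision can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not a training recipe: `to_float16` and `accumulate` are hypothetical helper names, and the running sum stands in for any quantity a training loop accumulates. It contrasts keeping the accumulator in half precision with the mixed-precision idea of storing values in half precision while accumulating in a wider type.

```python
import struct
from typing import Iterable


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate(values: Iterable[float], low_precision: bool) -> float:
    """Sum values, optionally rounding the running total to float16 each step."""
    total = 0.0
    for v in values:
        total += v
        if low_precision:
            total = to_float16(total)
    return total


ones = [to_float16(1.0)] * 4096

# Half-precision accumulator: stalls once the spacing between
# representable values exceeds the increment.
stalled = accumulate(ones, low_precision=True)

# Wider accumulator over the same half-precision inputs.
exact = accumulate(ones, low_precision=False)
```

Here `stalled` stops growing at 2048.0 because half precision cannot represent 2049, while `exact` reaches 4096.0. The speed and memory gains of low precision therefore have to be weighed against this kind of silent accuracy loss.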
System design: performance optimizations
• Inference
  • Caching
  • Queueing
  • Low precision
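Caching at inference time can be sketched as memoizing predictions for repeated inputs. The toy linear `expensive_model`, its weights, and the call counter are hypothetical stand-ins for a real model; only `functools.lru_cache` is a real library facility.

```python
from functools import lru_cache
from typing import Tuple

# Counts how often the underlying model actually runs (illustration only).
CALLS = {"model": 0}

# Hypothetical fixed weights for a toy linear model.
WEIGHTS = (0.5, -1.0, 2.0)


def expensive_model(features: Tuple[float, ...]) -> float:
    """Stand-in for a costly forward pass."""
    CALLS["model"] += 1
    return sum(w * x for w, x in zip(WEIGHTS, features))


@lru_cache(maxsize=1024)
def cached_predict(features: Tuple[float, ...]) -> float:
    """Return a cached prediction when the same features were seen before."""
    return expensive_model(features)


first = cached_predict((1.0, 2.0, 3.0))
second = cached_predict((1.0, 2.0, 3.0))  # served from the cache
```

The tradeoff is that the cache is only valid while the model is deterministic and unchanged, inputs must be hashable (tuples rather than lists), and the cache has to be cleared when a new model version is deployed.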
System design: EMA example
CreateAlgorithm: EMA example
AWS Marketplace for Machine Learning
Training data
Training data: train/test split
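A train/test split can be sketched as a seeded shuffle followed by a cut. The function name `train_test_split`, the default 20% test fraction, and the seed are assumptions for illustration and are not taken from the deck.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    records: Sequence[T], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle records reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(list(range(10)), test_fraction=0.2, seed=1)
```

For ordered data such as time series, a chronological cut (training on the earlier portion, testing on the later one) is usually more appropriate than a shuffle, because shuffling lets information from the future leak into training.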
Hyperparameter tuning job: EMA example
Thank you!
Resources
• SageMaker Product Page
• SageMaker Console
• Ground Truth Product Page
• Neo Product Page
• SageMaker RL Documentation
• SageMaker 10-Minute Tutorial
• SageMaker Related Blogs
• Ground Truth Webinar (Dec 2018)
