© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Scale Machine Learning from zero to millions of users
Julien Simon
Global Technical Evangelist, AI & Machine Learning, AWS
@julsimon
Rationale
How to train ML models and deploy them in production, from
humble beginnings to world domination
Try to take reasonable and justified steps
Longer, more opinionated version: https://medium.com/@julsimon/scaling-machine-learning-from-0-to-millions-of-users-part-1-a2d36a5e849
And so it begins
• You’ve trained a model on a local machine, using a popular open source library.
• You’ve measured the model’s accuracy, and things look good.
• Now you’d like to deploy it to check its actual behaviour, to run A/B tests, etc.
• You’ve embedded the model in your business application.
• You’ve deployed everything to a single virtual machine in the cloud.
• Everything works, you’re serving predictions, life is good!
Score card
Criterion | Single EC2 instance
Infrastructure effort | C’mon, it’s just one instance
ML setup effort | pip install tensorflow
CI/CD integration | Not needed
Build models | DIY
Train models | python train.py
Deploy models (at scale) | python predict.py
Scale/HA inference | Not needed
Optimize costs | Not needed
Security | Not needed
A few instances and models later…
• Life is not that good
• Too much manual work
• Time-consuming and error-prone
• Dependency hell
• No cost optimization
• Monolithic architecture
• Deployment hell
• Multiple apps can’t share the same model
• Apps and models scale differently
Use AWS-maintained tools
• Deep Learning Amazon Machine Images
• Deep Learning containers
Dockerize
Create a prediction service
• Model servers
• Bespoke API (Flask?)
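A bespoke prediction API can be very small. The sketch below uses only the Python standard library rather than Flask, so it runs without extra dependencies. The `predict` function, the `/predict` route and the JSON payload shape are hypothetical placeholders, not something the slides prescribe.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)


def handle_predict(raw_body):
    """Decode a JSON request, run the model, and return (status, JSON bytes)."""
    try:
        features = json.loads(raw_body)["features"]
        result = {"prediction": predict(features)}
        return 200, json.dumps(result).encode()
    except (ValueError, KeyError, TypeError, ZeroDivisionError):
        error = {"error": "expected a JSON object with a 'features' list of numbers"}
        return 400, json.dumps(error).encode()


class PredictionHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: apps call POST /predict instead of embedding the model."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the prediction service until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), PredictionHandler).serve_forever()
```

Calling `serve()` starts the service. Because the model now lives behind an HTTP interface, several apps can share it, and it can be packaged in its own container and scaled separately from the business application.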
AWS Deep Learning AMIs and Containers
Optimized environments on Amazon Linux or Ubuntu
Conda AMI
For developers who want pre-installed pip packages of DL frameworks in separate virtual environments.
Base AMI
For developers who want a clean slate to set up private DL engine repositories or custom builds of DL engines.
Containers
For developers who want pre-installed containers for DL frameworks (TensorFlow, PyTorch, Apache MXNet)
Scaling alert!
• More customers, more team members, more models, woohoo!
• Scalability, high availability & security are now a thing
• Scaling up is a losing proposition. You need to scale out
• Only automation can save you: IaC, CI/CD and all that good DevOps stuff
• What are your options?
Option 1: virtual machines
• Definitely possible, but:
• Why? Seriously, I want to know.
• Operational and financial issues await if you don’t automate extensively
• Training
• Build on-demand clusters with CloudFormation, Terraform, etc.
• Distributed training is a pain to set up
• Prediction
• Automate deployment with CI/CD
• Scale with Auto Scaling, Load Balancers, etc.
• Spot, spot, spot
Score card
Criterion | More EC2 instances
Infrastructure effort | Lots
ML setup effort | Some (DLAMI)
CI/CD integration | No change
Build models | DIY
Train models | DIY
Deploy models | DIY (model servers)
Scale/HA inference | DIY (Auto Scaling, LB)
Optimize costs | DIY (Spot, automation)
Security | DIY (IAM, VPC, KMS)
Option 2: Docker clusters
• This makes a lot of sense if you’re already deploying apps to Docker
• No change to the dev experience: same workflows, same CI/CD, etc.
• Deploy prediction services on the same infrastructure as business apps.
• Amazon ECS and Amazon EKS
• Lots of flexibility: mixed instance types (including GPUs), placement constraints, etc.
• Both come with AWS-maintained AMIs that will save you time
• One cluster or many clusters?
• Build on-demand development and test clusters with CloudFormation, Terraform, etc.
• Many customers find that running a large single production cluster works better
• Still instance-based and not fully-managed
• Not a hands-off operation: services / pods, service discovery, etc. are nice but you still have work to do
• And yes, this matters even if « someone else is taking care of clusters »
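To give a feel for the scaling work that remains on a Docker cluster: on Kubernetes (and therefore on EKS), prediction pods are commonly scaled with the Horizontal Pod Autoscaler, whose documented rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below reproduces only that core rule; the function name, the replica bounds and the example metric values are illustrative assumptions, and the real autoscaler adds tolerances and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Core Horizontal Pod Autoscaler rule, clamped to the configured bounds.

    The metric could be average CPU utilization or requests per second per pod;
    choosing it, and choosing the bounds, is still your job.
    """
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# 4 pods at 90% average CPU with a 60% target: scale out to 6 pods.
scale_out = desired_replicas(4, 90.0, 60.0)

# 4 pods at 30% average CPU with a 60% target: scale in to 2 pods.
scale_in = desired_replicas(4, 30.0, 60.0)
```

ECS offers comparable service auto scaling policies, configured through its own API rather than Kubernetes objects.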
Score card
Criterion | EC2 | ECS / EKS
Infrastructure effort | Lots | Some (Docker tools)
ML setup effort | Some (DLAMI) | Some (DL containers)
CI/CD integration | No change | No change
Build models | DIY | DIY
Train models (at scale) | DIY | DIY (Docker tools)
Deploy models (at scale) | DIY (model servers) | DIY (Docker tools)
Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.)
Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation)
Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS)
Option 3: go fully managed with Amazon SageMaker
Model options on Amazon SageMaker
Training code
Built-in Algorithms (17)
• Factorization Machines
• Linear Learner
• Principal Component Analysis
• K-Means Clustering
• Image classification
• And more
• No ML coding required
• No infrastructure work required
• Distributed training
• Pipe mode
Built-in Frameworks
• Bring your own code: script mode
• Open source containers
• No infrastructure work required
• Distributed training
• Pipe mode
Bring Your Own Container
• Full control, run anything!
• R, C++, etc.
• No infrastructure work required
The Amazon SageMaker API
• Python SDK orchestrating all Amazon SageMaker activity
• High-level objects for algorithm selection, training, deploying,
automatic model tuning, etc.
https://github.com/aws/sagemaker-python-sdk
• Spark SDK (Python & Scala)
https://github.com/aws/sagemaker-spark/tree/master/sagemaker-spark-sdk
• AWS SDK
• For scripting and automation
• CLI: 'aws sagemaker'
• Language SDKs: boto3, etc.
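As an illustration of the language SDK route, the sketch below calls a deployed endpoint through boto3's `sagemaker-runtime` client and its `invoke_endpoint` operation. The endpoint name is hypothetical, and the `{"instances": ...}` JSON body is an assumption that matches the TensorFlow Serving REST format; other containers may expect a different payload.

```python
import json


def build_invoke_args(endpoint_name, samples):
    """Build the keyword arguments for sagemaker-runtime's invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        # Assumed payload shape (TensorFlow Serving style); adapt to your container.
        "Body": json.dumps({"instances": samples}),
    }


def invoke(endpoint_name, samples):
    """Send samples to a SageMaker endpoint and return the decoded JSON response."""
    import boto3  # imported lazily so build_invoke_args works without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**build_invoke_args(endpoint_name, samples))
    return json.loads(response["Body"].read())
```

A business application would call something like `invoke("my-endpoint", [[0.1, 0.2]])` (hypothetical name and data) without depending on the SageMaker Python SDK or on any ML library.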
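Training and deploying
from sagemaker.tensorflow import TensorFlow

# 'role' (IAM execution role) and 'data' (training data location) are assumed to be defined.
# Parameter names are those of SageMaker Python SDK v1 (train_instance_count, train_instance_type).
tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                          role=role,
                          train_instance_count=1,
                          train_instance_type='ml.c5.2xlarge',
                          framework_version='1.12',
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={
                              'epochs': 10,
                              'learning-rate': 0.01})
tf_estimator.fit(data)
# HTTPS endpoint backed by a single instance
tf_endpoint = tf_estimator.deploy(initial_instance_count=1, instance_type='ml.t3.xlarge')
tf_endpoint.predict(...)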
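In script mode, the entry point is an ordinary Python script: SageMaker passes the `hyperparameters` dictionary to it as command-line arguments and exposes the data and model locations through environment variables such as `SM_CHANNEL_TRAINING` and `SM_MODEL_DIR`. The sketch below shows how such a script might read them. The actual `mnist_keras_tf.py` is not part of the slides, and the default values and fallback paths here are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters from the command line, as script mode passes them."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # Ignore any extra arguments the framework container may append.
    args, _unknown = parser.parse_known_args(argv)
    return args


def sagemaker_paths():
    """Locate input data and the model output directory inside the container."""
    return {
        # Set by SageMaker for the default 'training' channel (fallback is assumed).
        "training": os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
        # Whatever is saved here is packaged as the model artifact (fallback is assumed).
        "model_dir": os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    }


# Equivalent to the hyperparameters dictionary used with the estimator.
args = parse_args(["--epochs", "10", "--learning-rate", "0.01"])
```

The same script runs unchanged on a laptop, since the arguments and paths fall back to defaults when SageMaker is not present.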
Training and deploying, at any scale
tf_estimator = TensorFlow(entry_point='my_crazy_cnn.py',
                          role=role,
                          train_instance_count=8,
                          train_instance_type='ml.p3.16xlarge',  # Total of 64 GPUs
                          framework_version='1.12',
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={
                              'epochs': 200,
                              'learning-rate': 0.01})
tf_estimator.fit(data)
# HTTPS endpoint backed by 16 multi-AZ load-balanced instances
tf_endpoint = tf_estimator.deploy(initial_instance_count=16, instance_type='ml.p3.2xlarge')
tf_endpoint.predict(...)
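Large training jobs like this one are where Spot pricing matters most. In SageMaker Python SDK v1, managed Spot training is enabled with the `train_use_spot_instances`, `train_max_run` and `train_max_wait` estimator parameters, and the maximum wait time must be at least as long as the maximum run time. The helper below is a hypothetical convenience, not part of the SDK; it only builds and checks those keyword arguments, and the durations and S3 URI shown are made-up values.

```python
def spot_training_args(max_run_seconds, max_wait_seconds, checkpoint_s3_uri=None):
    """Build SDK v1 estimator keyword arguments for managed Spot training."""
    if max_wait_seconds < max_run_seconds:
        raise ValueError("train_max_wait must be >= train_max_run")
    args = {
        "train_use_spot_instances": True,
        "train_max_run": max_run_seconds,    # cap on actual training time
        "train_max_wait": max_wait_seconds,  # training time plus time waiting for Spot capacity
    }
    if checkpoint_s3_uri is not None:
        # Checkpoints let an interrupted job resume instead of restarting.
        args["checkpoint_s3_uri"] = checkpoint_s3_uri
    return args


# Example: allow 1 hour of training within a 2-hour window.
spot_args = spot_training_args(3600, 7200, "s3://my-bucket/checkpoints")
```

The resulting dictionary would be passed to the estimator alongside the other parameters, for example `TensorFlow(..., **spot_args)`.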
Score card
Criterion | EC2 | ECS / EKS | SageMaker
Infrastructure effort | Maximal | Some (Docker tools) | None
ML setup effort | Some (DLAMI) | Some (DL containers) | Minimal
CI/CD integration | No change | No change | Some (SDK, Step Functions)
Build models | DIY | DIY | 17 built-in algorithms
Train models (at scale) | DIY | DIY (Docker tools) | SDK: 2 LOCs
Deploy models (at scale) | DIY (model servers) | DIY (Docker tools) | SDK: 1 LOC, Kubernetes support
Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.) | Built-in
Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation) | On-demand/Spot training, Auto Scaling for inference
Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS) | API parameters
Score card
Flame war in 3, 2, 1…
Criterion | EC2 | ECS / EKS | SageMaker
Infrastructure effort | Maximal | Some (Docker tools) | None
ML setup effort | Some (DLAMI) | Some (DL containers) | Minimal
CI/CD integration | No change | No change | Some (SDK, Step Functions)
Build models | DIY | DIY | 17 built-in algorithms
Train models (at scale) | DIY | DIY (Docker tools) | SDK: 2 LOCs
Deploy models (at scale) | DIY (model servers) | DIY (Docker tools) | SDK: 1 LOC, Kubernetes support
Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.) | Built-in
Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation) | Spot training, Auto Scaling for inference
Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS) | API parameters
Personal opinion
• EC2: Small scale only, unless you have strong DevOps skills and enjoy exercising them.
• ECS / EKS: Reasonable choice if you’re a Docker shop, and if you’re able and willing to integrate with the Docker/OSS ecosystem. If not, I’d think twice: Docker isn’t an ML platform.
• SageMaker: Learn it in a few hours, forget about servers, focus 100% on ML, enjoy goodies like pipe mode, distributed training, HPO, inference pipelines and more.
Conclusion
• Whatever works for you at this time is fine
• Don’t over-engineer, and don’t « plan for the future »
• Optimize for current business conditions, pay attention to TCO
• Models and data matter, not infrastructure
• When conditions change, move fast: smash and rebuild
• ... which is what cloud is all about!
• « 100% of our time spent on ML » shall be the whole of the Law
• Mix and match if it makes sense
• Train on SageMaker, deploy on ECS/EKS… or vice versa
• Write your own story!
Getting started
https://aws.amazon.com/machine-learning/amis/
https://aws.amazon.com/machine-learning/containers/
https://aws.amazon.com/sagemaker
https://github.com/aws/sagemaker-python-sdk
https://github.com/awslabs/amazon-sagemaker-examples
https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_operators_for_kubernetes.html
https://medium.com/@julsimon
https://youtube.com/juliensimonfr
https://gitlab.com/juliensimon/dlcontainers DL AMI / container demos
https://gitlab.com/juliensimon/dlnotebooks SageMaker notebooks
Thank you!

An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)
 
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
 
Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
 

Recently uploaded

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 

Recently uploaded (20)

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

Scale Machine Learning from zero to millions of users (April 2020)

  • 1. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
      Scale Machine Learning from zero to millions of users
      Julien Simon, Global Technical Evangelist, AI & Machine Learning, AWS
      @julsimon
  • 2. Rationale
      How to train ML models and deploy them in production, from humble beginnings to world domination
      Try to take reasonable and justified steps
      Longer, more opinionated version: https://medium.com/@julsimon/scaling-machine-learning-from-0-to-millions-of-users-part-1-a2d36a5e849
  • 4. And so it begins
      • You’ve trained a model on a local machine, using a popular open source library.
      • You’ve measured the model’s accuracy, and things look good.
      • Now you’d like to deploy it to check its actual behaviour, to run A/B tests, etc.
      • You’ve embedded the model in your business application.
      • You’ve deployed everything to a single virtual machine in the cloud.
      • Everything works, you’re serving predictions, life is good!
  • 5. Score card
      Criterion | Single EC2 instance
      Infrastructure effort | C’mon, it’s just one instance
      ML setup effort | pip install tensorflow
      CI/CD integration | Not needed
      Build models | DIY
      Train models | python train.py
      Deploy models (at scale) | python predict.py
      Scale/HA inference | Not needed
      Optimize costs | Not needed
      Security | Not needed
  • 7. A few instances and models later…
      • Life is not that good
      • Too much manual work
      • Time-consuming and error-prone
      • Dependency hell
      • No cost optimization
      • Monolithic architecture
      • Deployment hell
      • Multiple apps can’t share the same model
      • Apps and models scale differently
      Use AWS-maintained tools
      • Deep Learning Amazon Machine Images
      • Deep Learning containers
      Dockerize
      Create a prediction service
      • Model servers
      • Bespoke API (Flask?)
  • 8. AWS Deep Learning AMIs and Containers
      Optimized environments on Amazon Linux or Ubuntu
      • Conda AMI: for developers who want pre-installed pip packages of DL frameworks in separate virtual environments.
      • Base AMI: for developers who want a clean slate to set up private DL engine repositories or custom builds of DL engines.
      • Containers: for developers who want pre-installed containers for DL frameworks (TensorFlow, PyTorch, Apache MXNet).
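The "bespoke API" option on slide 7 can be sketched as a small HTTP prediction service. The slide suggests Flask; to stay dependency-free, this sketch uses only the Python standard library instead. The `predict` function is a hypothetical stand-in for a real model, and the JSON request shape (`{"features": [...]}`), the `handle_request` helper and the port are assumptions made for illustration, not something the talk specifies.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)


def handle_request(body):
    """Parse a JSON request body and return (HTTP status, JSON response bytes)."""
    try:
        features = json.loads(body)["features"]
        result = {"prediction": predict([float(x) for x in features])}
        return 200, json.dumps(result).encode()
    except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
        # Malformed JSON, missing key, non-numeric or empty input
        return 400, json.dumps({"error": str(exc)}).encode()


class PredictionHandler(BaseHTTPRequestHandler):
    """Answers POST requests with the model's prediction."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the service until interrupted (assumed port, adjust as needed)."""
    HTTPServer(("0.0.0.0", port), PredictionHandler).serve_forever()
```

Calling `serve()` starts the service. Keeping the request logic in `handle_request` separates the model from the web layer, which is what lets several applications share one model and lets the two scale independently.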
  • 11. Scaling alert!
      • More customers, more team members, more models, woohoo!
      • Scalability, high availability & security are now a thing
      • Scaling up is a losing proposition. You need to scale out
      • Only automation can save you: IaC, CI/CD and all that good DevOps stuff
      • What are your options?
  • 12. Option 1: virtual machines
      • Definitely possible, but:
          • Why? Seriously, I want to know.
          • Operational and financial issues await if you don’t automate extensively
      • Training
          • Build on-demand clusters with CloudFormation, Terraform, etc.
          • Distributed training is a pain to set up
      • Prediction
          • Automate deployment with CI/CD
          • Scale with Auto Scaling, Load Balancers, etc.
      • Spot, spot, spot
  • 13. Score card
      Criterion | More EC2 instances
      Infrastructure effort | Lots
      ML setup effort | Some (DLAMI)
      CI/CD integration | No change
      Build models | DIY
      Train models | DIY
      Deploy models | DIY (model servers)
      Scale/HA inference | DIY (Auto Scaling, LB)
      Optimize costs | DIY (Spot, automation)
      Security | DIY (IAM, VPC, KMS)
  • 14. Option 2: Docker clusters
      • This makes a lot of sense if you’re already deploying apps to Docker
          • No change to the dev experience: same workflows, same CI/CD, etc.
          • Deploy prediction services on the same infrastructure as business apps.
      • Amazon ECS and Amazon EKS
          • Lots of flexibility: mixed instance types (including GPUs), placement constraints, etc.
          • Both come with AWS-maintained AMIs that will save you time
      • One cluster or many clusters?
          • Build on-demand development and test clusters with CloudFormation, Terraform, etc.
          • Many customers find that running a large single production cluster works better
      • Still instance-based and not fully managed
          • Not a hands-off operation: services / pods, service discovery, etc. are nice but you still have work to do
          • And yes, this matters even if « someone else is taking care of clusters »
  • 16. Score card
      Criterion | EC2 | ECS / EKS
      Infrastructure effort | Lots | Some (Docker tools)
      ML setup effort | Some (DLAMI) | Some (DL containers)
      CI/CD integration | No change | No change
      Build models | DIY | DIY
      Train models (at scale) | DIY | DIY (Docker tools)
      Deploy models (at scale) | DIY (model servers) | DIY (Docker tools)
      Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.)
      Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation)
      Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS)
  • 17. Option 3: go fully managed with Amazon SageMaker
  • 18. Model options on Amazon SageMaker (training code)
      • Built-in Algorithms (17)
          • Factorization Machines, Linear Learner, Principal Component Analysis, K-Means Clustering, Image classification, and more
          • No ML coding required
          • No infrastructure work required
          • Distributed training
          • Pipe mode
      • Built-in Frameworks
          • Bring your own code: script mode
          • Open source containers
          • No infrastructure work required
          • Distributed training
          • Pipe mode
      • Bring Your Own Container
          • Full control, run anything! R, C++, etc.
          • No infrastructure work required
  • 19. The Amazon SageMaker API
      • Python SDK orchestrating all Amazon SageMaker activity
          • High-level objects for algorithm selection, training, deploying, automatic model tuning, etc.
          • https://github.com/aws/sagemaker-python-sdk
      • Spark SDK (Python & Scala)
          • https://github.com/aws/sagemaker-spark/tree/master/sagemaker-spark-sdk
      • AWS SDK
          • For scripting and automation
          • CLI: ‘aws sagemaker’
          • Language SDKs: boto3, etc.
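As an illustration of the "Language SDKs: boto3" bullet, here is a minimal sketch of calling a deployed endpoint through the low-level `sagemaker-runtime` client and its `invoke_endpoint` operation. The endpoint name, the JSON payload shape (`{"instances": [...]}`, the TensorFlow Serving convention) and the helper names `build_request` and `invoke` are assumptions for the example; a real endpoint may expect a different content type or body, and a real call needs boto3 plus valid AWS credentials.

```python
import json


def build_request(endpoint_name, features):
    """Build keyword arguments for the SageMaker runtime InvokeEndpoint call."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        # Assumed payload shape; adapt to what your model container expects
        "Body": json.dumps({"instances": [features]}),
    }


def invoke(endpoint_name, features):
    """Send one prediction request and return the decoded JSON response."""
    import boto3  # imported lazily so build_request works without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**build_request(endpoint_name, features))
    # The response body is a stream; this sketch assumes it contains JSON
    return json.loads(response["Body"].read())
```

A call such as `invoke("my-endpoint", [0.1, 0.2])` (hypothetical endpoint name) is what a business application written in any language would do, which is why the low-level SDK suits scripting and automation while the Python SDK suits experimentation.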
  • 20. Training and deploying

      from sagemaker.tensorflow import TensorFlow

      # `role` (IAM role) and `data` (training data location) are assumed to be defined earlier.
      # Parameter names follow version 1 of the SageMaker Python SDK.
      tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                                role=role,
                                train_instance_count=1,
                                train_instance_type='ml.c5.2xlarge',
                                framework_version='1.12',
                                py_version='py3',
                                script_mode=True,
                                hyperparameters={'epochs': 10,
                                                 'learning-rate': 0.01})
      tf_estimator.fit(data)

      # HTTPS endpoint backed by a single instance
      tf_endpoint = tf_estimator.deploy(initial_instance_count=1,
                                        instance_type='ml.t3.xlarge')
      tf_endpoint.predict(…)

  • 21. Training and deploying, at any scale

      from sagemaker.tensorflow import TensorFlow

      # `role` (IAM role) and `data` (training data location) are assumed to be defined earlier.
      tf_estimator = TensorFlow(entry_point='my_crazy_cnn.py',
                                role=role,
                                train_instance_count=8,
                                train_instance_type='ml.p3.16xlarge',  # Total of 64 GPUs
                                framework_version='1.12',
                                py_version='py3',
                                script_mode=True,
                                hyperparameters={'epochs': 200,
                                                 'learning-rate': 0.01})
      tf_estimator.fit(data)

      # HTTPS endpoint backed by 16 multi-AZ load-balanced instances
      tf_endpoint = tf_estimator.deploy(initial_instance_count=16,
                                        instance_type='ml.p3.2xlarge')
      tf_endpoint.predict(…)
  • 22. Score card
      Criterion | EC2 | ECS / EKS | SageMaker
      Infrastructure effort | Maximal | Some (Docker tools) | None
      ML setup effort | Some (DLAMI) | Some (DL containers) | Minimal
      CI/CD integration | No change | No change | Some (SDK, Step Functions)
      Build models | DIY | DIY | 17 built-in algorithms
      Train models (at scale) | DIY | DIY (Docker tools) | SDK: 2 LOCs
      Deploy models (at scale) | DIY (model servers) | DIY (Docker tools) | SDK: 1 LOC, Kubernetes support
      Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.) | Built-in
      Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation) | On-demand/Spot training, Auto Scaling for inference
      Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS) | API parameters
  • 23. Score card: flame war in 3, 2, 1…
      Criterion | EC2 | ECS / EKS | SageMaker
      Infrastructure effort | Maximal | Some (Docker tools) | None
      ML setup effort | Some (DLAMI) | Some (DL containers) | Minimal
      CI/CD integration | No change | No change | Some (SDK, Step Functions)
      Build models | DIY | DIY | 17 built-in algorithms
      Train models (at scale) | DIY | DIY (Docker tools) | 2 LOCs
      Deploy models (at scale) | DIY (model servers) | DIY (Docker tools) | SDK: 1 LOC, Kubernetes support
      Scale/HA inference | DIY (Auto Scaling, LB) | DIY (Services, pods, etc.) | Built-in
      Optimize costs | DIY (Spot, RIs, automation) | DIY (Spot, RIs, automation) | Spot training, Auto Scaling for inference
      Security | DIY (IAM, VPC, KMS) | DIY (IAM, VPC, KMS) | API parameters
      Personal opinion:
      • EC2: Small scale only, unless you have strong DevOps skills and enjoy exercising them.
      • ECS / EKS: Reasonable choice if you’re a Docker shop, and if you’re able and willing to integrate with the Docker/OSS ecosystem. If not, I’d think twice: Docker isn’t an ML platform.
      • SageMaker: Learn it in a few hours, forget about servers, focus 100% on ML, enjoy goodies like pipe mode, distributed training, HPO, inference pipelines and more.
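The "Auto Scaling for inference" entry in the SageMaker column can be sketched with the Application Auto Scaling API in boto3, using `register_scalable_target` and `put_scaling_policy` with the predefined `SageMakerVariantInvocationsPerInstance` metric. The endpoint name, capacity limits, target value, cooldowns and policy name below are illustrative assumptions, and `AllTraffic` is assumed to be the variant name (the default used by the SageMaker Python SDK). The helper names are hypothetical.

```python
def autoscaling_requests(endpoint_name, variant="AllTraffic",
                         min_instances=1, max_instances=4,
                         invocations_per_instance=100.0):
    """Build the two Application Auto Scaling requests for an endpoint variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Add or remove instances to hold this many invocations
            # per instance per minute (assumed value)
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds, assumed
            "ScaleOutCooldown": 60,   # seconds, assumed
        },
    }
    return target, policy


def apply_autoscaling(endpoint_name, **kwargs):
    """Register the target and attach the policy; needs boto3 and credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("application-autoscaling")
    target, policy = autoscaling_requests(endpoint_name, **kwargs)
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

With a target-tracking policy there are no alarms or thresholds to manage by hand, which is roughly what "Built-in" means in the Scale/HA row compared with the DIY columns.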
  • 24. Conclusion
      • Whatever works for you at this time is fine
      • Don’t over-engineer, and don’t « plan for the future »
      • Optimize for current business conditions, pay attention to TCO
      • Models and data matter, not infrastructure
      • When conditions change, move fast: smash and rebuild
      • ... which is what cloud is all about!
      • « 100% of our time spent on ML » shall be the whole of the Law
      • Mix and match if it makes sense
      • Train on SageMaker, deploy on ECS/EKS… or vice versa
      • Write your own story!
  • 26. Thank you!
      © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
      Julien Simon, Global Technical Evangelist, AI & Machine Learning, AWS
      @julsimon

Editor's Notes

  1. AI Services: AI Services are intentionally easy to use. They can be accessed via a simple API call. We’ve pulled the best and most targeted capabilities into ready-made services, for example image recognition or transcription. The focus here is on enabling any developer, with no ML skills required, to develop AI applications using one of our services. These API services, used in conjunction, create compelling solutions that target business problems and use cases. Customers can build these capabilities into their new and existing applications to reduce costs, increase speed, improve customer satisfaction and insight, and build ‘modern’ intelligent applications. What is your use case? What are the capabilities you might need? There’s an AI Service, or a pairing of services, that will address the need.

     AI Services descriptions for color:

     Amazon Rekognition: Rekognition makes it easy to add image and video analysis to your applications. You just provide an image or video to the Rekognition API, and the service can identify the objects, people, text, scenes, and activities, as well as detect any inappropriate content. Amazon Rekognition also provides highly accurate facial analysis and facial recognition on images and video that you provide. You can detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases. Rekognition is a simple and easy to use API that can quickly analyze any image or video file stored in Amazon S3. Amazon Rekognition is always learning from new data, and we are continually adding new labels and facial recognition features to the service. More info: https://aws.amazon.com/rekognition/

     Amazon Polly: Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries. More info: https://aws.amazon.com/polly/

     Amazon Transcribe: Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech. Amazon Transcribe can be used for lots of common applications, including the transcription of customer service calls and generating subtitles on audio and video content. The service can transcribe audio files stored in common formats, like WAV and MP3, with time stamps for every word so that you can easily locate the audio in the original source by searching for the text. Amazon Transcribe is continually learning and improving to keep pace with the evolution of language. More info: https://aws.amazon.com/transcribe/

     Amazon Translate: Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Neural machine translation is a form of language translation automation that uses deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms. Amazon Translate allows you to localize content, such as websites and applications, for international users, and to easily translate large volumes of text efficiently. More info: https://aws.amazon.com/translate/

     Amazon Comprehend: Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. The service identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic. Using these APIs, you can analyze text and apply the results in a wide range of applications including voice of customer analysis, intelligent document search, and content personalization for web applications. More info: https://aws.amazon.com/comprehend

     Amazon Lex: Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions. With Amazon Lex, the same deep learning technologies that power Amazon Alexa are now available to any developer, enabling you to quickly and easily build sophisticated, natural language, conversational bots. More info: https://aws.amazon.com/lex