SlideShare a Scribd company logo

Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Practices

SQUADEX
SQUADEX

The right setup of the local development and cloud infrastructure are the requirement for reproducible and reliable Machine Learning products. They also require a well-polished process behind the management of the data science life cycle, from research to production. ML stimulates the need for a more advanced type of software development process and requires a sophisticated ecosystem of services than classic IDE. This SlideShare provides ML engineers with insightful tips on how to use specific AWS & open-sources tools as well as DevOps best practices to complete routine tasks like data ingestion, data preprocessing, feature engineering, labeling, training, parameters tuning, testing, deployment, monitoring, and retraining. On top of that, you will learn what can and what can not be automated when it comes to using both AWS products and tools like Kubernetes, Kubeflow, Jupiter notebooks, TensorFlow, and TPOT. The keynote was originally delivered to Stanford academia (University IT, students, and staff) on campus of Stanford University. Speakers: -- Stepan Pushkarev, CTO at Squadex (https://www.linkedin.com/in/stepanpushkarev/) -- Rinat Gareev, Machine Learning Engineer at Squadex (https://www.linkedin.com/in/gareev/) -- Iskandar Sitdikov, Machine Learning Engineer at Squadex (https://www.linkedin.com/in/icekhan/)

1 of 43
Download to read offline
Tooling for Machine Learning:
AWS Products, Open Source
Tools, and DevOps Practices
Squadex Consulting Services
Stepan Pushkarev
Platform Engineer
Rinat Gareev
ML Engineer
Iskandar Sitdikov
ML Engineer
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Practices
Overview
● Data Pipelines
● Modeling & Training
● Deployment
Data Ingestion
● Streaming - as much as possible even if you are not realtime
○ 99% of world's data has a timestamp
○ Fast Data is better that Big Data
○ Rich API for time series data (late arrivals, replays, etc)
○ Streams - advanced durable file system
○ Monitoring and operations friendly
○ Windowing aggregations and stateful transformations
○ Enrichments right in the stream
○ Realtime!
Ad

Recommended

Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastDatabricks
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructureharendra_pathak
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsDatabricks
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Migrating Oracle database to Cassandra
Migrating Oracle database to CassandraMigrating Oracle database to Cassandra
Migrating Oracle database to CassandraUmair Mansoob
 
Migrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQLMigrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQLUmair Mansoob
 
An introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckAn introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckData Con LA
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...Databricks
 

More Related Content

What's hot

Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Sysml 2019 demo_paper
Sysml 2019 demo_paperSysml 2019 demo_paper
Sysml 2019 demo_paperstrange_loop
 
AutoML Toolkit – Deep Dive
AutoML Toolkit – Deep DiveAutoML Toolkit – Deep Dive
AutoML Toolkit – Deep DiveDatabricks
 
Productionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflowProductionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflowDatabricks
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
SAP Adaptive Computing Design
SAP Adaptive Computing DesignSAP Adaptive Computing Design
SAP Adaptive Computing DesignGary Jackson MBCS
 
Your Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveYour Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveIlya Ganelin
 
Software architectures for the cloud
Software architectures for the cloudSoftware architectures for the cloud
Software architectures for the cloudGeorgios Gousios
 
Benchmarking Performance and Scalability with Web Stress
Benchmarking Performance and Scalability with Web StressBenchmarking Performance and Scalability with Web Stress
Benchmarking Performance and Scalability with Web StressInterSystems Corporation
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overviewKaran Alang
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
BK2015 Arkitektur sikkerhet og skalering
BK2015 Arkitektur sikkerhet og skaleringBK2015 Arkitektur sikkerhet og skalering
BK2015 Arkitektur sikkerhet og skaleringGeodata AS
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 
PandasUDFs: One Weird Trick to Scaled Ensembles
PandasUDFs: One Weird Trick to Scaled EnsemblesPandasUDFs: One Weird Trick to Scaled Ensembles
PandasUDFs: One Weird Trick to Scaled EnsemblesDatabricks
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineSpark Summit
 
Akka Streams - From Zero to Kafka
Akka Streams - From Zero to KafkaAkka Streams - From Zero to Kafka
Akka Streams - From Zero to KafkaMark Harrison
 
Meetup developing building and_deploying databases with SSDT
Meetup developing building and_deploying databases with SSDTMeetup developing building and_deploying databases with SSDT
Meetup developing building and_deploying databases with SSDTSolidify
 

What's hot (20)

Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Sysml 2019 demo_paper
Sysml 2019 demo_paperSysml 2019 demo_paper
Sysml 2019 demo_paper
 
AutoML Toolkit – Deep Dive
AutoML Toolkit – Deep DiveAutoML Toolkit – Deep Dive
AutoML Toolkit – Deep Dive
 
Productionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflowProductionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflow
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
SAP Adaptive Computing Design
SAP Adaptive Computing DesignSAP Adaptive Computing Design
SAP Adaptive Computing Design
 
Your Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveYour Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's Perspective
 
Software architectures for the cloud
Software architectures for the cloudSoftware architectures for the cloud
Software architectures for the cloud
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
Benchmarking Performance and Scalability with Web Stress
Benchmarking Performance and Scalability with Web StressBenchmarking Performance and Scalability with Web Stress
Benchmarking Performance and Scalability with Web Stress
 
Oracle OpenWorld 2014 Review Part Four - PaaS Middleware
Oracle OpenWorld 2014 Review Part Four - PaaS MiddlewareOracle OpenWorld 2014 Review Part Four - PaaS Middleware
Oracle OpenWorld 2014 Review Part Four - PaaS Middleware
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
BK2015 Arkitektur sikkerhet og skalering
BK2015 Arkitektur sikkerhet og skaleringBK2015 Arkitektur sikkerhet og skalering
BK2015 Arkitektur sikkerhet og skalering
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
PandasUDFs: One Weird Trick to Scaled Ensembles
PandasUDFs: One Weird Trick to Scaled EnsemblesPandasUDFs: One Weird Trick to Scaled Ensembles
PandasUDFs: One Weird Trick to Scaled Ensembles
 
Azure appfabric caching intro and tips
Azure appfabric caching intro and tipsAzure appfabric caching intro and tips
Azure appfabric caching intro and tips
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos Christine
 
Akka Streams - From Zero to Kafka
Akka Streams - From Zero to KafkaAkka Streams - From Zero to Kafka
Akka Streams - From Zero to Kafka
 
Meetup developing building and_deploying databases with SSDT
Meetup developing building and_deploying databases with SSDTMeetup developing building and_deploying databases with SSDT
Meetup developing building and_deploying databases with SSDT
 

Similar to Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Practices

AI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondAI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondProvectus
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....Databricks
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Sotrender
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
ML_Development_with_Sagemaker.pptx
ML_Development_with_Sagemaker.pptxML_Development_with_Sagemaker.pptx
ML_Development_with_Sagemaker.pptxTemiReply
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovValeriia Maliarenko
 
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...Rustem Feyzkhanov
 
High Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and KubernetesHigh Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and Kubernetesinside-BigData.com
 
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...Piyush Kumar
 
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning serviceRuth Yakubu
 
국내 건설 기계사 도입 사례를 통해 보는 AI가 적용된 수요 예측 관리 - 베스핀글로벌 조창윤 AI/ML팀 팀장
국내 건설 기계사 도입 사례를 통해 보는 AI가 적용된 수요 예측 관리 - 베스핀글로벌 조창윤 AI/ML팀 팀장국내 건설 기계사 도입 사례를 통해 보는 AI가 적용된 수요 예측 관리 - 베스핀글로벌 조창윤 AI/ML팀 팀장
국내 건설 기계사 도입 사례를 통해 보는 AI가 적용된 수요 예측 관리 - 베스핀글로벌 조창윤 AI/ML팀 팀장BESPIN GLOBAL
 
Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)camunda services GmbH
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJANicolas Poggi
 
Scalability strategies for cloud based system architecture
Scalability strategies for cloud based system architectureScalability strategies for cloud based system architecture
Scalability strategies for cloud based system architectureSangJin Kang
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Iulian Pintoiu
 
20211202 NADOG Adapting to Covid with Serverless Craeg Strong Ariel Partners
20211202 NADOG Adapting to Covid with Serverless Craeg Strong Ariel Partners20211202 NADOG Adapting to Covid with Serverless Craeg Strong Ariel Partners
20211202 NADOG Adapting to Covid with Serverless Craeg Strong Ariel PartnersCraeg Strong
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureDatabricks
 

Similar to Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Practices (20)

AI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondAI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and Beyond
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
ML_Development_with_Sagemaker.pptx
ML_Development_with_Sagemaker.pptxML_Development_with_Sagemaker.pptx
ML_Development_with_Sagemaker.pptx
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
 
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
 
High Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and KubernetesHigh Performance Distributed TensorFlow with GPUs and Kubernetes
High Performance Distributed TensorFlow with GPUs and Kubernetes
 
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b...
 
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv...
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning service
 
국내 건설 기계사 도입 사례를 통해 보는 AI가 적용된 수요 예측 관리 - 베스핀글로벌 조창윤 AI/ML팀 팀장
국내 건설 기계사 도입 사례를 통해 보는 AI가 적용된 수요 예측 관리 - 베스핀글로벌 조창윤 AI/ML팀 팀장국내 건설 기계사 도입 사례를 통해 보는 AI가 적용된 수요 예측 관리 - 베스핀글로벌 조창윤 AI/ML팀 팀장
국내 건설 기계사 도입 사례를 통해 보는 AI가 적용된 수요 예측 관리 - 베스핀글로벌 조창윤 AI/ML팀 팀장
 
Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 
Scalability strategies for cloud based system architecture
Scalability strategies for cloud based system architectureScalability strategies for cloud based system architecture
Scalability strategies for cloud based system architecture
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
 
20211202 NADOG Adapting to Covid with Serverless Craeg Strong Ariel Partners
20211202 NADOG Adapting to Covid with Serverless Craeg Strong Ariel Partners20211202 NADOG Adapting to Covid with Serverless Craeg Strong Ariel Partners
20211202 NADOG Adapting to Covid with Serverless Craeg Strong Ariel Partners
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices Architecture
 

More from SQUADEX

Osdn serverless technologies with kubernetes
Osdn serverless technologies with kubernetes Osdn serverless technologies with kubernetes
Osdn serverless technologies with kubernetes SQUADEX
 
Spark as etl_squadex
Spark as etl_squadexSpark as etl_squadex
Spark as etl_squadexSQUADEX
 
Squadex DevOps Trainings
Squadex DevOps TrainingsSquadex DevOps Trainings
Squadex DevOps TrainingsSQUADEX
 
Data driven culture & infrastructure from the ground up
Data driven culture & infrastructure from the ground upData driven culture & infrastructure from the ground up
Data driven culture & infrastructure from the ground upSQUADEX
 
Enterprise level cloud CI
Enterprise level cloud CIEnterprise level cloud CI
Enterprise level cloud CISQUADEX
 
Canary releases & Blue green deployment
Canary releases & Blue green deploymentCanary releases & Blue green deployment
Canary releases & Blue green deploymentSQUADEX
 
Building DevOps culture from bottom up
Building DevOps culture from bottom upBuilding DevOps culture from bottom up
Building DevOps culture from bottom upSQUADEX
 
Kubernetes as a cloud for CI
Kubernetes as a cloud for CIKubernetes as a cloud for CI
Kubernetes as a cloud for CISQUADEX
 

More from SQUADEX (8)

Osdn serverless technologies with kubernetes
Osdn serverless technologies with kubernetes Osdn serverless technologies with kubernetes
Osdn serverless technologies with kubernetes
 
Spark as etl_squadex
Spark as etl_squadexSpark as etl_squadex
Spark as etl_squadex
 
Squadex DevOps Trainings
Squadex DevOps TrainingsSquadex DevOps Trainings
Squadex DevOps Trainings
 
Data driven culture & infrastructure from the ground up
Data driven culture & infrastructure from the ground upData driven culture & infrastructure from the ground up
Data driven culture & infrastructure from the ground up
 
Enterprise level cloud CI
Enterprise level cloud CIEnterprise level cloud CI
Enterprise level cloud CI
 
Canary releases & Blue green deployment
Canary releases & Blue green deploymentCanary releases & Blue green deployment
Canary releases & Blue green deployment
 
Building DevOps culture from bottom up
Building DevOps culture from bottom upBuilding DevOps culture from bottom up
Building DevOps culture from bottom up
 
Kubernetes as a cloud for CI
Kubernetes as a cloud for CIKubernetes as a cloud for CI
Kubernetes as a cloud for CI
 

Recently uploaded

Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsEvangelia Mitsopoulou
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfMostafa Higazy
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro KozhevinFwdays
 
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoRevolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoProduct School
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringMassimo Talia
 
From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsFrom Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsInflectra
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfkatalinjordans1
 
How to write an effective Cyber Incident Response Plan
How to write an effective Cyber Incident Response PlanHow to write an effective Cyber Incident Response Plan
How to write an effective Cyber Incident Response PlanDatabarracks
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...ISPMAIndia
 
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)François
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Product School
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Product School
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewAshraf Fouad
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stackSummit
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Product School
 
How AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxHow AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxInfosec
 
Pragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdfPragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdfinfogdgmi
 

Recently uploaded (20)

Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applications
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdf
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
 
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoRevolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineering
 
From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsFrom Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdf
 
How to write an effective Cyber Incident Response Plan
How to write an effective Cyber Incident Response PlanHow to write an effective Cyber Incident Response Plan
How to write an effective Cyber Incident Response Plan
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
 
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book Review
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stack
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
 
How AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxHow AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptx
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
Pragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdfPragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdf
 

Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Practices

  • 1. Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Practices
  • 3. Stepan Pushkarev Platform Engineer Rinat Gareev ML Engineer Iskandar Sitdikov ML Engineer
  • 5. Overview ● Data Pipelines ● Modeling & Training ● Deployment
  • 6. Data Ingestion ● Streaming - as much as possible even if you are not realtime ○ 99% of world's data has a timestamp ○ Fast Data is better that Big Data ○ Rich API for time series data (late arrivals, replays, etc) ○ Streams - advanced durable file system ○ Monitoring and operations friendly ○ Windowing aggregations and stateful transformations ○ Enrichments right in the stream ○ Realtime!
  • 9. Batch AirFlow - DAG runner Pros: - Nice UI - Easy to start, plugins ecosystem Cons: - Celery - Challenging for Operations Tip: - Always use Docker or Kube Operators AWS Step Functions Pros - Managed by AWS - Event driven Cons: - Less sophisticated flows - JSON programming - Managed by AWS
  • 10. Batch - System Layer - Versioning - Contracts with input / output schemas and transformation rules (DSL) - HTTP, S3, Postgres etc. clients to easily retrieve / persist / upload data - Unified naming and folder structures based on service / resource / timestamp / extension - Transactions, state persistence - Cleanup - Metrics, Logging & Tracing - Trick - allows switching to streaming any time!
  • 11. Glue Catalogue - Must have Glue ETL Jobs - Your ETL “System Layer” for Spark
  • 12. Overview ● ML Process ● Data Pipelines ● Modeling & Training ● Deployment
  • 13. Machine Learning on AWS Amazon Machine Learning Amazon SageMaker AWS Deep Learning AMIs * application services (Comprehend, Lex, Polly, Transcribe, Translate, Rekognition) are out of scope
  • 14. Machine Learning on AWS – AML Amazon Machine Learning Amazon SageMaker AWS Deep Learning AMIs Provide data, no code* required *optionally – JSON Programming Efforts ` Flexibility
  • 15. Amazon Machine Learning – Overview Good for lazy & quick baselines Inputs: CSV on S3, SQL over Redshift or RDS Mysql Includes: - Data visualization - Data transformation utils (with inferred default ‘recipes’), Feature Selection - Feature crosses - Prediction targets: numeric – linear regression categorical – (multinomial) logistic regression - Evaluation
  • 16. Amazon Machine Learning – Limits No hyperparameter tuning (Cf. SageMaker Linear Learner) No built-in cross-validation (requires additional scripting) System limits: 100 KB per observation, max 10000 output features, max 100 classes, etc
  • 17. Machine Learning on AWS – SageMaker Amazon Machine Learning Amazon SageMaker AWS Deep Learning AMIs A platform that consists of: ● model implementations ● model tuner ● notebook hosting ● model hosting Programming Efforts ` Flexibility
  • 18. SageMaker – Algorithms – Overview Classification/Regression ● Linear Learner ● XGBoost ● Factorization Machine ● K-Nearest Neighbors Image processing ● Image Classification ● Object Detection Text processing ● BlazingText ● Topic Modeling – LDA ● Topic Modeling – NTM Clustering ● K-Means Time Series ● DeepAR Forecasting Dimensionality Reduction ● Principal Component Analysis Encoder-Decoder ● Seq2Seq Anomaly Detection ● Random Cut Forest Bring Your Own
  • 19. SageMaker – BlazingText Very fast & convenient FastText implementation Use case: 1. Given few labelled texts for a ML problem and much more unlabelled (domain-specific) 2. Train word embeddings on unlabelled data 3. Use these embeddings to initialize input layer of your neural network Also: BlazingText provides supervised mode for text classification
  • 20. SageMaker – Image Classification Implementation of ‘Deep Residual Learning for Image Recognition’ aka ResNet. Two modes: ● Full training – starts from scratch ● Transfer learning – starts from a model trained on ImageNet ○ different options available – 18-200 layers. Other handy features: image resizing, augmentation. Not good: nasty checkpointing, no early stopping.
  • 21. TensorFlow on SageMaker You provide a python* script with functions: ● model_fn ● train_input_fn ● eval_input_fn ● serving_input_fn SageMaker does the rest (including TensorBoard in training mode). * 😟 Lack of Python 3 support (so far) Similar to TF Estimator API
  • 22. SageMaker – Custom Algorithms Bring Your Own Docker Image Training Inference SageMaker provides: ● JSON with configurations of data & cluster & hyperparameters ● mount data files (or pipes) ● model artifacts SageMaker expects: ● train entry point ● model artifacts after completion ● serve entry point ● responds to /invocations and /ping on port 8080 SageMaker does not provide: ● access to the container, e.g., no TensorBoard without tricks 😕
  • 23. SageMaker – Automatic Model Tuning Implementation of Bayesian optimization for hyperparameter value search Good: ● easy-to-use, like Sklearn GridSearch ● supports BYO/custom models ● faster experimentation ● no mess with provisioning instances ● no pricing overhead Not good: ● unlike Sklearn – no cross-validation ● unlike Sklearn – no fitting of an entire pipeline (e.g., including feature extractors and other transformers) ● support of spot instances won’t hurt
  • 24. SageMaker – Algorithms – Pros and Cons Not good: ‘S3-centered’, complicates fitting feature extraction into pipeline Good: Unified IO system for training & inference, + Pipe mode Model agnostic Hyperparameter Tuning Distributed training Flexible pricing for training – pay by the second All above is available for your custom algorithm
  • 25. Machine Learning on AWS – AWS DL AMIs Amazon Machine Learning Amazon SageMaker AWS Deep Learning AMIs Programming Efforts ` Flexibility Freedom! * https://en.wikipedia.org/wiki/File:Raft-slab.jpg (CC BY-SA 3.0)
  • 26. Avoid headache with EC2 instance setup: installation of Nvidia drivers, python, your favourite DL framework, compatibility warnings & errors, configuration, optimization, etc. Use cases: ● to use different data sources rather than S3 ● to implement more sophisticated data pipelines ● to implement more complex training workflows ● to run long training on Spot Instances AWS Deep Learning AMIs
  • 27. SageMaker – Jupyter Notebooks AWS value-add: ● Better authentication ● Ready-to-go environments ● Easy IAM setup ● Quick start with example notebooks ● Stop & Restore when necessary ○ Lifecycle configurations
  • 28. Notebook Best Practices / versioning - Git (cli, plugins, extensions) - Post-save hook to save .py and .html files alongside with notebook.
  • 29. Notebook Best Practices / reproducing Steps: 1. Configure variables 2. Install dependencies 3. List dependencies 4. Define constants 5. Load data 6. Load schema - During instance boot - Within notebook itself (hacky way)
  • 30. Notebook Best Practices / share & collaboration Instance Proxy Kernel Instance Kernel Notebook storage Browser Multiuser instance Kernel Kernel Browser Notebook storage SageMaker, JupyterHub Colab, Zeppelin
  • 31. ● ML Process ● Data Pipelines ● Modeling & Training ● Deployment Overview
  • 32. SageMaker – Deployment Pros: - One click - Single row & batch - Traffic split, A/B - Autoscale group Cons - EC2 instance per model - No Contracts - No shadow, replay tests - No model metrics - No model versioning Alternative: - Do it yourself on EKS/ECS - Opensource: hydrosphere.io
  • 33. Deployment Best Practice /predict input: output: JVM DL4j / TF / Other GPU CPU model v2 [ .... ] gRPC HTTP server sidecar serving requests training data stats: - min, max - range - clusters - quantiles - autoencoder
  • 34. ● Bad training data ● Bad serving data ● Training/serving data skew ● Misconfiguration ● Performance ● Concept Drift ● Adversarial input ● Upstream error Model Maintenance
  • 35. ● Data contracts ● Data Profiling ● Model Performance monitoring ● Concept drift monitoring ● Smart subsampling ● Active retraining and active learning Model Maintenance - Best practices
  • 36. Online Monitoring / Profiling infrastructure - Learn the latent space online in a stream of production inputs - Compare “online” snapshots with offline - Index encodings for analysis / audit / exploration
  • 38. ML lifecycle orchestration - Jenkins :) - AWS CodePipeline - StepFunctions - KubeFlow
  • 40. MLFlow - experiments tracking and reproducibility
  • 41. KubeFlow = argo + kube configs + ml tutorial
  • 42. Takeaways - Well architected Data Platform is a key enabler for successful ML project - AWS ML Ecosystem is a great foundation of your enterprise wide ML platform but you should know your options and extra layer to be built - ML Reproducibility and ML lifecycle management is a wild space, no off the shelf solution
  • 43. squadex.com 125 University Avenue Suite 290, Palo Alto California, 94301 Questions, details? We would be happy to answer!