Introduction to AWS SageMaker
Introduction
Applied Machine Learning
Data Engineering
Training & Education
Conversational Interfaces
Machine Learning Overview
Machine Learning Workflow
What is SageMaker?
● Machine Learning as a Service
● A cloud-based training and deployment framework
● Hosted notebooks for interactive development
● A set of optimised algorithms and open-source containers
Notebook Jobs Models Endpoint
Jupyter Notebooks
● Open source web app
● Create and share documents
● live code, visualisations, documentation
● Data exploration, cleaning & transformation
● Statistical modelling
● Numerical simulation
● … and machine learning
Flexible Compute
Flexible Storage
● Share workspaces with teams
● Persist data between sessions/instances
● Availability & reliability
● Dynamically scalable storage
● Large file support
● Read-after-write consistency
● Low-ish latency
● High throughput
Big Data
Clean, Fetch, Prepare Big Data (>100GB)
● Kinesis Firehose & Data Pipeline - ETL
● Glue - Catalogue, Crawl S3, ETL
● EMR/Hadoop/YARN - Distributed Compute and Storage
● Spark - Data Science Friendly Distributed Compute
● Athena / Redshift Spectrum - Use SQL with S3 data
Training in SageMaker
● Scalable Algorithms
● Streaming Data
● Incremental Training
● Containerized
● Accelerated Computing
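The "Streaming Data" and "Incremental Training" bullets can be illustrated with a small, generic sketch. This is not SageMaker's implementation: it is plain stochastic gradient descent on a made-up linear problem (y = 2x + 1), and the names `sample_stream`, `sgd_pass` and `distance_from_truth` are hypothetical. It shows samples being consumed one at a time from a generator, and a second pass that resumes from the weights of the first rather than starting over.

```python
import math
from typing import Iterable, Iterator, Tuple

Sample = Tuple[float, float]   # (feature x, target y)
Weights = Tuple[float, float]  # (slope w, intercept b)


def sample_stream(n: int, start: int = 0) -> Iterator[Sample]:
    """Yield n synthetic samples of y = 2x + 1, one at a time (made-up data)."""
    for i in range(start, start + n):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


def sgd_pass(weights: Weights, stream: Iterable[Sample], lr: float = 0.5) -> Weights:
    """Update the weights with one SGD step per streamed sample."""
    w, b = weights
    for x, y in stream:
        error = (w * x + b) - y
        w -= lr * error * x
        b -= lr * error
    return w, b


def distance_from_truth(weights: Weights) -> float:
    """Euclidean distance between the weights and the true (2, 1)."""
    w, b = weights
    return math.hypot(w - 2.0, b - 1.0)


# Initial training on the first 200 streamed samples.
first = sgd_pass((0.0, 0.0), sample_stream(200))

# Incremental training: resume from `first` when more data arrives.
resumed = sgd_pass(first, sample_stream(2000, start=200))
```

Only one sample is held in memory at a time, so the size of the stream does not affect the memory footprint.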
Out of the Box Algorithms
Supervised Learning
● LinearLearner
● XGBoost
● FactorisationMachine
Forecasting
● DeepAR
Text Mining
● BlazingText (word2vec)
● Neural Topic Modelling
● Latent Dirichlet Allocation
● seq2seq
Computer Vision
● ImageClassification
Anomaly Detection
● Random Cut Forest
Unsupervised Learning
● K-Means
● PCA
Exercising SageMaker
The Problem
NYC Taxicab Dataset
● 1.3 Billion Taxi Trips
● 2009 Onwards
● 330 GB
● Pickup longitude and latitude provided
● AWS Public Dataset (Sits in S3)
Demo: Clustering
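As a rough idea of what clustering pickup coordinates involves, here is a minimal pure-Python sketch of Lloyd's k-means algorithm. It is a stand-in for illustration only, neither the scikit-learn nor the SageMaker implementation, and it runs on synthetic points: the two hotspot coordinates, the helper names and the sample sizes are all assumptions, not values from the taxi dataset.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def make_points(seed: int = 42, per_blob: int = 200) -> List[Point]:
    """Generate synthetic pickups around two made-up hotspots."""
    rng = random.Random(seed)
    hotspots = [(-73.98, 40.75), (-73.78, 40.64)]  # hypothetical centres
    return [
        (rng.gauss(lon, 0.01), rng.gauss(lat, 0.01))
        for lon, lat in hotspots
        for _ in range(per_blob)
    ]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: Sequence[Point], k: int, iterations: int = 20, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = (
                    sum(lon for lon, _ in members) / len(members),
                    sum(lat for _, lat in members) / len(members),
                )
    return centroids


points = make_points()
centroids = sorted(kmeans(points, k=2))
```

Every iteration scans all points, which is why a single machine becomes the bottleneck as the row count grows.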
Notebooks vs Distributed Training
scikit-learn
● 1 month of data (2GB)
● 13 million rows
● 7 minutes to train (50c)
● 1 x m4.16xlarge
● sklearn.cluster.KMeans
SageMaker
● 1 year of data (24GB)
● 172 million rows
● 10 minutes to train ($3)
● 4 x m4.16xlarge
● sagemaker.KMeans
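The figures above can be turned into throughput and cost-efficiency numbers with a few lines of arithmetic. This sketch uses only the row counts, training times, costs and instance counts quoted on the slide; the `Run` class and its property names are hypothetical. The two runs use different k-means implementations, so this is an indication rather than a like-for-like benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    rows: int
    minutes: float
    cost_usd: float
    instances: int

    @property
    def rows_per_minute(self) -> float:
        return self.rows / self.minutes

    @property
    def rows_per_dollar(self) -> float:
        return self.rows / self.cost_usd


# Figures copied from the slide.
sklearn_run = Run("scikit-learn", rows=13_000_000, minutes=7, cost_usd=0.50, instances=1)
sagemaker_run = Run("SageMaker", rows=172_000_000, minutes=10, cost_usd=3.00, instances=4)

# How much faster the 4-instance job processes rows overall.
speedup = sagemaker_run.rows_per_minute / sklearn_run.rows_per_minute

# The same comparison normalised per instance.
per_instance_speedup = speedup / (sagemaker_run.instances / sklearn_run.instances)

for run in (sklearn_run, sagemaker_run):
    print(f"{run.name}: {run.rows_per_minute:,.0f} rows/min, {run.rows_per_dollar:,.0f} rows/$")
print(f"overall speedup: {speedup:.2f}x, per instance: {per_instance_speedup:.2f}x")
```

On these numbers the distributed job handles roughly nine times as many rows per minute and about twice as many rows per dollar.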
Out-of-the-Box vs Custom Algorithms
Out-of-the-Box (SageMaker k-means)
● Pay only for hours used
● Data lives in S3
● Data Scientist provisions compute
● Easy to host models
● Limited Model Options
Custom (e.g. scikit-learn)
● Use any model you like
● Distributed Training is Hard
● Convert models
● Build Containers
TensorFlow
Image Classification
CIFAR-10
● 60000 32x32 colour images
● 10 classes
● 6000 images per class
● 50000 training images
● 10000 test images
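A quick sanity check of the CIFAR-10 figures, plus an estimate of the raw pixel data size. The size estimate assumes three colour channels at one byte each (standard 8-bit RGB); the variable names are illustrative only.

```python
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6_000
TRAIN_IMAGES = 50_000
TEST_IMAGES = 10_000
HEIGHT = WIDTH = 32
CHANNELS = 3  # assumption: 8-bit RGB, one byte per channel

# The per-class count and the train/test split should both give the total.
total_images = NUM_CLASSES * IMAGES_PER_CLASS
assert total_images == TRAIN_IMAGES + TEST_IMAGES

# Uncompressed pixel data, ignoring labels and file-format overhead.
bytes_per_image = HEIGHT * WIDTH * CHANNELS
raw_bytes = total_images * bytes_per_image
raw_mib = raw_bytes / 2**20

print(f"{total_images} images, {bytes_per_image} bytes each, {raw_mib:.1f} MiB raw")
```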
Demo: Train Locally
Local Limitations
● Compute
● Storage
● Memory
● Scaling
● Reliability
● Availability
More Data,
More Iterations,
More Compute
Demo: Train Remotely
Performance vs Cost
Time vs Cost for 1m Iterations
$2228
>1000 hours!
Steps per $
Review
6 Months of SageMaker
● Very large datasets
● Incremental training
● Scaling “Time to Solution”
● Simple deployment of models
● Easy hosted notebooks
● No auto hyperparameter tuning
● Local mode needs improvement
● Evolving TensorBoard support
Notebook Jobs Models Endpoint
Questions?
Thanks
Pricing
Alternatives
Google CloudML
● Integrated into GCP
● Efficient hyperparameter search
● Mature (as much as things are in ML)
Kubeflow
● Defines TFJob in K8s
● Allows easy creation of a TF cluster
● Immature (but promising)
● Cloud Agnostic
