SlideShare a Scribd company logo
Hadoop in the Cloud: AWS Elastic Map Reduce
• What is EMR?
• How does EMR compare to Hadoop?
• Use cases
EMR is an AWS Service
• AWS review helpful to understand
• Infiniteskills offers a course!
– http://bit.ly/learn-aws
• AWS constantly changing and evolving
http://aws.amazon.com/documentation/elasticmapreduce/
EMR Overview
• Abstracts out cluster setup & management
– Integrated provisioning, tooling, debug, monitoring
– AWS constantly tuning and optimizing
– Failed nodes automatically re-provisioned by AWS
• Reduced costs
– Clusters shut down automatically by default
– Excellent for sporadic MapReduce needs
• Integration to AWS
– Leverage cost-effective EC2 instances for processing, S3 for storage
– Monitoring done via CloudWatch
EMR Architecture
Master Instance Group
EC2
S3
Core Instance Group
EC2EC2
HDFS HDFS
Task Instance Group
EC2 EC2
EC2 EC2
• Master group controls cluster
• Core group runs DataNode &
TaskTracker daemons
• Task group runs tasks
• Can be added & removed
• S3 can be used for data input / output
• Master group coordinates core + task
activities and manages cluster state
• Core + task instances read / write to /
from S3
EMR AWS Integration
• Datastore pull / push to
– RDS
– DynamoDB
– S3
• Derived data can be stored in RedShift
– Via AWS DataPipelines
– Further post-processing
• Data can be pre-processed with Kinesis
What you give up with EMR
• Control
– Always 2-3 months behind Hadoop releases
– Cannot use CDH or HDP releases (although MapR is supported)
• Speed (if you’re not an AWS customer)
• Vendor lock-in
EMR Use Cases
• Already AWS customer
– Lots of data in S3 / DynamoDB / RDS
• Sporadic MapReduce needs
• Proof-of-concepting Hadoop
• Ease of use
– Seamless, near-infinite scale
– Simple administration
Hadoop in the Cloud: AWS Elastic Map Reduce
• What is EMR?
• How does EMR compare to Hadoop?
• Benefits & downsides
• Use cases

More Related Content

What's hot

Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel Aviv
Amazon Web Services
 
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Amazon Web Services
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
Amazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
Amazon Web Services
 
Masterclass Live: Amazon EMR
Masterclass Live: Amazon EMRMasterclass Live: Amazon EMR
Masterclass Live: Amazon EMR
Amazon Web Services
 
(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices
Amazon Web Services
 
Getting Started with Amazon EMR
Getting Started with Amazon EMRGetting Started with Amazon EMR
Getting Started with Amazon EMR
Arman Iman
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
Amazon Web Services
 
Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent...
Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent...Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent...
Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent...
Amazon Web Services
 
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Amazon Web Services
 
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Amazon Web Services
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
Amazon Web Services
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
Amazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
Amazon Web Services
 
Beyond EC2 and S3
Beyond EC2 and S3Beyond EC2 and S3
Beyond EC2 and S3
Lorenzo Aiello
 
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Amazon Web Services
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
Amazon Web Services
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
Amazon Web Services
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMR
Amazon Web Services
 
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupAutoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
Rafal Kwasny
 

What's hot (20)

Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel Aviv
 
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Masterclass Live: Amazon EMR
Masterclass Live: Amazon EMRMasterclass Live: Amazon EMR
Masterclass Live: Amazon EMR
 
(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices
 
Getting Started with Amazon EMR
Getting Started with Amazon EMRGetting Started with Amazon EMR
Getting Started with Amazon EMR
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
 
Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent...
Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent...Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent...
Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent...
 
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
 
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Beyond EC2 and S3
Beyond EC2 and S3Beyond EC2 and S3
Beyond EC2 and S3
 
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMR
 
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupAutoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
 

Viewers also liked

Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluation
mattlieber
 
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
Alexander Dean
 
Big Data & Analytics: End to End on AWS - Technical 101
Big Data & Analytics: End to End on AWS - Technical 101Big Data & Analytics: End to End on AWS - Technical 101
Big Data & Analytics: End to End on AWS - Technical 101
Amazon Web Services
 
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Amazon Web Services
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
Amazon Web Services Korea
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Amazon Web Services
 
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech TalksHands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Amazon Web Services
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
Amazon Web Services
 
Samsung hope for children tehran
Samsung hope for children   tehranSamsung hope for children   tehran
Samsung hope for children tehran
samsungmena
 
P1161211140
P1161211140P1161211140
P1161211140
Ashraf Aboshosha
 
P1141218183
P1141218183P1141218183
P1141218183
Ashraf Aboshosha
 
Opeb cost control siia may 2008
Opeb cost control   siia may 2008Opeb cost control   siia may 2008
Opeb cost control siia may 2008
Jim van Iwaarden
 
Nightingale Features Showcase
Nightingale Features ShowcaseNightingale Features Showcase
Nightingale Features Showcase
Martin Giger
 
Iphone Presentation Wash U042110
Iphone Presentation Wash U042110Iphone Presentation Wash U042110
Iphone Presentation Wash U042110
The Loud Few
 
Protect Your Heart
Protect Your HeartProtect Your Heart
Protect Your Heart
Pk Doctors
 
P1121102462
P1121102462P1121102462
P1121102462
Ashraf Aboshosha
 
ASC: Integrating Technology into Construction and Engineering Courses
ASC: Integrating Technology into Construction and Engineering CoursesASC: Integrating Technology into Construction and Engineering Courses
ASC: Integrating Technology into Construction and Engineering Courses
guestb8f153b
 

Viewers also liked (19)

Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluation
 
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
 
Big Data & Analytics: End to End on AWS - Technical 101
Big Data & Analytics: End to End on AWS - Technical 101Big Data & Analytics: End to End on AWS - Technical 101
Big Data & Analytics: End to End on AWS - Technical 101
 
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
Data Science at Netflix with Amazon EMR (BDT306) | AWS re:Invent 2013
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
 
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech TalksHands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
 
Samsung hope for children tehran
Samsung hope for children   tehranSamsung hope for children   tehran
Samsung hope for children tehran
 
P1161211140
P1161211140P1161211140
P1161211140
 
P1141218183
P1141218183P1141218183
P1141218183
 
Opeb cost control siia may 2008
Opeb cost control   siia may 2008Opeb cost control   siia may 2008
Opeb cost control siia may 2008
 
Nightingale Features Showcase
Nightingale Features ShowcaseNightingale Features Showcase
Nightingale Features Showcase
 
opendatahub team
opendatahub teamopendatahub team
opendatahub team
 
Iphone Presentation Wash U042110
Iphone Presentation Wash U042110Iphone Presentation Wash U042110
Iphone Presentation Wash U042110
 
Powert point surf
Powert point surfPowert point surf
Powert point surf
 
Protect Your Heart
Protect Your HeartProtect Your Heart
Protect Your Heart
 
P1121102462
P1121102462P1121102462
P1121102462
 
ASC: Integrating Technology into Construction and Engineering Courses
ASC: Integrating Technology into Construction and Engineering CoursesASC: Integrating Technology into Construction and Engineering Courses
ASC: Integrating Technology into Construction and Engineering Courses
 

Similar to Hadoop in the cloud with AWS' EMR

Amazon Elastic Map Reduce: the concepts
Amazon Elastic Map Reduce: the conceptsAmazon Elastic Map Reduce: the concepts
Amazon Elastic Map Reduce: the concepts
Julien SIMON
 
Cloud & Native Cloud for Managers
Cloud & Native Cloud for ManagersCloud & Native Cloud for Managers
Cloud & Native Cloud for Managers
Eitan Sela
 
Introduction to AWS and Docker on ECS
Introduction to AWS and Docker on ECSIntroduction to AWS and Docker on ECS
Introduction to AWS and Docker on ECS
CloudHesive
 
AWS 101 - An Introduction to the Amazon Cloud
AWS 101  - An Introduction to the Amazon CloudAWS 101  - An Introduction to the Amazon Cloud
AWS 101 - An Introduction to the Amazon Cloud
CloudHesive
 
Cost Optimization with Spot Instances
Cost Optimization with Spot InstancesCost Optimization with Spot Instances
Cost Optimization with Spot Instances
Arun Sirimalla
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
Amazon Web Services
 
AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18
Neal Davis
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Amazon Web Services
 
Building and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSBuilding and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECS
Amazon Web Services
 
Introduction to Batch Processing on AWS
Introduction to Batch Processing on AWSIntroduction to Batch Processing on AWS
Introduction to Batch Processing on AWS
Amazon Web Services
 
AWS Distilled
AWS DistilledAWS Distilled
AWS Distilled
Jeyaram Gurusamy
 
PASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best PracticesPASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best Practices
Amazon Web Services
 
AWS Black Belt Tips
AWS Black Belt TipsAWS Black Belt Tips
AWS Black Belt Tips
Amazon Web Services
 
AWS Black Belt Tips
AWS Black Belt TipsAWS Black Belt Tips
AWS Black Belt Tips
Amazon Web Services
 
EMR Training
EMR TrainingEMR Training
EMR Training
vishal192091
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Amazon Web Services
 
PHP LAMP AWS RightSscale
PHP LAMP AWS RightSscalePHP LAMP AWS RightSscale
PHP LAMP AWS RightSscale
maxgribov
 
AWS Black Belt Tips
AWS Black Belt TipsAWS Black Belt Tips
AWS Black Belt Tips
Amazon Web Services
 
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
Amazon Web Services Japan
 
Accelerate SQL Server Migration to the AWS Cloud
Accelerate SQL Server Migration to the AWS Cloud Accelerate SQL Server Migration to the AWS Cloud
Accelerate SQL Server Migration to the AWS Cloud
Datavail
 

Similar to Hadoop in the cloud with AWS' EMR (20)

Amazon Elastic Map Reduce: the concepts
Amazon Elastic Map Reduce: the conceptsAmazon Elastic Map Reduce: the concepts
Amazon Elastic Map Reduce: the concepts
 
Cloud & Native Cloud for Managers
Cloud & Native Cloud for ManagersCloud & Native Cloud for Managers
Cloud & Native Cloud for Managers
 
Introduction to AWS and Docker on ECS
Introduction to AWS and Docker on ECSIntroduction to AWS and Docker on ECS
Introduction to AWS and Docker on ECS
 
AWS 101 - An Introduction to the Amazon Cloud
AWS 101  - An Introduction to the Amazon CloudAWS 101  - An Introduction to the Amazon Cloud
AWS 101 - An Introduction to the Amazon Cloud
 
Cost Optimization with Spot Instances
Cost Optimization with Spot InstancesCost Optimization with Spot Instances
Cost Optimization with Spot Instances
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
 
Building and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSBuilding and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECS
 
Introduction to Batch Processing on AWS
Introduction to Batch Processing on AWSIntroduction to Batch Processing on AWS
Introduction to Batch Processing on AWS
 
AWS Distilled
AWS DistilledAWS Distilled
AWS Distilled
 
PASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best PracticesPASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best Practices
 
AWS Black Belt Tips
AWS Black Belt TipsAWS Black Belt Tips
AWS Black Belt Tips
 
AWS Black Belt Tips
AWS Black Belt TipsAWS Black Belt Tips
AWS Black Belt Tips
 
EMR Training
EMR TrainingEMR Training
EMR Training
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
 
PHP LAMP AWS RightSscale
PHP LAMP AWS RightSscalePHP LAMP AWS RightSscale
PHP LAMP AWS RightSscale
 
AWS Black Belt Tips
AWS Black Belt TipsAWS Black Belt Tips
AWS Black Belt Tips
 
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift in 大阪]Amazon Redshift最新情報と導入事例のご紹介
 
Accelerate SQL Server Migration to the AWS Cloud
Accelerate SQL Server Migration to the AWS Cloud Accelerate SQL Server Migration to the AWS Cloud
Accelerate SQL Server Migration to the AWS Cloud
 

More from rICh morrow

IoT Stream Conf Keynote: Past, Present and Future of IoT
IoT Stream Conf Keynote: Past, Present and Future of IoTIoT Stream Conf Keynote: Past, Present and Future of IoT
IoT Stream Conf Keynote: Past, Present and Future of IoT
rICh morrow
 
PHP from soup to nuts Course Deck
PHP from soup to nuts Course DeckPHP from soup to nuts Course Deck
PHP from soup to nuts Course Deck
rICh morrow
 
"PHP from soup to nuts" -- lab exercises
"PHP from soup to nuts" -- lab exercises"PHP from soup to nuts" -- lab exercises
"PHP from soup to nuts" -- lab exercises
rICh morrow
 
EC2 Pricing Model (deck 0307 of the InfiniteSkills AWS course at http://bit.l...
EC2 Pricing Model (deck 0307 of the InfiniteSkills AWS course at http://bit.l...EC2 Pricing Model (deck 0307 of the InfiniteSkills AWS course at http://bit.l...
EC2 Pricing Model (deck 0307 of the InfiniteSkills AWS course at http://bit.l...
rICh morrow
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilled
rICh morrow
 
quicloud Apr 20 2010 Boulder New Tech Presentation
quicloud Apr 20 2010 Boulder New Tech Presentationquicloud Apr 20 2010 Boulder New Tech Presentation
quicloud Apr 20 2010 Boulder New Tech Presentation
rICh morrow
 

More from rICh morrow (6)

IoT Stream Conf Keynote: Past, Present and Future of IoT
IoT Stream Conf Keynote: Past, Present and Future of IoTIoT Stream Conf Keynote: Past, Present and Future of IoT
IoT Stream Conf Keynote: Past, Present and Future of IoT
 
PHP from soup to nuts Course Deck
PHP from soup to nuts Course DeckPHP from soup to nuts Course Deck
PHP from soup to nuts Course Deck
 
"PHP from soup to nuts" -- lab exercises
"PHP from soup to nuts" -- lab exercises"PHP from soup to nuts" -- lab exercises
"PHP from soup to nuts" -- lab exercises
 
EC2 Pricing Model (deck 0307 of the InfiniteSkills AWS course at http://bit.l...
EC2 Pricing Model (deck 0307 of the InfiniteSkills AWS course at http://bit.l...EC2 Pricing Model (deck 0307 of the InfiniteSkills AWS course at http://bit.l...
EC2 Pricing Model (deck 0307 of the InfiniteSkills AWS course at http://bit.l...
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilled
 
quicloud Apr 20 2010 Boulder New Tech Presentation
quicloud Apr 20 2010 Boulder New Tech Presentationquicloud Apr 20 2010 Boulder New Tech Presentation
quicloud Apr 20 2010 Boulder New Tech Presentation
 

Recently uploaded

Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 

Hadoop in the cloud with AWS' EMR

  • 1. Hadoop in the Cloud: AWS Elastic Map Reduce • What is EMR? • How does EMR compare to Hadoop? • Use cases
  • 2. EMR is an AWS Service • AWS review helpful to understand • Infiniteskills offers a course! – http://bit.ly/learn-aws • AWS constantly changing and evolving http://aws.amazon.com/documentation/elasticmapreduce/
  • 3. EMR Overview • Abstracts out cluster setup & management – Integrated provisioning, tooling, debug, monitoring – AWS constantly tuning and optimizing – Failed nodes automatically re-provisioned by AWS • Reduced costs – Clusters shut down automatically by default – Excellent for sporadic MapReduce needs • Integration to AWS – Leverage cost-effective EC2 instances for processing, S3 for storage – Monitoring done via CloudWatch
  • 4. EMR Architecture Master Instance Group EC2 S3 Core Instance Group EC2EC2 HDFS HDFS Task Instance Group EC2 EC2 EC2 EC2 • Master group controls cluster • Core group runs DataNode & TaskTracker daemons • Task group runs tasks • Can be added & removed • S3 can be used for data input / output • Master group coordinates core + task activities and manages cluster state • Core + task instances read / write to / from S3
  • 5. EMR AWS Integration • Datastore pull / push to – RDS – DynamoDB – S3 • Derived data can be stored in RedShift – Via AWS DataPipelines – Further post-processing • Data can be pre-processed with Kinesis
  • 6. What you give up with EMR • Control – Always 2-3 months behind Hadoop releases – Cannot use CDH or HDP releases (although MapR is supported) • Speed (if you’re not an AWS customer) • Vendor lock-in
  • 7. EMR Use Cases • Already AWS customer – Lots of data in S3 / DynamoDB / RDS • Sporadic MapReduce needs • Proof-of-concepting Hadoop • Ease of use – Seamless, near-infinite scale – Simple administration
  • 8. Hadoop in the Cloud: AWS Elastic Map Reduce • What is EMR? • How does EMR compare to Hadoop? • Benefits & downsides • Use cases