SlideShare a Scribd company logo
1 of 17
Download to read offline
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Migrating on-premises Apache Spark
and Hive to Amazon EMR
Radhika Ravirala
Specialist Solutions Architect
Data and Analytics
AWS
A D B 3 0 4
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Hadoop/Spark deployments on-premises
Tightly coupled
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Compute and storage grow together
Storage grows along with compute
Compute requirements vary
Tightly coupled
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Underutilized or scarce resources
0
20
40
60
80
100
120
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Underutilized or scarce resources
0
20
40
60
80
100
120
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Re-processingWeekly peaks
Steady state
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Contention for same resources
Compute
bound
Memory
bound
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Separation of resources creates data silos
Team A
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Replication adds to cost
Single data center
3x
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Application point of view
Large scale transformation: Map/Reduce, Hive, Pig, Spark
Interactive queries: Impala, Spark SQL, Presto
Machine learning: Spark ML, MxNet, TensorFlow
Interactive notebooks: Jupyter, Zeppelin
NoSQL: HBase
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Swim lane of jobs
Type of Job /Time 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
EDR
CDR
CDR JavaMR
LSR Load Load Load Load Load Load Load Load Load Load Load Load
Aggregation
LSR Rotation
Performance Performance
RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC
BQR BQR
Others
10EDR DailyAggregations10EDR DailyAggregations
58Hive Extracts- CDR
Others
Over utilized
Under utilized
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Decouple storage and compute
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Amazon Simple Storage Service (Amazon S3) is your persistent
data store
11 nines of durability
Low cost
Life cycle policies
Versioning
Distributed by default
EMR FS
Amazon S3
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Benefit 1: Switch off clusters
Amazon S3Amazon S3
Amazon S3
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Separate them to run on persistent or long-running clusters
Type of Job /Time 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
EDR
CDR
CDR JavaMR
LSR Load Load Load Load Load Load Load Load Load Load Load Load
Aggregation
LSR Rotation
Performance Performance
RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC
BQR BQR
Others
10EDR DailyAggregations10EDR DailyAggregations
58Hive Extracts- CDR
Others
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Run a long-running cluster
Amazon EMR Cluster
Benefit 2: Autoscale persistent clusters to save costs
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Benefit 3: Logical separation of jobs
Hive, Pig,
Cascading
Prod
Presto Ad-Hoc
Amazon S3
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Benefit 4: Disaster recovery built in
Cluster 1 Cluster 2
Cluster 3 Cluster 4
Amazon S3
Availability Zone
Availability Zone

More Related Content

What's hot

Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...Amazon Web Services
 
Software Testing Interview Questions & Answers | Edureka
Software Testing Interview Questions & Answers | EdurekaSoftware Testing Interview Questions & Answers | Edureka
Software Testing Interview Questions & Answers | EdurekaEdureka!
 
Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S...
Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S...Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S...
Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S...Fred Posner
 
Wicked rugby
Wicked rugbyWicked rugby
Wicked rugbyDklumb4
 
Atlassian jira как полностью раскрыть возможности
Atlassian jira   как полностью раскрыть возможностиAtlassian jira   как полностью раскрыть возможности
Atlassian jira как полностью раскрыть возможностиAndrew Fadeev
 
[오픈소스컨설팅]Virtualization kvm-rhev
[오픈소스컨설팅]Virtualization kvm-rhev[오픈소스컨설팅]Virtualization kvm-rhev
[오픈소스컨설팅]Virtualization kvm-rhevJi-Woong Choi
 
Building Serverless Web Applications
Building Serverless Web Applications Building Serverless Web Applications
Building Serverless Web Applications Amazon Web Services
 
A+ Chapter 3 Review
A+ Chapter 3 ReviewA+ Chapter 3 Review
A+ Chapter 3 ReviewAmy McMullin
 
Aws Developer Associate Overview
Aws Developer Associate OverviewAws Developer Associate Overview
Aws Developer Associate OverviewAbhi Jain
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
Mobile Automation with Appium
Mobile Automation with AppiumMobile Automation with Appium
Mobile Automation with AppiumManoj Kumar Kumar
 
AWS Certification | AWS Architect Certification Training | AWS Tutorial | AWS...
AWS Certification | AWS Architect Certification Training | AWS Tutorial | AWS...AWS Certification | AWS Architect Certification Training | AWS Tutorial | AWS...
AWS Certification | AWS Architect Certification Training | AWS Tutorial | AWS...Edureka!
 
Top 20 best automation testing tools
Top 20 best automation testing toolsTop 20 best automation testing tools
Top 20 best automation testing toolsQACraft
 
Continuous Quality with Postman
Continuous Quality with PostmanContinuous Quality with Postman
Continuous Quality with PostmanPostman
 
Azure Arc - Managing Hybrid and Multi-Cloud Platforms
Azure Arc - Managing Hybrid and Multi-Cloud PlatformsAzure Arc - Managing Hybrid and Multi-Cloud Platforms
Azure Arc - Managing Hybrid and Multi-Cloud PlatformsWinWire Technologies Inc
 

What's hot (20)

Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...
 
Software Testing Interview Questions & Answers | Edureka
Software Testing Interview Questions & Answers | EdurekaSoftware Testing Interview Questions & Answers | Edureka
Software Testing Interview Questions & Answers | Edureka
 
Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S...
Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S...Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S...
Using Asterisk and Kamailio for Reliable, Scalable and Secure Communication S...
 
Wicked rugby
Wicked rugbyWicked rugby
Wicked rugby
 
Atlassian jira как полностью раскрыть возможности
Atlassian jira   как полностью раскрыть возможностиAtlassian jira   как полностью раскрыть возможности
Atlassian jira как полностью раскрыть возможности
 
[오픈소스컨설팅]Virtualization kvm-rhev
[오픈소스컨설팅]Virtualization kvm-rhev[오픈소스컨설팅]Virtualization kvm-rhev
[오픈소스컨설팅]Virtualization kvm-rhev
 
Building Serverless Web Applications
Building Serverless Web Applications Building Serverless Web Applications
Building Serverless Web Applications
 
A+ Chapter 3 Review
A+ Chapter 3 ReviewA+ Chapter 3 Review
A+ Chapter 3 Review
 
Aws Developer Associate Overview
Aws Developer Associate OverviewAws Developer Associate Overview
Aws Developer Associate Overview
 
SRV321 Deep Dive on Amazon EBS
 SRV321 Deep Dive on Amazon EBS SRV321 Deep Dive on Amazon EBS
SRV321 Deep Dive on Amazon EBS
 
What is shodan
What is shodanWhat is shodan
What is shodan
 
Wazuh Security Platform
Wazuh Security PlatformWazuh Security Platform
Wazuh Security Platform
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Mobile Automation with Appium
Mobile Automation with AppiumMobile Automation with Appium
Mobile Automation with Appium
 
AWS Certification | AWS Architect Certification Training | AWS Tutorial | AWS...
AWS Certification | AWS Architect Certification Training | AWS Tutorial | AWS...AWS Certification | AWS Architect Certification Training | AWS Tutorial | AWS...
AWS Certification | AWS Architect Certification Training | AWS Tutorial | AWS...
 
Top 20 best automation testing tools
Top 20 best automation testing toolsTop 20 best automation testing tools
Top 20 best automation testing tools
 
AWS RDS
AWS RDSAWS RDS
AWS RDS
 
Security Best Practices
Security Best PracticesSecurity Best Practices
Security Best Practices
 
Continuous Quality with Postman
Continuous Quality with PostmanContinuous Quality with Postman
Continuous Quality with Postman
 
Azure Arc - Managing Hybrid and Multi-Cloud Platforms
Azure Arc - Managing Hybrid and Multi-Cloud PlatformsAzure Arc - Managing Hybrid and Multi-Cloud Platforms
Azure Arc - Managing Hybrid and Multi-Cloud Platforms
 

Similar to Migrating on-premises Apache Spark and Hive to Amazon EMR - ADB304 - New York AWS Summit

Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Amazon Web Services
 
Databases in the Cloud em Amazon Web Services
Databases in the Cloud em Amazon Web Services Databases in the Cloud em Amazon Web Services
Databases in the Cloud em Amazon Web Services Amazon Web Services LATAM
 
Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS...
Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS...Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS...
Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS...Amazon Web Services
 
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...All Databases Are Equal, But Some Databases Are More Equal than Others: How t...
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...javier ramirez
 
Amazon EC2 Strategie per l'ottimizzazione dei costi
Amazon EC2 Strategie per l'ottimizzazione dei costiAmazon EC2 Strategie per l'ottimizzazione dei costi
Amazon EC2 Strategie per l'ottimizzazione dei costiAmazon Web Services
 
Building a Critical Communications Platform Using Serverless Technologies
Building a Critical Communications Platform Using Serverless TechnologiesBuilding a Critical Communications Platform Using Serverless Technologies
Building a Critical Communications Platform Using Serverless TechnologiesAmazon Web Services
 
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS SummitPerforming serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS SummitAmazon Web Services
 
Introduction to AWS Global Accelerator - SVC211 - Chicago AWS Summit
Introduction to AWS Global Accelerator - SVC211 - Chicago AWS SummitIntroduction to AWS Global Accelerator - SVC211 - Chicago AWS Summit
Introduction to AWS Global Accelerator - SVC211 - Chicago AWS SummitAmazon Web Services
 
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-Premises
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-PremisesAmazon Relational Database (RDS) on VMware: Running Amazon RDS On-Premises
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-PremisesAmazon Web Services
 
Scale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSScale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSAmazon Web Services
 
Creating Serverless apps for NASA in GovCloud
Creating Serverless apps for NASA in GovCloudCreating Serverless apps for NASA in GovCloud
Creating Serverless apps for NASA in GovCloudChris Shenton
 
Migrate and Modernize Your Database
Migrate and Modernize Your DatabaseMigrate and Modernize Your Database
Migrate and Modernize Your DatabaseAmazon Web Services
 
What's new in Amazon Aurora - ADB204 - Santa Clara AWS Summit.pdf
What's new in Amazon Aurora - ADB204 - Santa Clara AWS Summit.pdfWhat's new in Amazon Aurora - ADB204 - Santa Clara AWS Summit.pdf
What's new in Amazon Aurora - ADB204 - Santa Clara AWS Summit.pdfAmazon Web Services
 
Running Lean Performant Yet Cost Optimised - AWS Summit Sydney
Running Lean Performant Yet Cost Optimised - AWS Summit SydneyRunning Lean Performant Yet Cost Optimised - AWS Summit Sydney
Running Lean Performant Yet Cost Optimised - AWS Summit SydneyAmazon Web Services
 
Data Warehousing in the Cloud - AWS Summit Sydney
Data Warehousing in the Cloud - AWS Summit SydneyData Warehousing in the Cloud - AWS Summit Sydney
Data Warehousing in the Cloud - AWS Summit SydneyAmazon Web Services
 
Run Production Workloads on Spot, Save up to 90%
Run Production Workloads on Spot, Save up to 90%Run Production Workloads on Spot, Save up to 90%
Run Production Workloads on Spot, Save up to 90%Amazon Web Services
 
AWS DevDay Berlin - Resiliency and availability design patterns for the cloud
AWS DevDay Berlin - Resiliency and availability design patterns for the cloudAWS DevDay Berlin - Resiliency and availability design patterns for the cloud
AWS DevDay Berlin - Resiliency and availability design patterns for the cloudCobus Bernard
 
Resiliency-and-Availability-Design-Patterns-for-the-Cloud
Resiliency-and-Availability-Design-Patterns-for-the-CloudResiliency-and-Availability-Design-Patterns-for-the-Cloud
Resiliency-and-Availability-Design-Patterns-for-the-CloudAmazon Web Services
 
Modern-Application-Design-with-Amazon-ECS
Modern-Application-Design-with-Amazon-ECSModern-Application-Design-with-Amazon-ECS
Modern-Application-Design-with-Amazon-ECSAmazon Web Services
 
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019AWS Summits
 

Similar to Migrating on-premises Apache Spark and Hive to Amazon EMR - ADB304 - New York AWS Summit (20)

Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit...
 
Databases in the Cloud em Amazon Web Services
Databases in the Cloud em Amazon Web Services Databases in the Cloud em Amazon Web Services
Databases in the Cloud em Amazon Web Services
 
Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS...
Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS...Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS...
Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS...
 
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...All Databases Are Equal, But Some Databases Are More Equal than Others: How t...
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...
 
Amazon EC2 Strategie per l'ottimizzazione dei costi
Amazon EC2 Strategie per l'ottimizzazione dei costiAmazon EC2 Strategie per l'ottimizzazione dei costi
Amazon EC2 Strategie per l'ottimizzazione dei costi
 
Building a Critical Communications Platform Using Serverless Technologies
Building a Critical Communications Platform Using Serverless TechnologiesBuilding a Critical Communications Platform Using Serverless Technologies
Building a Critical Communications Platform Using Serverless Technologies
 
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS SummitPerforming serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
 
Introduction to AWS Global Accelerator - SVC211 - Chicago AWS Summit
Introduction to AWS Global Accelerator - SVC211 - Chicago AWS SummitIntroduction to AWS Global Accelerator - SVC211 - Chicago AWS Summit
Introduction to AWS Global Accelerator - SVC211 - Chicago AWS Summit
 
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-Premises
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-PremisesAmazon Relational Database (RDS) on VMware: Running Amazon RDS On-Premises
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-Premises
 
Scale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWSScale - Implementing a Data Warehouse on AWS
Scale - Implementing a Data Warehouse on AWS
 
Creating Serverless apps for NASA in GovCloud
Creating Serverless apps for NASA in GovCloudCreating Serverless apps for NASA in GovCloud
Creating Serverless apps for NASA in GovCloud
 
Migrate and Modernize Your Database
Migrate and Modernize Your DatabaseMigrate and Modernize Your Database
Migrate and Modernize Your Database
 
What's new in Amazon Aurora - ADB204 - Santa Clara AWS Summit.pdf
What's new in Amazon Aurora - ADB204 - Santa Clara AWS Summit.pdfWhat's new in Amazon Aurora - ADB204 - Santa Clara AWS Summit.pdf
What's new in Amazon Aurora - ADB204 - Santa Clara AWS Summit.pdf
 
Running Lean Performant Yet Cost Optimised - AWS Summit Sydney
Running Lean Performant Yet Cost Optimised - AWS Summit SydneyRunning Lean Performant Yet Cost Optimised - AWS Summit Sydney
Running Lean Performant Yet Cost Optimised - AWS Summit Sydney
 
Data Warehousing in the Cloud - AWS Summit Sydney
Data Warehousing in the Cloud - AWS Summit SydneyData Warehousing in the Cloud - AWS Summit Sydney
Data Warehousing in the Cloud - AWS Summit Sydney
 
Run Production Workloads on Spot, Save up to 90%
Run Production Workloads on Spot, Save up to 90%Run Production Workloads on Spot, Save up to 90%
Run Production Workloads on Spot, Save up to 90%
 
AWS DevDay Berlin - Resiliency and availability design patterns for the cloud
AWS DevDay Berlin - Resiliency and availability design patterns for the cloudAWS DevDay Berlin - Resiliency and availability design patterns for the cloud
AWS DevDay Berlin - Resiliency and availability design patterns for the cloud
 
Resiliency-and-Availability-Design-Patterns-for-the-Cloud
Resiliency-and-Availability-Design-Patterns-for-the-CloudResiliency-and-Availability-Design-Patterns-for-the-Cloud
Resiliency-and-Availability-Design-Patterns-for-the-Cloud
 
Modern-Application-Design-with-Amazon-ECS
Modern-Application-Design-with-Amazon-ECSModern-Application-Design-with-Amazon-ECS
Modern-Application-Design-with-Amazon-ECS
 
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
 

Migrating on-premises Apache Spark and Hive to Amazon EMR - ADB304 - New York AWS Summit

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Migrating on-premises Apache Spark and Hive to Amazon EMR Radhika Ravirala Specialist Solutions Architect Data and Analytics AWS A D B 3 0 4
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Hadoop/Spark deployments on-premises Tightly coupled
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Compute and storage grow together Storage grows along with compute Compute requirements vary Tightly coupled
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Underutilized or scarce resources 0 20 40 60 80 100 120 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Underutilized or scarce resources 0 20 40 60 80 100 120 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 Re-processingWeekly peaks Steady state
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Contention for same resources Compute bound Memory bound
  • 7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Separation of resources creates data silos Team A
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Replication adds to cost Single data center 3x
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Application point of view Large scale transformation: Map/Reduce, Hive, Pig, Spark Interactive queries: Impala, Spark SQL, Presto Machine learning: Spark ML, MxNet, TensorFlow Interactive notebooks: Jupyter, Zeppelin NoSQL: HBase
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Swim lane of jobs Type of Job /Time 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 EDR CDR CDR JavaMR LSR Load Load Load Load Load Load Load Load Load Load Load Load Aggregation LSR Rotation Performance Performance RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC BQR BQR Others 10EDR DailyAggregations10EDR DailyAggregations 58Hive Extracts- CDR Others Over utilized Under utilized
  • 11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Decouple storage and compute
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Amazon Simple Storage Service (Amazon S3) is your persistent data store 11 nines of durability Low cost Life cycle policies Versioning Distributed by default EMR FS Amazon S3
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Benefit 1: Switch off clusters Amazon S3Amazon S3 Amazon S3
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Separate them to run on persistent or long-running clusters Type of Job /Time 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 EDR CDR CDR JavaMR LSR Load Load Load Load Load Load Load Load Load Load Load Load Aggregation LSR Rotation Performance Performance RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC RCC BQR BQR Others 10EDR DailyAggregations10EDR DailyAggregations 58Hive Extracts- CDR Others
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Run a long-running cluster Amazon EMR Cluster Benefit 2: Autoscale persistent clusters to save costs
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Benefit 3: Logical separation of jobs Hive, Pig, Cascading Prod Presto Ad-Hoc Amazon S3
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T Benefit 4: Disaster recovery built in Cluster 1 Cluster 2 Cluster 3 Cluster 4 Amazon S3 Availability Zone Availability Zone