Lower Costs on Amazon EMR: Auto Scaling, Spot Pricing, & Expert Strategies (ANT385) - AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bruno Faria
Senior EMR Solutions Architect
AWS Solutions Architecture
ANT385
Amazon EMR pricing
With Amazon EMR, you pay a per-second rate for every second you use, with a one-minute minimum. The price depends on the instance type and number of EC2 instances you deploy, and on the region where you launch your cluster.
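To make per-second billing concrete, here is a small Python sketch that estimates what a cluster costs for a given runtime. It assumes each instance is billed at its EC2 rate plus a per-instance Amazon EMR rate, with a one-minute minimum. The prices are hypothetical placeholders, so substitute current figures for your region and instance types.

```python
from dataclasses import dataclass
from typing import List

MINIMUM_BILLED_SECONDS = 60  # assumed one-minute minimum for per-second billing


@dataclass
class InstanceGroup:
    """One homogeneous group of EC2 instances (prices are hypothetical)."""
    name: str
    count: int
    ec2_hourly: float  # EC2 price per instance-hour (On-Demand, Spot, or effective RI rate)
    emr_hourly: float  # Amazon EMR charge per instance-hour, billed on top of EC2


def cluster_cost(groups: List[InstanceGroup], runtime_seconds: int) -> float:
    """Estimate the cost of running every group for runtime_seconds."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    billed_seconds = max(runtime_seconds, MINIMUM_BILLED_SECONDS)
    total = 0.0
    for group in groups:
        # Convert the combined hourly rate into a per-second rate.
        per_second = (group.ec2_hourly + group.emr_hourly) / 3600
        total += group.count * per_second * billed_seconds
    return total


if __name__ == "__main__":
    groups = [
        InstanceGroup("master", 1, ec2_hourly=0.34, emr_hourly=0.085),
        InstanceGroup("core", 4, ec2_hourly=0.34, emr_hourly=0.085),
    ]
    # A 90-minute job is billed for 5,400 seconds, not two full hours.
    print(round(cluster_cost(groups, runtime_seconds=90 * 60), 2))
```

With these placeholder prices, a 90-minute run of one master and four core nodes is billed for exactly 5,400 seconds rather than being rounded up to whole hours.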
Reserved, Spot, and On-Demand Instances
Spot Instances: Amazon EC2 Spot Instances offer spare EC2 compute capacity at a discount compared to On-Demand Instances. EC2 can reclaim that capacity when it needs it back, so Spot suits work that tolerates interruption.
Reserved Instances: Amazon EC2 Reserved Instances let you commit to instances you want to reserve for a one- or three-year term, in exchange for a significant discount compared to On-Demand pricing.
On-Demand Instances: Amazon EC2 On-Demand Instances are instances that you launch and pay for by the second, with no long-term commitment.
Understanding the node types in Amazon EMR
Master node: The node that manages the cluster. The master node tracks the status of tasks and monitors the health of the cluster.
Core node: The node that runs tasks and stores data in the Hadoop Distributed File System (HDFS) on your cluster.
Task node: The node that only runs tasks and does not store data in HDFS. Task nodes are optional.
Lower costs with Spot and Reserved Instances
Spot for task nodes: up to 80% off EC2 On-Demand pricing. Exceed SLA at lower cost.
On-Demand for core nodes: standard Amazon EC2 pricing for On-Demand capacity. Meet SLA at predictable cost.
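The trade-off on this slide can be sketched as simple arithmetic. The hypothetical helper below prices core nodes at an On-Demand rate and task nodes at a fixed Spot discount. The 0.34 USD hourly rate and the 80% discount are illustrative assumptions; real Spot prices vary by instance type, Availability Zone, and time, and the EMR charge is left out for simplicity.

```python
def blended_hourly_cost(core_count: int, task_count: int,
                        on_demand_hourly: float, spot_discount: float) -> float:
    """Hourly EC2 cost with On-Demand core nodes and Spot task nodes.

    spot_discount is the fraction off On-Demand (0.8 means 80% off).
    """
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    spot_hourly = on_demand_hourly * (1.0 - spot_discount)
    return core_count * on_demand_hourly + task_count * spot_hourly


def savings_vs_on_demand(core_count: int, task_count: int,
                         on_demand_hourly: float, spot_discount: float) -> float:
    """Fraction saved compared with running every node On-Demand."""
    all_on_demand = (core_count + task_count) * on_demand_hourly
    blended = blended_hourly_cost(core_count, task_count, on_demand_hourly, spot_discount)
    return 1.0 - blended / all_on_demand


if __name__ == "__main__":
    # Four On-Demand core nodes plus eight Spot task nodes (illustrative prices).
    print(round(blended_hourly_cost(4, 8, 0.34, 0.8), 3))
    print(round(savings_vs_on_demand(4, 8, 0.34, 0.8), 3))
```

In this example, moving eight of twelve nodes to Spot cuts the hourly EC2 bill by roughly 53%. Alternatively, the same budget buys more task capacity, which is how Spot helps a job finish ahead of its SLA.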
Performance and hardware
Considerations
• Transient or long running
• Instance types
• Cluster size
• Application settings
• File formats and S3 tuning
Master node: c5.2xlarge
Slave group - Core: c5.2xlarge
Slave group - Task: m5.2xlarge (EC2 Spot)
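The example layout above maps onto the `InstanceGroups` list accepted inside the `Instances` parameter of the Amazon EMR `RunJobFlow` API (`run_job_flow` in boto3). This sketch only builds the dictionaries; the group names and default counts are assumptions, and the optional `BidPrice` field is omitted, so check the API reference for how your SDK version prices Spot when it is absent.

```python
from typing import List


def example_instance_groups(core_count: int = 2, task_count: int = 4) -> List[dict]:
    """InstanceGroups for run_job_flow mirroring the example layout:
    On-Demand c5.2xlarge master and core nodes, Spot m5.2xlarge task nodes."""
    return [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": "c5.2xlarge",
            "InstanceCount": 1,
        },
        {
            # Core nodes hold HDFS data, so keep them on On-Demand capacity.
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": "c5.2xlarge",
            "InstanceCount": core_count,
        },
        {
            # Task nodes store no HDFS data, so losing one to a Spot
            # interruption costs only the work in progress.
            "Name": "Task - Spot",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "InstanceType": "m5.2xlarge",
            "InstanceCount": task_count,
        },
    ]
```

You would pass the result as `Instances={"InstanceGroups": groups}` alongside the other `run_job_flow` arguments such as the cluster name, release label, and IAM roles.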
Advanced Spot Provisioning with Instance Fleets
Master node, core instance fleet, and task instance fleet
• Provision from a list of instance types, using Spot and On-Demand capacity
• Launch in the optimal Availability Zone based on capacity and price
• Spot block support (Spot Instances with a defined duration)
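A minimal sketch of an instance-fleet configuration, again as the plain dictionary that `run_job_flow` accepts under `Instances`. It assumes hypothetical subnet IDs (listing several lets EMR choose the Availability Zone), illustrative instance types and weights, and a 10-minute Spot provisioning timeout that falls back to On-Demand. The `block_minutes` argument sets `BlockDurationMinutes` for a Spot block; Spot blocks may not be offered to every account today, so treat that option as something to verify.

```python
from typing import Dict, Optional

# Hypothetical subnet IDs, one per Availability Zone. With instance fleets,
# EMR picks one of these subnets (and therefore one AZ) at launch.
SUBNET_IDS = ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"]


def spot_fleet(name: str, fleet_type: str, instance_types: Dict[str, int],
               spot_units: int, on_demand_units: int = 0,
               block_minutes: Optional[int] = None) -> dict:
    """Build one instance fleet that can draw on several instance types.

    instance_types maps an EC2 instance type to its WeightedCapacity, i.e.
    how many capacity units one instance of that type counts toward the target.
    """
    spot_spec = {
        "TimeoutDurationMinutes": 10,            # wait up to 10 minutes for Spot capacity
        "TimeoutAction": "SWITCH_TO_ON_DEMAND",  # then fill the remainder On-Demand
    }
    if block_minutes is not None:
        spot_spec["BlockDurationMinutes"] = block_minutes  # defined-duration Spot block
    return {
        "Name": name,
        "InstanceFleetType": fleet_type,
        "TargetOnDemandCapacity": on_demand_units,
        "TargetSpotCapacity": spot_units,
        "InstanceTypeConfigs": [
            {
                "InstanceType": itype,
                "WeightedCapacity": weight,
                "BidPriceAsPercentageOfOnDemandPrice": 100.0,
            }
            for itype, weight in instance_types.items()
        ],
        "LaunchSpecifications": {"SpotSpecification": spot_spec},
    }


def fleet_instances_config() -> dict:
    """Instances argument with master, core, and task fleets (illustrative)."""
    master = {
        "Name": "Master fleet",
        "InstanceFleetType": "MASTER",
        "TargetOnDemandCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "c5.2xlarge"}],
    }
    core = {
        "Name": "Core fleet",
        "InstanceFleetType": "CORE",
        "TargetOnDemandCapacity": 4,
        "InstanceTypeConfigs": [
            {"InstanceType": "c5.2xlarge", "WeightedCapacity": 1},
            {"InstanceType": "c5.4xlarge", "WeightedCapacity": 2},
        ],
    }
    task = spot_fleet("Task fleet", "TASK",
                      {"m5.2xlarge": 1, "m5.4xlarge": 2, "r5.2xlarge": 1},
                      spot_units=8)
    return {"InstanceFleets": [master, core, task], "Ec2SubnetIds": SUBNET_IDS}
```

`WeightedCapacity` lets a larger instance count for more units, so EMR can fill `TargetSpotCapacity` from whichever of the listed types it can obtain.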
Transient or long running workloads
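Transient: launch a cluster for a job and terminate it when the job finishes, so you pay only while work runs.
Long running: keep a cluster up for continuous or interactive workloads, and resize it as demand changes.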
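For a transient workload, one common pattern is to submit the work as steps and let the cluster shut itself down when the last step finishes. The sketch below builds `run_job_flow` keyword arguments with `KeepJobFlowAliveWhenNoSteps` set to `False`. The Spark application, S3 URIs, and release label are placeholders, and the role names are the EMR defaults, which assumes they already exist in your account.

```python
from typing import List


def transient_job_flow(name: str, release_label: str, script_s3_uri: str,
                       log_s3_uri: str, instance_groups: List[dict]) -> dict:
    """Keyword arguments for boto3's EMR run_job_flow describing a cluster
    that runs one Spark step and then terminates."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,      # an EMR release such as emr-5.x
        "LogUri": log_s3_uri,               # logs survive in S3 after termination
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": instance_groups,
            # Terminate automatically once no steps remain to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "Spark job",
                # Stop paying for the cluster if the step fails.
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
                },
            }
        ],
        # Default role names; assumes they were created in this account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

Calling `boto3.client("emr").run_job_flow(**flow)` would launch it. For a long-running cluster you would instead set `KeepJobFlowAliveWhenNoSteps` to `True` and manage its size over time.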
Lower costs with Auto Scaling
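One way to express this with instance groups is an automatic scaling policy driven by a CloudWatch metric. The sketch builds a policy that adds a node when `YARNMemoryAvailablePercentage` drops below one threshold and removes a node when it rises above another. The thresholds, cooldown, and capacity limits are assumptions to tune for your workload.

```python
def yarn_memory_scaling_policy(min_capacity: int, max_capacity: int,
                               scale_out_below_pct: float = 15.0,
                               scale_in_above_pct: float = 75.0) -> dict:
    """AutoScalingPolicy for an EMR instance group, keyed on free YARN memory."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")

    def rule(name: str, comparison: str, threshold: float, adjustment: int) -> dict:
        return {
            "Name": name,
            "Action": {
                "SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": adjustment,  # +1 adds a node, -1 removes one
                    "CoolDown": 300,                  # seconds between scaling actions
                }
            },
            "Trigger": {
                "CloudWatchAlarmDefinition": {
                    "ComparisonOperator": comparison,
                    "EvaluationPeriods": 1,
                    "MetricName": "YARNMemoryAvailablePercentage",
                    "Namespace": "AWS/ElasticMapReduce",
                    "Period": 300,
                    "Statistic": "AVERAGE",
                    "Threshold": threshold,
                    "Unit": "PERCENT",
                    # EMR substitutes the cluster ID for this placeholder.
                    "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
                }
            },
        }

    return {
        "Constraints": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
        "Rules": [
            rule("ScaleOutOnLowYarnMemory", "LESS_THAN", scale_out_below_pct, 1),
            rule("ScaleInOnHighYarnMemory", "GREATER_THAN", scale_in_above_pct, -1),
        ],
    }
```

The returned dictionary would go under the `AutoScalingPolicy` key of an instance group (for example, the Spot task group), and the cluster also needs an auto scaling role such as `EMR_AutoScaling_DefaultRole` passed as `AutoScalingRole`.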
Use Amazon S3 as your persistent data store
• Decouple storage and compute
• Scale compute and storage up or down independently
• Run transient Amazon EMR clusters on Amazon EC2 Spot Instances, because the data outlives the cluster
• Designed for 99.999999999% durability
• No need to pay for extra cluster storage to replicate data, as you would with HDFS
Amazon S3 Tips
• Partition your data to reduce the amount of data scanned
• Optimize file sizes to reduce the number of S3 requests
• Compress your data set to minimize bandwidth from S3 to EC2
• Use a columnar file format like Parquet when selecting only a subset of columns
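The first two tips can be sketched in plain Python. `partition_prefixes` lists Hive-style `year=/month=/day=` prefixes for a date range, so a job reads only those prefixes instead of the whole table; `output_file_count` picks how many files to write so each lands near a target size. The bucket name and the 256 MiB target are assumptions, not figures from the talk.

```python
import math
from datetime import date, timedelta
from typing import List

TARGET_FILE_BYTES = 256 * 1024 ** 2  # assumed target size per output file


def partition_prefixes(base: str, start: date, end: date) -> List[str]:
    """Hive-style daily partition prefixes covering [start, end] inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    prefixes = []
    day = start
    while day <= end:
        prefixes.append(
            f"{base.rstrip('/')}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"
        )
        day += timedelta(days=1)
    return prefixes


def output_file_count(total_bytes: int, target_file_bytes: int = TARGET_FILE_BYTES) -> int:
    """How many files to write so each is close to target_file_bytes.

    Fewer, larger files mean fewer S3 requests per byte read.
    """
    if total_bytes <= 0:
        return 0
    return max(1, math.ceil(total_bytes / target_file_bytes))


if __name__ == "__main__":
    for prefix in partition_prefixes("s3://example-bucket/events", date(2018, 11, 26), date(2018, 11, 28)):
        print(prefix)
    print(output_file_count(100 * 1024 ** 3))  # 100 GiB of output
```

In Spark, you might apply the count with `df.repartition(n)` before writing, and write partitioned Parquet with `df.write.partitionBy("year", "month", "day").parquet(output_uri)`.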
Thank you!
Bruno Faria
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Lower Costs on Amazon EMR: Auto Scaling, Spot Pricing, & Expert Strategies (ANT385) - AWS re:Invent 2018

  • 2. Lower Costs on Amazon EMR: Auto Scaling, Spot Pricing, & Expert Strategies. Bruno Faria, Senior EMR Solutions Architect, AWS Solutions Architecture. ANT385
  • 3. Amazon EMR pricing: With Amazon EMR, you pay a per-second rate for each second you use. The price is based on the instance type, the number of EC2 instances that you deploy, and the region in which you launch your cluster.
  • 4. Reserved, Spot, and On-Demand Instances. Spot Instances: Amazon EC2 Spot Instances offer spare compute capacity at a discount compared to On-Demand Instances. Reserved Instances: Amazon EC2 Reserved Instances let you make a payment to reserve instances at a significant discount compared to On-Demand pricing. On-Demand Instances: Amazon EC2 On-Demand Instances are instances that you launch and pay for by the second.
  • 5. Understanding the node types in Amazon EMR. Master node: the node that manages the cluster. The master node tracks the status of tasks and monitors the health of the cluster. Core node: a node that runs tasks and stores data in the Hadoop Distributed File System (HDFS) on your cluster. Task node: a node that only runs tasks and does not store data in HDFS. Task nodes are optional.
  • 6. Lower costs with Spot and Reserved Instances. On-Demand for core nodes: standard Amazon EC2 pricing for On-Demand capacity; meet SLA at predictable cost. Spot for task nodes: up to 80% off EC2 On-Demand pricing; exceed SLA at lower cost.
  • 7. Performance and hardware considerations: transient or long-running; instance types; cluster size; application settings; file formats and S3 tuning. Example cluster: master node c5.2xlarge; slave group (core) c5.2xlarge; slave group (task) m5.2xlarge (EC2 Spot).
  • 8. Advanced Spot provisioning with instance fleets (master node, core instance fleet, task instance fleet): provision from a list of instance types with Spot and On-Demand; launch in the optimal Availability Zone based on capacity and price; Spot block support.
  • 9. Transient or long-running workloads (diagram: transient vs. long-running).
  • 10. Lower costs with Auto Scaling
  • 11. Use Amazon S3 as your persistent data store: decouple storage and compute; scale compute and storage up or down independently; run transient Amazon EMR clusters with Amazon EC2 Spot Instances; designed for 99.999999999% durability; no need to pay for data replication.
  • 12. Amazon S3 tips: partition your data to reduce the amount of data scanned; optimize file sizes to reduce the number of S3 requests; compress data sets to minimize bandwidth from S3 to EC2; use a columnar file format such as Parquet when selecting only a subset of columns.
  • 13. Thank you! Bruno Faria
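Slides 3 and 6 combine two ideas: per-second billing, and running task nodes on Spot while core nodes stay On-Demand. The minimal sketch below estimates a cluster's bill under those two ideas. All hourly rates are hypothetical placeholders rather than real AWS prices. The 0.8 discount is the slide's "up to 80%" upper bound, not a guaranteed figure. The model ignores any minimum billing increment, storage, and data transfer. It also assumes that the EMR charge is billed per instance on top of the EC2 price and is not reduced for Spot Instances.

```python
from dataclasses import dataclass
from typing import Iterable

SECONDS_PER_HOUR = 3600


@dataclass
class NodeGroup:
    """A group of identical EMR nodes; all rates are hypothetical placeholders."""
    name: str
    count: int
    ec2_hourly: float           # EC2 On-Demand price per instance-hour (hypothetical)
    emr_hourly: float           # EMR charge per instance-hour, assumed billed on top of EC2
    spot_discount: float = 0.0  # fraction off the EC2 On-Demand price; 0.0 means On-Demand

    def __post_init__(self) -> None:
        if not 0.0 <= self.spot_discount < 1.0:
            raise ValueError("spot_discount must be in [0.0, 1.0)")

    def cost(self, seconds: int) -> float:
        """Cost of running this group for `seconds`, billed per second."""
        # Assumption: the Spot discount applies to the EC2 rate only, not the EMR charge.
        ec2_rate = self.ec2_hourly * (1.0 - self.spot_discount)
        return (ec2_rate + self.emr_hourly) * self.count * seconds / SECONDS_PER_HOUR


def cluster_cost(groups: Iterable[NodeGroup], seconds: int) -> float:
    """Total cost of all node groups for a run lasting `seconds`."""
    return sum(g.cost(seconds) for g in groups)


if __name__ == "__main__":
    two_hours = 2 * SECONDS_PER_HOUR
    on_demand = [
        NodeGroup("master", 1, ec2_hourly=1.00, emr_hourly=0.25),
        NodeGroup("core", 4, ec2_hourly=1.00, emr_hourly=0.25),
        NodeGroup("task", 8, ec2_hourly=1.00, emr_hourly=0.25),
    ]
    # Same cluster, but the task nodes run on Spot at the slide's upper-bound discount.
    spot_tasks = on_demand[:2] + [
        NodeGroup("task", 8, ec2_hourly=1.00, emr_hourly=0.25, spot_discount=0.8)
    ]
    print(f"all On-Demand:   ${cluster_cost(on_demand, two_hours):.2f}")   # $32.50
    print(f"Spot task nodes: ${cluster_cost(spot_tasks, two_hours):.2f}")  # $19.70
```

With these made-up rates, moving only the eight task nodes to Spot lowers the two-hour bill from $32.50 to $19.70. The master and core nodes keep predictable On-Demand pricing.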
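Slide 8 lists what instance fleets provide. One way to express such a cluster is the `Instances` argument of the boto3 EMR client's `run_job_flow` call. The sketch builds that argument as a plain dictionary, so it can be inspected without AWS credentials. The master and core instance types follow the example on slide 7. The subnet IDs, the task-fleet instance mix and weights, and the 10-minute Spot timeout are hypothetical choices, not recommendations from the talk. Spot blocks are not shown.

```python
import json

# Hypothetical subnet IDs; with instance fleets, EMR chooses one of these subnets
# (and therefore one Availability Zone) when it launches the cluster.
SUBNET_IDS = ["subnet-0example1", "subnet-0example2", "subnet-0example3"]

# Hypothetical task-fleet mix: (instance type, weighted capacity units).
TASK_TYPES = [("m5.2xlarge", 1), ("m5.4xlarge", 2), ("r5.2xlarge", 1)]


def build_instance_fleets(subnet_ids, core_units, task_spot_units):
    """Build the `Instances` argument for boto3's EMR `run_job_flow` (instance fleets)."""
    return {
        "Ec2SubnetIds": list(subnet_ids),
        "KeepJobFlowAliveWhenNoSteps": False,  # transient: terminate when steps finish
        "InstanceFleets": [
            {
                "Name": "master",
                "InstanceFleetType": "MASTER",
                "TargetOnDemandCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "c5.2xlarge"}],
            },
            {
                # Core nodes hold HDFS data, so keep them On-Demand.
                "Name": "core",
                "InstanceFleetType": "CORE",
                "TargetOnDemandCapacity": core_units,
                "InstanceTypeConfigs": [{"InstanceType": "c5.2xlarge", "WeightedCapacity": 1}],
            },
            {
                # Task nodes only run tasks, so they can be Spot across several types.
                "Name": "task",
                "InstanceFleetType": "TASK",
                "TargetSpotCapacity": task_spot_units,
                "InstanceTypeConfigs": [
                    {
                        "InstanceType": itype,
                        "WeightedCapacity": weight,
                        "BidPriceAsPercentageOfOnDemandPrice": 100,
                    }
                    for itype, weight in TASK_TYPES
                ],
                "LaunchSpecifications": {
                    "SpotSpecification": {
                        # If Spot capacity is not met within 10 minutes during
                        # provisioning, fall back to On-Demand.
                        "TimeoutDurationMinutes": 10,
                        "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    instances = build_instance_fleets(SUBNET_IDS, core_units=4, task_spot_units=16)
    print(json.dumps(instances, indent=2))
    # To launch (requires AWS credentials, IAM roles, and a release label), roughly:
    # boto3.client("emr").run_job_flow(Name=..., ReleaseLabel=..., Instances=instances, ...)
```

Listing several instance types and several subnets gives EMR more Spot capacity pools to choose from. That is the idea behind the first two bullets on slide 8.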
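Slide 10 names Auto Scaling as a cost lever without giving details. As an illustration, the sketch below builds a custom automatic-scaling policy in the dictionary shape accepted by the boto3 EMR client's `put_auto_scaling_policy`. The policy scales a node group out when little YARN memory is free and scales it in when most memory is idle. Custom policies of this kind apply to instance groups rather than instance fleets, and the cluster needs an auto-scaling IAM role. The 15%/75% thresholds and 2 to 10 node limits are hypothetical. The `simulate` helper is a rough local approximation written for this example. It is not how EMR evaluates policies; EMR evaluates them through CloudWatch alarms.

```python
# Comparison operators this local simulation understands (a subset of those EMR accepts).
_COMPARE = {
    "LESS_THAN": lambda value, threshold: value < threshold,
    "GREATER_THAN": lambda value, threshold: value > threshold,
}


def _rule(name, operator, threshold, adjustment):
    """One EMR automatic-scaling rule driven by the YARNMemoryAvailablePercentage metric."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,
                "CoolDown": 300,
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": operator,
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": "PERCENT",
                "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
            }
        },
    }


def build_scaling_policy(min_nodes, max_nodes, scale_out_below=15.0, scale_in_above=75.0):
    """Scale out when little YARN memory is free; scale in when most of it is idle."""
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [
            _rule("ScaleOutLowMemory", "LESS_THAN", scale_out_below, 1),
            _rule("ScaleInHighMemory", "GREATER_THAN", scale_in_above, -1),
        ],
    }


def simulate(policy, available_pct, current_nodes):
    """Rough local approximation of one evaluation: the first matching rule wins and
    the result is clamped to the constraints. EMR itself evaluates CloudWatch alarms."""
    limits = policy["Constraints"]
    for rule in policy["Rules"]:
        alarm = rule["Trigger"]["CloudWatchAlarmDefinition"]
        if _COMPARE[alarm["ComparisonOperator"]](available_pct, alarm["Threshold"]):
            step = rule["Action"]["SimpleScalingPolicyConfiguration"]["ScalingAdjustment"]
            return max(limits["MinCapacity"], min(limits["MaxCapacity"], current_nodes + step))
    return current_nodes


if __name__ == "__main__":
    policy = build_scaling_policy(min_nodes=2, max_nodes=10)
    for free_pct in (10.0, 50.0, 90.0):
        print(free_pct, "->", simulate(policy, free_pct, current_nodes=4))
    # Attaching it to a running cluster's instance group, roughly:
    # boto3.client("emr").put_auto_scaling_policy(
    #     ClusterId=..., InstanceGroupId=..., AutoScalingPolicy=policy)
```

Pairing a scale-out rule with a scale-in rule lets capacity shrink again after a burst. That is where the savings come from on a long-running cluster.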
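The first two S3 tips on slide 12 are easy to quantify. The sketch below uses a hypothetical Hive-style `dt=YYYY-MM-DD` key layout to show partition pruning: only keys inside the requested date range are selected, so the rest are never read. It then shows how file size drives the number of objects, each of which costs at least one GET request to read. The bucket name, prefix, file naming, and sizes are illustrative. In practice, engines such as Spark or Hive typically perform the pruning themselves from the table's partition metadata.

```python
from datetime import date, timedelta


def partition_key(prefix, day, part):
    """Hive-style key: <prefix>/dt=YYYY-MM-DD/part-NNNNN.parquet (hypothetical layout)."""
    return f"{prefix}/dt={day.isoformat()}/part-{part:05d}.parquet"


def prune(keys, start, end):
    """Keep only keys whose dt= partition lies in [start, end]; other objects are never read."""
    selected = []
    for key in keys:
        for segment in key.split("/"):
            if segment.startswith("dt="):
                if start <= date.fromisoformat(segment[3:]) <= end:
                    selected.append(key)
                break
    return selected


def objects_to_read(total_bytes, file_bytes):
    """Objects needed to hold total_bytes at a given file size (ceiling division).
    Each object costs at least one S3 GET request to read."""
    return -(-total_bytes // file_bytes)


if __name__ == "__main__":
    prefix = "s3://example-bucket/events"  # hypothetical bucket and prefix
    first = date(2018, 11, 1)
    keys = [partition_key(prefix, first + timedelta(days=d), p) for d in range(30) for p in range(4)]
    last_days = prune(keys, date(2018, 11, 26), date(2018, 11, 30))
    print(len(keys), "objects in total,", len(last_days), "after pruning to 5 days")
    one_tib = 2**40
    print("1 MiB files:", objects_to_read(one_tib, 2**20))    # 1048576 objects
    print("256 MiB files:", objects_to_read(one_tib, 2**28))  # 4096 objects
```

Reading the same 1 TiB as 256 MiB files rather than 1 MiB files cuts the object count by a factor of 256. That is the reason behind the "optimize file sizes" tip.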