Harish Ganesan
CTO
8KMiles
2013
This presentation is strongly inspired by “Guy Ritchie” movies.
Disclaimer: All images are downloaded from the internet. If you find any of the content or images violating copyright, please let me know and I will act upon it immediately.
AGENDA
• Case
• Challenge
• Solution
• Learnings
• About us
Case
Cigarette smoking is injurious to health
• Mobile Advertising company, USA
• Forbes 1000 clientele
• TBs of unstructured data -> Big Data problem
Lock
• Hourly ~1 TB
• CDN Logs
• Text Files
• XML Files
• Geo data files
• Server logs
• DB records
STOCK
• Reduce the cost leakage
• How to Save $$$ ?
Challenges
• Daily analysis was OK, monthly analysis was a pain, and historical analysis was almost dead
• How do we transfer, store, analyze and share?
• How do we optimize costs at this scale?
Solution
Cigarette smoking is injurious to health
• Use the AWS Cloud to host the analytics module
• Amazon EMR for unstructured log analysis
• Automation using scripts, Java code and other tools
[Diagram labels: Social / 3rd Party Feeds / Cloud; Logs]
Stage 1: Data Transfer
• Tsunami UDP
• ~1 TB of uncompressed logs every hour
• High-bandwidth EC2 instances for Tsunami UDP
• Other popular options:
  • Aspera
  • AWS Import/Export
  • WAN optimization
  • AWS Direct Connect
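As a rough check on what "~1 TB every hour" asks of the network, the sketch below converts that volume into a sustained line rate. It assumes decimal terabytes (10^12 bytes) and a perfectly even flow across the hour, neither of which the slides state; `required_gbps` is a name made up for this illustration.

```python
def required_gbps(terabytes: float, seconds: float) -> float:
    """Sustained rate in gigabits per second needed to move `terabytes` in `seconds`."""
    bits = terabytes * 1e12 * 8  # assumption: 1 TB = 10**12 bytes
    return bits / seconds / 1e9


# ~1 TB per hour, the volume quoted on the slide.
hourly_rate = required_gbps(1.0, 3600)
print(f"~1 TB/hour needs about {hourly_rate:.2f} Gbit/s sustained")
```

Under those assumptions the link has to hold a little over 2 Gbit/s around the clock, which is consistent with the slide's choice of a UDP-based transfer tool on high-bandwidth instances.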
[Diagram labels: Logs; Amazon S3]
Stage 2: Storage
• Amazon Web Services building block – S3
• Scalable object store
• Inherently fault tolerant
• ~2 TB of compressed logs every day
• S3 Reduced Redundancy (RR) option for intermediate outputs
• Amazon Glacier for archival
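The two volume figures in the deck (~1 TB of uncompressed logs per hour, ~2 TB of compressed logs per day) imply an overall compression ratio. The short calculation below works it out, assuming logs arrive at the same rate around the clock; the slides do not say so, and the real ratio would differ if traffic is bursty.

```python
HOURLY_UNCOMPRESSED_TB = 1.0  # from the Stage 1 slide
DAILY_COMPRESSED_TB = 2.0     # from the Stage 2 slide
HOURS_PER_DAY = 24            # assumption: logs arrive around the clock

daily_uncompressed_tb = HOURLY_UNCOMPRESSED_TB * HOURS_PER_DAY
compression_ratio = daily_uncompressed_tb / DAILY_COMPRESSED_TB

print(f"{daily_uncompressed_tb:.0f} TB/day raw -> "
      f"{DAILY_COMPRESSED_TB:.0f} TB/day stored, about {compression_ratio:.0f}:1")
```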
[Diagram labels: Logs; Amazon S3; Elastic MapReduce]
Stage 3: Analyze
• Elastic MapReduce service of Amazon
• Minimal setup time
• Log analysis
• ~2000 mappers / 750 reducers @ peak
• ~250 m1.xlarge task nodes (1000 cores, 3750 GB RAM) @ peak
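The peak capacity quoted on this slide can be checked from the per-instance size of m1.xlarge (4 vCPUs and 15 GiB of memory, as AWS published it for that instance type). The mappers-per-core figure at the end is only a derived ratio that assumes all mappers run concurrently, which the slide does not claim.

```python
# m1.xlarge instance size: 4 vCPUs and 15 GiB of memory per node.
VCPUS_PER_NODE = 4
RAM_GB_PER_NODE = 15

TASK_NODES_AT_PEAK = 250  # from the slide
MAPPERS_AT_PEAK = 2000    # from the slide

total_cores = TASK_NODES_AT_PEAK * VCPUS_PER_NODE
total_ram_gb = TASK_NODES_AT_PEAK * RAM_GB_PER_NODE

# Derived ratio only; assumes every mapper runs at the same time.
mappers_per_core = MAPPERS_AT_PEAK / total_cores

print(f"{total_cores} cores, {total_ram_gb} GB RAM, "
      f"{mappers_per_core:.1f} mappers per core")
```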
• Amazon EMR is great
• But adding Spot EC2 is super cool
Wait !!!
What is Amazon Spot ?
• Time-flexible, interruption-tolerant tasks
• Bid price & Spot price
• m1.xlarge price comparison
  • $0.480 per hour – On-Demand
  • $0.052 per hour – Spot
• You will never pay more than your maximum bid price per hour
• Spot Instances may be interrupted
• If interrupted, you will not be charged for any partial hour of usage (*free)
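To put the two m1.xlarge prices side by side, the sketch below computes the discount and the hourly bill for a 250-node task fleet (the peak size from the Stage 3 slide). It treats the quoted Spot price as constant, which is a best-case assumption because Spot prices move with the market.

```python
ON_DEMAND_PER_HOUR = 0.480  # m1.xlarge, from the slide
SPOT_PER_HOUR = 0.052       # m1.xlarge, from the slide
NODES = 250                 # peak task-node count from the Stage 3 slide

# Fraction saved per instance-hour when paying the Spot price instead.
discount = 1 - SPOT_PER_HOUR / ON_DEMAND_PER_HOUR

on_demand_fleet = NODES * ON_DEMAND_PER_HOUR
spot_fleet = NODES * SPOT_PER_HOUR

print(f"Spot is about {discount:.0%} cheaper per instance-hour")
print(f"{NODES} nodes: ${on_demand_fleet:.2f}/h On-Demand vs ${spot_fleet:.2f}/h Spot")
```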
Spot Bidding Strategies
• Just above Spot price
• Between Spot price & On-Demand price
• On-Demand price
• Above On-Demand price
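The four bidding strategies trade cost against the risk of interruption. The toy simulation below illustrates that trade-off under a simplified model of the bid-based Spot market described in this deck: in each hour the instance runs only if the Spot price is at or below the bid, and it is charged the Spot price rather than the bid. The price trace and the specific bid values are invented for illustration; `simulate` is a hypothetical helper, not part of any AWS tool.

```python
from typing import List, Tuple

ON_DEMAND = 0.480  # m1.xlarge On-Demand price from the earlier slide


def simulate(bid: float, hourly_spot_prices: List[float]) -> Tuple[int, float]:
    """Return (hours run, total cost) for one instance under a fixed bid."""
    hours_run = 0
    cost = 0.0
    for price in hourly_spot_prices:
        if price <= bid:   # the bid covers the market price, so the instance runs
            hours_run += 1
            cost += price  # charged the Spot price, not the bid
        # otherwise the instance is interrupted for this hour and costs nothing
    return hours_run, cost


# Hypothetical hourly price trace in USD, including two spikes.
trace = [0.052, 0.055, 0.060, 0.300, 0.052, 0.500, 0.054, 0.052]

strategies = {
    "just above Spot": 0.056,
    "between Spot and On-Demand": 0.250,
    "at On-Demand": ON_DEMAND,
    "above On-Demand": ON_DEMAND * 1.2,
}

results = {name: simulate(bid, trace) for name, bid in strategies.items()}
for name, (hours, cost) in results.items():
    print(f"{name:28s} ran {hours}/{len(trace)} h, cost ${cost:.3f}")
```

In this made-up trace, higher bids survive more of the price spikes but also pay for the expensive hours, which is the tension a bidding strategy has to resolve.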
Spot Price Variations - AZ
Amazon EMR with Spot Instances
Project                  Master Instance Group  Core Instance Group  Task Instance Group
Long-running clusters    on-demand              on-demand            spot
Cost-driven workloads    spot                   spot                 spot
Data-critical workloads  on-demand              on-demand            spot
Application testing      spot                   spot                 spot
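The matrix on this slide maps naturally onto a lookup table. The sketch below encodes it and builds a list of instance-group definitions in the shape accepted by the `Instances.InstanceGroups` parameter of the EMR `RunJobFlow` API (for example through boto3's `run_job_flow`). It only builds the dictionaries and makes no AWS call. The workload keys, node counts, and the `instance_groups` helper are assumptions introduced for this example.

```python
from typing import Dict, List

# Market per (workload, instance group), transcribed from the slide's matrix.
MARKETS: Dict[str, Dict[str, str]] = {
    "long-running": {"MASTER": "ON_DEMAND", "CORE": "ON_DEMAND", "TASK": "SPOT"},
    "cost-driven": {"MASTER": "SPOT", "CORE": "SPOT", "TASK": "SPOT"},
    "data-critical": {"MASTER": "ON_DEMAND", "CORE": "ON_DEMAND", "TASK": "SPOT"},
    "app-testing": {"MASTER": "SPOT", "CORE": "SPOT", "TASK": "SPOT"},
}


def instance_groups(workload: str, instance_type: str,
                    counts: Dict[str, int], bid_price: str) -> List[dict]:
    """Build EMR instance-group definitions for one workload profile."""
    groups = []
    for role in ("MASTER", "CORE", "TASK"):
        market = MARKETS[workload][role]
        group = {
            "Name": f"{role.lower()}-{market.lower()}",
            "InstanceRole": role,
            "Market": market,
            "InstanceType": instance_type,
            "InstanceCount": counts[role],
        }
        if market == "SPOT":
            group["BidPrice"] = bid_price  # EMR expects the bid as a string
        groups.append(group)
    return groups


# Hypothetical cluster shape; the node counts are not from the slides.
groups = instance_groups("long-running", "m1.xlarge",
                         {"MASTER": 1, "CORE": 10, "TASK": 50}, "0.480")
for g in groups:
    print(g)
```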
[Diagram labels: Social / 3rd Party Feeds; Logs; Amazon S3; Elastic MapReduce]
Stage 4: Custom EMR Manager
• We created a custom EMR manager
• Choose Spot based on:
  • Past price trend intelligence
  • Choose AZ based on current market prices
  • Choose between Large vs Extra Large
• Spot pricing strategy:
  • Set Spot bid price = On-Demand price
  • At times, go overboard by <20% of the On-Demand price
• Dynamic sizing of the Core / Task nodes
• Dynamic EMR cluster creation
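The slides do not show how the custom EMR manager makes its choices, so the following is only one possible sketch of the decision logic listed here. It drops any Availability Zone whose recent price history ever exceeded the bid (a crude stand-in for "past price trend intelligence"), then picks the zone with the lowest current price. The zone names, price histories, and both helper functions are hypothetical, and reading "<20%" as an overbid of less than 20% above the On-Demand price is an interpretation. In a real manager the history could come from the EC2 Spot price history API rather than an in-memory dictionary.

```python
from typing import Dict, List, Tuple


def bid_for(on_demand_price: float, overbid_fraction: float = 0.0) -> float:
    """Bid at the On-Demand price, optionally over it by less than 20%."""
    if not 0.0 <= overbid_fraction < 0.20:
        raise ValueError("overbid must be at least 0 and below 20%")
    return round(on_demand_price * (1 + overbid_fraction), 3)


def pick_az(history: Dict[str, List[float]], bid: float) -> Tuple[str, float]:
    """Return (zone, current price) for the cheapest zone that never exceeded the bid."""
    stable = {az: prices for az, prices in history.items() if max(prices) <= bid}
    if not stable:
        raise RuntimeError("no zone stayed under the bid in the observed window")
    best = min(stable, key=lambda az: stable[az][-1])  # last entry = current price
    return best, stable[best][-1]


# Hypothetical recent Spot prices per zone (oldest first), in USD per hour.
history = {
    "us-east-1a": [0.060, 0.058, 0.055],
    "us-east-1b": [0.052, 0.052, 0.053],
    "us-east-1c": [0.070, 0.600, 0.052],  # cheapest now, but spiked above the bid
}

bid = bid_for(0.480)
choice = pick_az(history, bid)
print(f"bid ${bid:.3f}, launch in {choice[0]} at ${choice[1]:.3f}/h")
```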
Some Spot Use Cases
• Analytics & Big Data
• Scientific computing
• Web crawling
• Financial model and Analysis
• Testing
• Image & Media Encoding
66 % savings
50 % savings
57 % savings
Learning
• Spot + On demand EC2 is a deadly combination for cost savings
• Every millisecond matters in MR – Tune your code
• Merge Files – Bigger ones are better for processing
More Learning …
• We designed a custom job manager
• 1 file per mapper was better for our case in AWS
• Understand the performance constraints of AWS and work with them
• Compress data: both in storage and in transit (LZO & Snappy)
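To show why compressing logs pays off in both storage and transfer, the sketch below compresses a block of synthetic log lines. It uses gzip from the Python standard library purely as a stand-in, because LZO and Snappy (the codecs named on the slide) need third-party packages. The log line is invented, and the ratio on such repetitive data is far better than real logs would give.

```python
import gzip

# Synthetic, highly repetitive log data invented for illustration.
line = b'203.0.113.7 - - [01/Jan/2013:00:00:00 +0000] "GET /ad?id=42 HTTP/1.1" 200 512\n'
raw = line * 10_000

compressed = gzip.compress(raw)         # stand-in codec, not LZO or Snappy
restored = gzip.decompress(compressed)  # round trip to confirm nothing is lost

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, "
      f"ratio about {len(raw) / len(compressed):.0f}:1")
```

Codec choice also interacts with the "1 file per mapper" point: a plain gzip file cannot be split across mappers in Hadoop, so each such file is read by a single mapper.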
More Learning (continued)
• Keep configuration data in local memory or Amazon DynamoDB
• Have reducers split their output into files suited to the next job's mappers
• Elasticity – increase/decrease Task nodes
• Elasticity – create new EMR clusters (Core + Task) sized to match the logs
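Sizing task nodes to the incoming log volume can be expressed as a small calculation. The sketch below is a hypothetical sizing rule, not the presenter's actual manager: the per-node throughput figure is invented, and the cap of 250 nodes simply reuses the peak count from the Stage 3 slide.

```python
import math


def task_nodes_needed(log_gb: float, gb_per_node_hour: float,
                      deadline_hours: float, max_nodes: int = 250) -> int:
    """Number of task nodes needed to process `log_gb` within `deadline_hours`."""
    if log_gb <= 0:
        return 0
    needed = math.ceil(log_gb / (gb_per_node_hour * deadline_hours))
    return min(needed, max_nodes)  # never grow past the configured ceiling


# Assumed throughput of 10 GB per node per hour (illustrative only).
print(task_nodes_needed(2000, 10, 1))  # a 2 TB batch due within one hour
print(task_nodes_needed(500, 10, 1))   # a lighter hour needs fewer nodes
```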
Value
• ~56% cost savings over a pure On-Demand model for Core + Task nodes
• Automation vastly reduced labor cost (initial + ongoing)
• Customer CXOs were happy
About Us
• AWS Premium Partner
• Solution experts in:
  • Cloud Computing
  • Big Data
  • Identity Management
Shoot your questions
Harish@8kmiles.com
http://harish11g.blogspot.com
@harish11g
harishganesan

Cloud Connect 2013- Lock Stock and x Smoking EC2's
