Cloud Connect 2013 - Lock, Stock and X Smoking EC2's


These slides were presented at Cloud Connect 2013. "Lock, Stock and X Smoking EC2's" was inspired by Guy Ritchie's movies. It describes how we put Amazon EMR and Spot EC2 instances to use for a customer and achieved cost savings while solving a Big Data problem.

Published in: Technology, Business

  • 1. Harish Ganesan, CTO, 8KMiles, 2013
  • 2. P1) This presentation is P2) strongly inspired by "Guy Ritchie" movies. P3) Disclaimer: All images are downloaded from the internet. If you find any of the content or images violating copyright, please let me know and I will act upon it immediately.
  • 3. AGENDA
      • Case
      • Challenge
      • Solution
      • Learnings
      • About us
  • 4. Case (Cigarette smoking is injurious to health)
      • Mobile advertising company, USA
      • Forbes 1000 clientele
      • TBs of unstructured data -> Big Data problem
  • 5. Lock
      • Hourly ~1 TB
      • CDN logs
      • Text files
      • XML files
      • Geo data files
      • Server logs
      • DB records
  • 6. STOCK
      • Reduce the cost leakage
      • How to save $$$?
  • 7. Challenges
      • Daily (was OK), monthly (pain) and historical analysis (almost dead)
      • How do we transfer, store, analyze and share?
      • How to optimize costs at this scale?
  • 8. Solution (Cigarette smoking is injurious to health)
      • Use AWS Cloud for hosting the analytics module
      • Amazon EMR for unstructured log analysis
      • Automation using scripts, Java code and other tools
  • 9. Stage 1: Data Transfer
      • Tsunami UDP
      • ~1 TB of uncompressed logs every hour
      • High-bandwidth EC2 instances for Tsunami UDP
      • Other popular options:
          • Aspera
          • AWS Import/Export
          • WAN optimization
          • AWS Direct Connect
      • Diagram labels: Social / 3rd Party Feeds / Cloud, Logs
  • 10. Stage 2: Storage
      • Amazon Web Services building block: S3
      • Scalable object store
      • Inherently fault tolerant
      • ~2 TB of compressed logs every day
      • S3 RR option for intermediate outputs
      • Amazon Glacier for archival
      • Diagram labels: Social / 3rd Party Feeds / Cloud, Logs, Amazon S3
  • 11. Stage 3: Analyze
      • Elastic MapReduce service of Amazon
      • Minimal setup time
      • Log analysis
      • ~2000 mappers / 750 reducers @ peak
      • ~250 m1.xlarge task nodes (1000 cores, 3750 GB RAM) @ peak
      • Diagram labels: Social / 3rd Party Feeds / Cloud, Logs, Amazon S3, Elastic MapReduce
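The peak figures on slide 11 follow from simple per-node arithmetic. The sketch below assumes the m1.xlarge shape of 4 cores and 15 GB RAM per instance, which is what the slide's totals imply; the function name is illustrative and not from the talk.

```python
# Per-node figures for m1.xlarge (assumption consistent with the slide:
# 250 nodes -> 1000 cores and 3750 GB RAM).
M1_XLARGE_CORES = 4
M1_XLARGE_RAM_GB = 15


def cluster_capacity(node_count, cores_per_node=M1_XLARGE_CORES,
                     ram_gb_per_node=M1_XLARGE_RAM_GB):
    """Return the aggregate cores and RAM for a group of identical nodes."""
    return {
        "cores": node_count * cores_per_node,
        "ram_gb": node_count * ram_gb_per_node,
    }


peak = cluster_capacity(250)
print(peak)  # {'cores': 1000, 'ram_gb': 3750}
```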
  • 12. Wait !!!
      • Amazon EMR is great
      • But adding Spot EC2 is super cool
  • 13. What is Amazon Spot?
      • Time-flexible, interruption-tolerant tasks
      • Bid price & Spot price
      • m1.xlarge price comparison:
          • $0.480 per hour: On-Demand
          • $0.052 per hour: Spot
      • You will never pay more than your maximum bid price per hour
      • A Spot instance may be interrupted
      • If interrupted, you will not be charged for any partial hour of usage (*free)
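To make the price comparison on slide 13 concrete, here is a minimal sketch that works out the per-hour saving from the two quoted m1.xlarge prices. The prices are the slide's 2013 figures; Spot prices fluctuate, so treat the result as an illustration rather than a guaranteed rate. The function names are hypothetical.

```python
# Prices quoted on the slide for m1.xlarge (USD per instance-hour).
ON_DEMAND_PRICE = 0.480
SPOT_PRICE = 0.052


def savings_pct(on_demand, spot):
    """Percentage saved by paying the spot price instead of on-demand."""
    return (on_demand - spot) / on_demand * 100


def hourly_cost(node_count, price_per_hour):
    """Cost of running node_count instances for one hour."""
    return node_count * price_per_hour


pct = savings_pct(ON_DEMAND_PRICE, SPOT_PRICE)
print(f"Spot saves about {pct:.1f}% per instance-hour")

# Hourly bill for the ~250 task nodes mentioned on slide 11.
print(f"On-Demand: ${hourly_cost(250, ON_DEMAND_PRICE):.2f}/hour")
print(f"Spot:      ${hourly_cost(250, SPOT_PRICE):.2f}/hour")
```

At the quoted prices the saving is roughly 89% per instance-hour, before accounting for interruptions or price spikes.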
  • 14. Spot Bidding Strategies
      • Just above Spot price
      • Between Spot price & On-Demand price
      • On-Demand price
      • Above On-Demand price
  • 15. Spot Price Variations - AZ
  • 16. Amazon EMR with Spot Instance (Project: Master instance group / Core instance group / Task instance group)
      • Long-running clusters: on-demand / on-demand / spot
      • Cost-driven workloads: spot / spot / spot
      • Data-critical workloads: on-demand / on-demand / spot
      • Application testing: spot / spot / spot
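The matrix on slide 16 can be encoded as a lookup table that drives cluster definitions. The sketch below is one possible encoding, not the tooling used in the talk: the dictionary keys (`Name`, `InstanceRole`, `Market`, `InstanceType`, `InstanceCount`, `BidPrice`) mirror the `InstanceGroups` structure accepted by boto3's EMR `run_job_flow`, but no AWS call is made here, and the workload names, group names and counts are hypothetical.

```python
# Market per instance group for each project type, as listed on the slide.
MARKET_BY_WORKLOAD = {
    "long_running": {"MASTER": "ON_DEMAND", "CORE": "ON_DEMAND", "TASK": "SPOT"},
    "cost_driven": {"MASTER": "SPOT", "CORE": "SPOT", "TASK": "SPOT"},
    "data_critical": {"MASTER": "ON_DEMAND", "CORE": "ON_DEMAND", "TASK": "SPOT"},
    "application_testing": {"MASTER": "SPOT", "CORE": "SPOT", "TASK": "SPOT"},
}


def build_instance_groups(workload, instance_type, counts, bid_price):
    """Build a list of instance-group dicts for the given workload type.

    counts maps "MASTER"/"CORE"/"TASK" to an instance count (hypothetical
    values); bid_price is only attached to groups that run on spot.
    """
    groups = []
    for role in ("MASTER", "CORE", "TASK"):
        market = MARKET_BY_WORKLOAD[workload][role]
        group = {
            "Name": f"{role.lower()}-group",  # illustrative name
            "InstanceRole": role,
            "Market": market,
            "InstanceType": instance_type,
            "InstanceCount": counts[role],
        }
        if market == "SPOT":
            group["BidPrice"] = f"{bid_price:.3f}"  # bid is passed as a string
        groups.append(group)
    return groups


groups = build_instance_groups(
    "long_running", "m1.xlarge", {"MASTER": 1, "CORE": 10, "TASK": 50}, 0.480
)
for g in groups:
    print(g)
```

Keeping the master and core groups on-demand protects HDFS and job coordination from spot interruptions, while the task group, which holds no HDFS data, absorbs the price risk.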
  • 17. Stage 4: Custom EMR Manager
      • We created a Custom EMR Manager
      • Choose spot based on:
          • Past price trend intelligence
          • Choose AZ based on current market prices
          • Choose between Large vs Extra Large
      • Spot pricing strategy:
          • Set Spot price = On-Demand price
          • Overboard <20% of On-Demand price at times
      • Dynamic sizing of the Core / Task nodes
      • Dynamic EMR cluster creation
      • Diagram labels: Social / 3rd Party Feeds, Logs, Amazon S3, Elastic MapReduce, Custom EMR Manager
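The selection rules on slide 17 can be sketched as three small functions. This is a guess at the shape of the logic, not the actual Custom EMR Manager: all prices below are hypothetical sample data, the function names are invented, and the "overboard" rule is read here as bidding up to (but less than) 20% above the On-Demand price.

```python
def choose_zone(current_prices):
    """Pick the availability zone with the lowest current spot price."""
    return min(current_prices, key=current_prices.get)


def choose_instance_type(candidates):
    """Pick the instance type with the lowest spot price per core."""
    return min(
        candidates,
        key=lambda name: candidates[name]["spot_price"] / candidates[name]["cores"],
    )


def choose_bid(on_demand_price, overboard_fraction=0.0):
    """Bid the on-demand price, optionally going up to <20% above it."""
    if not 0 <= overboard_fraction < 0.20:
        raise ValueError("overboard_fraction must be in [0, 0.20)")
    return on_demand_price * (1 + overboard_fraction)


# Hypothetical market snapshot (USD per instance-hour).
zone_prices = {"us-east-1a": 0.060, "us-east-1b": 0.052, "us-east-1c": 0.075}
type_prices = {
    "m1.large": {"spot_price": 0.030, "cores": 2},
    "m1.xlarge": {"spot_price": 0.052, "cores": 4},
}

print(choose_zone(zone_prices))
print(choose_instance_type(type_prices))
print(f"{choose_bid(0.480):.3f}")
print(f"{choose_bid(0.480, 0.10):.3f}")
```

Bidding at the On-Demand price means the cluster is interrupted only when spot capacity costs more than On-Demand would, while the bill still tracks the lower market price.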
  • 18. Some Spot Use Cases
      • Analytics & Big Data
      • Scientific computing
      • Web crawling
      • Financial modeling and analysis
      • Testing
      • Image & media encoding
      • Savings figures shown on the slide: 66%, 50%, 57%
  • 19. Learning
      • Spot + On-Demand EC2 is a deadly combination for cost savings
      • Every millisecond matters in MR: tune your code
      • Merge files: bigger ones are better for processing
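The "merge files" learning on slide 19 can be illustrated with a simple batching plan that groups consecutive small files until a target size is reached. This is a sketch under assumptions: the target size, file names and sizes are hypothetical, and the talk does not say how the merging was actually done.

```python
def plan_merges(files, target_bytes):
    """Group (name, size) pairs into batches of at most target_bytes.

    Files are taken in order; a file larger than the target gets a batch
    of its own. Returns a list of lists of file names.
    """
    batches = []
    current, current_size = [], 0
    for name, size in files:
        # Close the current batch if adding this file would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical log files with sizes in MB for readability.
logs = [("a.log", 40), ("b.log", 50), ("c.log", 30), ("d.log", 100), ("e.log", 10)]
print(plan_merges(logs, target_bytes=100))
# [['a.log', 'b.log'], ['c.log'], ['d.log'], ['e.log']]
```

Fewer, larger inputs reduce the per-file startup overhead of each map task, which is the effect the slide points at.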
  • 20. More Learning …
      • Custom Job Manager was designed by us
      • 1 file per mapper was better for our case in AWS
      • Understand the performance constraints of AWS and work with them
      • Compress data, both in storage and in transit (.LZO & Snappy)
  • 21. Continues…
      • Keep configuration data in local memory or Amazon DynamoDB
      • Reducers split files suitable for the next job's mappers
      • Elasticity: increase/decrease Task nodes
      • Elasticity: create new EMR clusters matching the logs (Core + Task)
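The elasticity points on slide 21 amount to sizing the task group from the pending log volume. A minimal sketch follows, assuming a per-node throughput figure that is purely hypothetical (the talk gives none) and capping at the ~250 task nodes quoted as the peak on slide 11.

```python
import math


def task_nodes_needed(pending_gb, gb_per_node_hour, min_nodes=0, max_nodes=250):
    """Estimate task nodes needed to clear pending_gb within one hour.

    gb_per_node_hour is an assumed throughput per node, to be measured
    for the real workload. The result is clamped to [min_nodes, max_nodes].
    """
    if gb_per_node_hour <= 0:
        raise ValueError("gb_per_node_hour must be positive")
    wanted = math.ceil(pending_gb / gb_per_node_hour)
    return max(min_nodes, min(max_nodes, wanted))


# ~1 TB of logs arriving per hour, with an assumed 8 GB/node/hour.
print(task_nodes_needed(1000, 8))
# A backlog larger than the cap can absorb in one hour is clamped.
print(task_nodes_needed(5000, 8))
```

Because task nodes hold no HDFS data, growing or shrinking this group is the low-risk way to follow the log volume.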
  • 22. Value
      • ~56% cost savings compared with a pure On-Demand model for Core + Task nodes
      • Automation vastly reduced labor cost (initial + ongoing)
      • Customer CXOs were happy
  • 23. About Us
      • AWS Premium Partner
      • Solution experts in:
          • Cloud Computing
          • Big Data
          • Identity Management
  • 24. Shoot your ? Harish@8kmiles.com