Cloud Connect 2013 - Lock, Stock and X Smoking EC2's

These slides were presented at Cloud Connect 2013. Lock, Stock and X Smoking EC2's was inspired by Guy Ritchie movies. It describes how we put Amazon EMR and Spot EC2 instances to use for a customer and achieved cost savings while solving a Big Data problem.

Published in: Technology, Business

Transcript

  • 1. Harish Ganesan, CTO, 8KMiles, 2013
  • 2. This presentation is strongly inspired by “Guy Ritchie” movies. Disclaimer: All images are downloaded from the internet. If you find any of the content / images violating copyright, please let me know and I will act upon it immediately.
  • 3. Agenda
      • Case
      • Challenge
      • Solution
      • Learnings
      • About us
  • 4. Case (Cigarette smoking is injurious to health)
      • Mobile advertising company, USA
      • Forbes 1000 clientele
      • TBs of unstructured data -> Big Data problem
  • 5. Lock
      • Hourly ~1 TB
      • CDN logs
      • Text files
      • XML files
      • Geo data files
      • Server logs
      • DB records
  • 6. Stock
      • Reduce the cost leakage
      • How to save $$$?
  • 7. Challenges
      • Daily (was OK), monthly (pain) and historical analysis (almost dead)
      • How do we transfer, store, analyze and share?
      • How to optimize costs at this scale?
  • 8. Solution (Cigarette smoking is injurious to health)
      • Use AWS Cloud for hosting the analytics module
      • Amazon EMR for unstructured log analysis
      • Automation using scripts, Java code and other tools
  • 9. Stage 1: Data Transfer
      [Diagram labels: Social / 3rd Party Feeds / Cloud; Logs]
      • Tsunami UDP
      • ~1 TB of uncompressed logs every hour
      • High-bandwidth EC2 instances for Tsunami UDP
      • Other popular options:
          • Aspera
          • AWS Import/Export
          • WAN optimization
          • AWS Direct Connect
  • 10. Stage 2: Storage
      [Diagram labels: Social / 3rd Party Feeds / Cloud; Logs; Amazon S3]
      • Amazon Web Services building block: S3
      • Scalable object store
      • Inherently fault tolerant
      • ~2 TB of compressed logs every day
      • S3 RR (Reduced Redundancy) option for intermediate outputs
      • Amazon Glacier for archival
  • 11. Stage 3: Analyze
      [Diagram labels: Social / 3rd Party Feeds / Cloud; Logs; Amazon S3; Elastic MapReduce]
      • Elastic MapReduce service of Amazon
      • Minimal setup time
      • Log analysis
      • ~2000 mappers / 750 reducers @ peak
      • ~250 m1.xlarge task nodes (1000 cores, 3750 GB RAM) @ peak
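The peak capacity quoted on slide 11 can be checked with simple arithmetic. The sketch below assumes 4 cores and 15 GB of RAM per m1.xlarge node; these per-node figures are inferred from the slide's own totals (1000 cores and 3750 GB across 250 nodes), and the function name is hypothetical.

```python
# Per-node figures are assumptions inferred from the slide's totals:
# 1000 cores / 250 nodes = 4 cores, 3750 GB / 250 nodes = 15 GB.
M1_XLARGE_CORES = 4
M1_XLARGE_RAM_GB = 15


def cluster_capacity(nodes: int, cores_per_node: int, ram_gb_per_node: int) -> tuple[int, int]:
    """Return (total cores, total RAM in GB) for a homogeneous node group."""
    return nodes * cores_per_node, nodes * ram_gb_per_node


peak_cores, peak_ram_gb = cluster_capacity(250, M1_XLARGE_CORES, M1_XLARGE_RAM_GB)
print(f"{peak_cores} cores, {peak_ram_gb} GB RAM at peak")
```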
  • 12. Wait!!!
      • Amazon EMR is great
      • But adding Spot EC2 is super cool
  • 13. What is Amazon Spot?
      • Time-flexible, interruption-tolerant tasks
      • Bid price & Spot price
      • m1.xlarge price comparison:
          • $0.480 per hour: On Demand
          • $0.052 per hour: Spot
      • You will never pay more than your maximum bid price per hour
      • Spot Instance may be interrupted
      • If interrupted you will not be charged for any partial hour of usage (*free)
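To make the m1.xlarge price comparison on slide 13 concrete, here is a minimal sketch of the per-hour arithmetic. The two prices come from the slide and the 250-node count from slide 11; the function names are hypothetical. Spot prices fluctuate, so this is a snapshot at the quoted prices rather than a guaranteed saving.

```python
ON_DEMAND_PER_HOUR = 0.480  # m1.xlarge On Demand price quoted on the slide
SPOT_PER_HOUR = 0.052       # m1.xlarge Spot price quoted on the slide


def spot_discount(on_demand: float, spot: float) -> float:
    """Fraction saved per instance-hour when paying the Spot price."""
    return (on_demand - spot) / on_demand


def hourly_cost(nodes: int, price_per_hour: float) -> float:
    """Cost of running `nodes` instances for one hour at a flat price."""
    return nodes * price_per_hour


discount = spot_discount(ON_DEMAND_PER_HOUR, SPOT_PER_HOUR)
print(f"Spot discount per instance-hour: {discount:.1%}")
print(f"250 nodes, On Demand: ${hourly_cost(250, ON_DEMAND_PER_HOUR):.2f} per hour")
print(f"250 nodes, Spot:      ${hourly_cost(250, SPOT_PER_HOUR):.2f} per hour")
```

The overall saving reported later in the deck (~56%) is lower than this per-instance discount, which is expected when only part of a cluster runs on Spot and prices move over time.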
  • 14. Spot Bidding Strategies
      • Just above Spot price
      • Between Spot price & On Demand price
      • On Demand price
      • Above On Demand price
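The four bidding strategies on slide 14 can be expressed as ranges relative to the current Spot price and the On Demand price. This is a sketch only: the slide does not say how close "just above" is, so the 10% margin and the function name are assumptions.

```python
import math

JUST_ABOVE_MARGIN = 0.10  # assumption: "just above" means within 10% of the Spot price


def classify_bid(bid: float, spot: float, on_demand: float) -> str:
    """Map a bid to one of the four strategies listed on the slide."""
    if bid < spot:
        return "below Spot price"  # not one of the slide's strategies
    if math.isclose(bid, on_demand):
        return "On Demand price"
    if bid > on_demand:
        return "above On Demand price"
    if bid <= spot * (1 + JUST_ABOVE_MARGIN):
        return "just above Spot price"
    return "between Spot price and On Demand price"


# Using the m1.xlarge prices quoted earlier in the deck.
for bid in (0.055, 0.200, 0.480, 0.550):
    print(bid, "->", classify_bid(bid, spot=0.052, on_demand=0.480))
```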
  • 15. Spot Price Variations - AZ
  • 16. Amazon EMR with Spot Instance

      Project                 | Master Instance Group | Core Instance Group | Task Instance Group
      Long-running clusters   | on-demand             | on-demand           | spot
      Cost-driven workloads   | spot                  | spot                | spot
      Data-critical workloads | on-demand             | on-demand           | spot
      Application testing     | spot                  | spot                | spot
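The mapping on slide 16 from project type to purchase option per instance group can be held as plain data, which is convenient for automation. The dictionary keys and the helper name below are hypothetical; the values are transcribed from the slide.

```python
# Purchase option per EMR instance group, transcribed from the slide.
PURCHASE_OPTIONS = {
    "long-running clusters":   {"master": "on-demand", "core": "on-demand", "task": "spot"},
    "cost-driven workloads":   {"master": "spot",      "core": "spot",      "task": "spot"},
    "data-critical workloads": {"master": "on-demand", "core": "on-demand", "task": "spot"},
    "application testing":     {"master": "spot",      "core": "spot",      "task": "spot"},
}


def spot_groups(project_type: str) -> list[str]:
    """Return the instance groups that run on Spot for a given project type."""
    options = PURCHASE_OPTIONS[project_type]
    return [group for group, option in options.items() if option == "spot"]


print(spot_groups("data-critical workloads"))
print(spot_groups("cost-driven workloads"))
```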
  • 17. Stage 4: Custom EMR Manager
      [Diagram labels: Social / 3rd Party Feeds; Logs; Amazon S3; Elastic MapReduce; Custom EMR Manager]
      • We created a Custom EMR Manager
      • Choose spot based on:
          • Past price trend intelligence
          • Choose AZ based on current market prices
          • Choose between Large vs Extra Large
      • Spot pricing strategy:
          • Set Spot price = On Demand price
          • Overboard <20% of On Demand price at times
      • Dynamic sizing of the Core / Task nodes
      • Dynamic EMR cluster creation
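Two of the decisions listed on slide 17 (pick the AZ by current market price, bid at the On Demand price) lend themselves to a short sketch. This is not the presenters' Custom EMR Manager: the function names and the per-AZ prices are hypothetical, and reading "over board <20%" as "bid up to 20% above the On Demand price" is an assumption. In practice the prices would come from EC2 Spot price history rather than a hard-coded dictionary.

```python
def choose_az(current_spot_prices: dict[str, float]) -> str:
    """Pick the availability zone with the lowest current Spot price."""
    return min(current_spot_prices, key=current_spot_prices.get)


def bid_price(on_demand: float, overbid_fraction: float = 0.0) -> float:
    """Bid at the On Demand price, optionally going over it by less than 20%."""
    if not 0.0 <= overbid_fraction < 0.20:
        raise ValueError("overbid_fraction must be in [0, 0.20)")
    return on_demand * (1 + overbid_fraction)


# Hypothetical snapshot of m1.xlarge Spot prices per AZ (USD per hour).
prices = {"us-east-1a": 0.052, "us-east-1b": 0.061, "us-east-1c": 0.049}

print("Launch in:", choose_az(prices))
print(f"Bid: ${bid_price(0.480):.3f} per hour")
```

Bidding at the On Demand price does not mean paying it: as slide 13 notes, the bid is only the maximum you will be charged per hour.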
  • 18. Some Spot Use Cases
      • Analytics & Big Data
      • Scientific computing
      • Web crawling
      • Financial model and analysis
      • Testing
      • Image & media encoding
      [Figure labels: 66% savings; 50% savings; 57% savings]
  • 19. Learning
      • Spot + On Demand EC2 is a deadly combination for cost savings
      • Every millisecond matters in MR: tune your code
      • Merge files: bigger ones are better for processing
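The "merge files" learning on slide 19 can be illustrated with a greedy grouping of small files into larger batches before a job runs. This is a sketch under assumptions: the function name, the 100 MB target and the sample sizes are hypothetical, and the deck does not describe how its files were actually merged.

```python
def plan_merges(file_sizes_mb: list[int], target_mb: int) -> list[list[int]]:
    """Greedily group consecutive files until each batch reaches target_mb."""
    batches: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for size in file_sizes_mb:
        current.append(size)
        current_size += size
        if current_size >= target_mb:
            batches.append(current)
            current, current_size = [], 0
    if current:  # leftover files that did not reach the target
        batches.append(current)
    return batches


# Hypothetical input sizes in MB, merged toward ~100 MB batches.
print(plan_merges([40, 30, 50, 10, 200, 5], target_mb=100))
```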
  • 20. More Learning...
      • Custom Job Manager was designed by us
      • 1 file per mapper was better for our case in AWS
      • Understand the performance constraints of AWS and work with it
      • Compress data: both storage and transit (.LZO & Snappy)
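To give a feel for the compression learning on slide 20, the sketch below measures a compression ratio on synthetic log data. The deck names LZO and Snappy; neither is in the Python standard library, so gzip is swapped in here purely as a stand-in. The log line and function name are hypothetical, and highly repetitive synthetic data compresses far better than real logs would.

```python
import gzip


def compression_ratio(data: bytes) -> float:
    """Return original size divided by gzip-compressed size."""
    return len(data) / len(gzip.compress(data))


# Hypothetical access-log line, repeated to simulate a log file.
log_line = b'203.0.113.7 - - [01/Jan/2013:00:00:00 +0000] "GET /ad?id=42 HTTP/1.1" 200 512\n'
raw = log_line * 10_000

compressed = gzip.compress(raw)
assert gzip.decompress(compressed) == raw  # compression is lossless

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
print(f"ratio: {compression_ratio(raw):.1f}x")
```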
  • 21. Continues...
      • Keep configuration data in local memory or Amazon DynamoDB
      • Reducers split files suitable for next job mappers
      • Elasticity: increase/decrease Task nodes
      • Elasticity: create new EMR clusters matching the logs (Core + Task)
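The elasticity points on slide 21 amount to sizing the Task node group from the volume of logs waiting to be processed. A minimal sketch follows, assuming a hypothetical per-node throughput and deadline; the deck gives no such figures, and the function name is illustrative. The cap of 250 nodes mirrors the peak quoted on slide 11.

```python
import math


def task_nodes_needed(log_gb: float, gb_per_node_hour: float,
                      deadline_hours: float, max_nodes: int) -> int:
    """Estimate how many Task nodes finish log_gb within the deadline, capped at max_nodes."""
    if gb_per_node_hour <= 0 or deadline_hours <= 0:
        raise ValueError("throughput and deadline must be positive")
    needed = math.ceil(log_gb / (gb_per_node_hour * deadline_hours))
    return min(needed, max_nodes)


# Hypothetical throughput: each node processes 5 GB of logs per hour.
for volume_gb in (0, 1000, 2000):
    nodes = task_nodes_needed(volume_gb, gb_per_node_hour=5, deadline_hours=1, max_nodes=250)
    print(f"{volume_gb} GB -> {nodes} task nodes")
```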
  • 22. Value
      • ~56% cost savings from pure On Demand model for Core + Task nodes
      • Automation vastly reduced labor cost (initial + ongoing)
      • Customer CXOs were happy
  • 23. About Us
      • AWS Premium Partner
      • Solution experts in:
          • Cloud Computing
          • Big Data
          • Identity Management
  • 24. Shoot your ?
      • Harish@8kmiles.com
      • http://harish11g.blogspot.com
      • @harish11g
      • harishganesan