This slide deck was presented at Cloud Connect 2013. "Lock, Stock and X Smoking EC2's" was inspired by Guy Ritchie movies. It describes how we put Amazon EMR and Spot EC2 instances to use for a customer and achieved cost savings while solving a Big Data problem.
Cloud Connect 2013 - Lock, Stock and X Smoking EC2's
P1) This Presentation is
P2) Strongly Inspired by “Guy Ritchie” Movies
P3) Disclaimer: All images are downloaded from the internet. If you find any of the content / images violating copyright, please let me know and I will act upon it immediately.
AGENDA
• Case
• Challenge
• Solution
• Learnings
• About us
Case
Cigarette smoking is injurious to health
• Mobile Advertising company, USA
• Forbes 1000 clientele
• TBs of unstructured data -> Big Data Problem
Lock
• Hourly ~1 TB
• CDN Logs
• Text Files
• XML Files
• Geo data files
• Server logs
• DB records
STOCK
• Reduce the cost leakage
• How to Save $$$ ?
Challenges
• Daily (was OK), Monthly (Pain) and Historical analysis (almost dead)
• How do we Transfer, Store, Analyze and Share ?
• How to optimize costs at this scale ?
Solution
• Use AWS Cloud for hosting Analytics module
• Amazon EMR for unstructured Log Analysis
• Automation using Scripts, Java code and other tools
Stage 1: Data Transfer
(Diagram labels: Social / 3rd Party Feeds / Cloud, Logs)
• Tsunami UDP
• ~1 TB uncompressed logs every hour
• High-bandwidth EC2 instances for Tsunami UDP
• Other Popular Options:
  • Aspera
  • AWS Import/Export
  • WAN optimization
  • AWS Direct Connect
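To put the ~1 TB per hour figure in network terms, here is a small back-of-the-envelope sketch. It assumes decimal units (1 TB = 10**12 bytes) and a perfectly even transfer across the hour; the function name `required_gbps` is made up for illustration and real links need headroom on top of this.

```python
def required_gbps(terabytes: float, window_seconds: float) -> float:
    """Sustained rate, in gigabits per second, needed to move
    `terabytes` of data within `window_seconds`.

    Assumption: decimal units, 1 TB = 10**12 bytes.
    """
    bits = terabytes * 10**12 * 8
    return bits / window_seconds / 10**9


# ~1 TB of uncompressed logs every hour (figure from the slide)
hourly_rate = required_gbps(1.0, 3600)
print(f"{hourly_rate:.2f} Gbit/s sustained")
```

A sustained rate above 2 Gbit/s is roughly why a plain TCP copy over a WAN struggles here and why UDP-based tools or a dedicated link come up as options.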
Stage 2: Storage
(Diagram labels: Social / 3rd Party Feeds / Cloud, Logs, Amazon S3)
• Amazon Web Services Building Block – S3
• Scalable Object Store
• Inherently Fault Tolerant
• ~2 TB of compressed logs every day
• S3 RR (Reduced Redundancy) option for intermediate outputs
• Amazon Glacier for archival
Wait !!!
• Amazon EMR is great
• But adding Spot EC2 is super cool
What is Amazon Spot ?
• Time-flexible, interruption-tolerant tasks
• Bid Price & Spot Price
• m1.xlarge Price Comparison
  • $0.480 per Hour - On Demand
  • $0.052 per Hour - Spot
• You will never pay more than your maximum bid price per hour
• Spot Instance may be interrupted
• If interrupted you will not be charged for any partial hour of usage. (*Free)
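The price comparison above can be worked through in a few lines. This is a simplified sketch: the two prices come from the slide, while `savings_fraction` and `hourly_charge` are illustrative names, and the charging rule is reduced to "pay the spot price while it stays at or below the bid, otherwise get interrupted".

```python
ON_DEMAND = 0.480  # USD per hour for m1.xlarge, from the slide
SPOT = 0.052       # USD per hour for m1.xlarge, from the slide


def savings_fraction(on_demand: float, spot: float) -> float:
    """Fraction of the On Demand price saved by running at the spot price."""
    return 1.0 - spot / on_demand


def hourly_charge(spot_price: float, max_bid: float):
    """Simplified model: return the price charged for the hour, or None
    if the spot price has risen above the bid and the instance would be
    interrupted. The charge can therefore never exceed the bid."""
    if spot_price > max_bid:
        return None
    return spot_price


print(f"Savings at the quoted prices: {savings_fraction(ON_DEMAND, SPOT):.1%}")
```

At the quoted prices the spot instance costs roughly a ninth of the On Demand one, which is the gap the rest of the deck tries to exploit.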
Spot Bidding Strategies
• Just above Spot Price
• Between Spot Price & On Demand Price
• On Demand Price
• Above On Demand Price
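The four strategies can be expressed as a single function that turns a spot price and an On Demand price into a bid. This is only a sketch: the 10% margin and the use of the midpoint for the "between" strategy are assumptions chosen for illustration, not values from the slides.

```python
from enum import Enum


class BidStrategy(Enum):
    JUST_ABOVE_SPOT = "just above spot price"
    BETWEEN_SPOT_AND_ON_DEMAND = "between spot and on demand price"
    ON_DEMAND = "on demand price"
    ABOVE_ON_DEMAND = "above on demand price"


def bid_price(strategy: BidStrategy, spot: float, on_demand: float,
              margin: float = 0.10) -> float:
    """Return a bid in USD per hour.

    `margin` (assumed 10%) is how far above the reference price to bid;
    the "between" strategy uses the midpoint as an arbitrary choice.
    """
    if strategy is BidStrategy.JUST_ABOVE_SPOT:
        return spot * (1 + margin)
    if strategy is BidStrategy.BETWEEN_SPOT_AND_ON_DEMAND:
        return (spot + on_demand) / 2
    if strategy is BidStrategy.ON_DEMAND:
        return on_demand
    return on_demand * (1 + margin)


for s in BidStrategy:
    print(f"{s.value}: ${bid_price(s, spot=0.052, on_demand=0.480):.4f}")
```

Moving down the list trades a lower worst-case hourly charge for a lower chance of interruption.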
Stage 4: Custom EMR Manager
(Diagram labels: Social / 3rd Party Feeds, Logs, Amazon S3, Elastic MapReduce, Custom EMR Manager)
• We created a Custom EMR Manager
• Choose spot based on:
  • Past price trend intelligence
  • Choose AZ based on Current Market Prices
  • Choose between Large vs Extra Large
• Spot Pricing Strategy:
  • Set Spot Price = On Demand Price
  • Go overboard by <20% of On Demand Price at times
• Dynamic sizing of the Core / Task nodes
• Dynamic EMR Cluster creation
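The selection logic described above might look something like the following sketch. It is not the actual Custom EMR Manager: the `SpotQuote` type, the zone names, the prices in `quotes` and the `capacity_units` weighting are all hypothetical, and "overboard" is read here as bidding up to 20% above the On Demand price.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpotQuote:
    zone: str               # availability zone (hypothetical values below)
    instance_type: str      # e.g. Large vs Extra Large
    spot_price: float       # current market price, USD per hour
    on_demand_price: float  # USD per hour
    capacity_units: int     # relative compute capacity (assumed weighting)


def pick_quote(quotes: List[SpotQuote]) -> SpotQuote:
    """Pick the AZ / instance type with the lowest spot price per capacity unit."""
    return min(quotes, key=lambda q: q.spot_price / q.capacity_units)


def bid_for(quote: SpotQuote, overboard: float = 0.0) -> float:
    """Bid the On Demand price, optionally going up to (but not reaching)
    20% above it. Reading "overboard" this way is an assumption."""
    if not 0.0 <= overboard < 0.20:
        raise ValueError("overboard must be in [0, 0.20)")
    return quote.on_demand_price * (1 + overboard)


# Hypothetical market snapshot, for illustration only
quotes = [
    SpotQuote("us-east-1a", "m1.xlarge", 0.052, 0.480, 2),
    SpotQuote("us-east-1b", "m1.xlarge", 0.060, 0.480, 2),
    SpotQuote("us-east-1a", "m1.large", 0.030, 0.240, 1),
    SpotQuote("us-east-1b", "m1.large", 0.024, 0.240, 1),
]
best = pick_quote(quotes)
print(best.zone, best.instance_type, f"bid ${bid_for(best):.3f}")
```

A fuller version would also weigh past price trends per zone, as the slide mentions, rather than a single snapshot.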
Some Spot Use Cases
• Analytics & Big Data
• Scientific computing
• Web crawling
• Financial model and Analysis
• Testing
• Image & Media Encoding
(Savings figures shown on the slide: 66% savings, 50% savings, 57% savings)
Learning
• Spot + On Demand EC2 is a deadly combination for cost savings
• Every millisecond matters in MR – Tune your code
• Merge Files – Bigger ones are better for processing
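One way to act on the "merge files" point is to plan batches of small log files that add up to roughly one larger file each. The sketch below only plans the grouping; the file names, sizes and the 100 MB target are made up for illustration, and a file larger than the target simply gets a batch of its own.

```python
from typing import Dict, List


def plan_merges(file_sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Group files (in name order) into batches whose total size stays at
    or below `target_bytes` where possible."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(file_sizes.items()):
        # Close the current batch if adding this file would overshoot.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


MB = 1024 * 1024
sizes = {"a.log": 40 * MB, "b.log": 40 * MB, "c.log": 40 * MB,
         "d.log": 100 * MB, "e.log": 10 * MB}
for batch in plan_merges(sizes, target_bytes=100 * MB):
    print(batch)
```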
More Learning …
• Custom Job Manager was designed by us
• 1 File Per Mapper was better for our case in AWS
• Understand the performance constraints of AWS and work with it
• Compress data: Both storage and transit (LZO & Snappy)
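To get a feel for why compressing log data pays off for both storage and transit, here is a small measurement sketch. It uses zlib from the Python standard library as a stand-in, not LZO or Snappy as named on the slide, and the sample log lines are synthetic, so the ratio it prints says nothing about the real data.

```python
import zlib


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Return len(raw) / len(compressed) using zlib (a stand-in codec)."""
    packed = zlib.compress(raw, level)
    # Round-trip check: decompression must give back the original bytes.
    if zlib.decompress(packed) != raw:
        raise RuntimeError("round trip failed")
    return len(raw) / len(packed)


# Synthetic, repetitive log lines (hypothetical format)
sample = b"".join(
    b"2013-04-02T10:%02d:00 GET /ad/banner.png 200 cdn-edge-%d\n" % (i % 60, i % 4)
    for i in range(1000)
)
print(f"{len(sample)} bytes, ratio {compression_ratio(sample):.1f}x")
```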
Continues…
• Keep configuration data in local memory or Amazon DynamoDB
• Reducers split files suitable for next job mappers
• Elasticity – Increase/Decrease Task nodes
• Elasticity – Create new EMR Clusters matching the Logs (Core + Task)
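The elasticity points can be illustrated with a simple sizing rule that scales the number of task nodes with the incoming log volume. This is a rough sketch under stated assumptions: the per-node throughput, the deadline and the node limits are hypothetical parameters, not measurements from the project.

```python
import math


def task_nodes_needed(log_gb: float, gb_per_node_hour: float,
                      deadline_hours: float,
                      min_nodes: int = 0, max_nodes: int = 50) -> int:
    """Estimate how many task nodes are needed to process `log_gb` of logs
    within `deadline_hours`, clamped to [min_nodes, max_nodes].

    All parameters are illustrative assumptions.
    """
    if gb_per_node_hour <= 0 or deadline_hours <= 0:
        raise ValueError("throughput and deadline must be positive")
    needed = math.ceil(log_gb / (gb_per_node_hour * deadline_hours))
    return max(min_nodes, min(max_nodes, needed))


# A quiet hour, a normal hour and a spike (hypothetical volumes)
for volume in (100, 1000, 10000):
    print(volume, "GB ->", task_nodes_needed(volume, 50, 1), "task nodes")
```

The same estimate could also decide when a backlog is large enough to justify a separate cluster instead of resizing the current one.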
Value
• ~56% cost savings compared with a pure On-Demand model for Core + Task nodes
• Automation vastly reduced labor cost (initial + ongoing)
• Customer CXOs were happy
About Us
• AWS Premium Partner
• Solution Experts in
  • Cloud Computing
  • Big Data
  • Identity Management
Shoot your ?
Harish@8kmiles.com
http://harish11g.blogspot.com
@harish11g
harishganesan