Cloud Connect 2013- Lock Stock and x Smoking EC2's
 


This slide deck was presented at Cloud Connect 2013. Lock, Stock and X Smoking EC2's was inspired by Guy Ritchie movies. It describes how we put Amazon EMR + Spot EC2 instances to use for a customer and achieved cost savings while solving a Big Data problem.

Usage Rights

© All Rights Reserved

    Cloud Connect 2013- Lock Stock and x Smoking EC2's Presentation Transcript

    • Harish Ganesan, CTO, 8KMiles, 2013
    • P1) This presentation is
      P2) strongly inspired by “Guy Ritchie” movies
      P3) Disclaimer: All images are downloaded from the internet. If you find any of the content / images violating copyright, please let me know and I will act upon it immediately.
    • AGENDA
      • Case
      • Challenge
      • Solution
      • Learnings
      • About us
    • Case
      (Cigarette smoking is injurious to health)
      • Mobile advertising company, USA
      • Forbes 1000 clientele
      • TBs of unstructured data -> Big Data problem
    • Lock
      • Hourly ~1 TB
      • CDN logs
      • Text files
      • XML files
      • Geo data files
      • Server logs
      • DB records
    • STOCK
      • Reduce the cost leakage
      • How to save $$$ ?
    • Challenges
      • Daily (was OK), monthly (pain) and historical analysis (almost dead)
      • How do we transfer, store, analyze and share?
      • How to optimize costs at this scale?
    • Solution
      (Cigarette smoking is injurious to health)
      • Use AWS Cloud for hosting the Analytics module
      • Amazon EMR for unstructured log analysis
      • Automation using scripts, Java code and other tools
    • Stage 1: Data Transfer
      (Diagram: Social / 3rd Party Feeds / Cloud, Logs)
      • Tsunami UDP
      • ~1 TB uncompressed logs every hour
      • High bandwidth EC2s for Tsunami UDP
      • Other popular options:
        • Aspera
        • AWS Import/Export
        • WAN optimization
        • AWS Direct Connect
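To put the "~1 TB every hour" figure in perspective, here is a rough back-of-the-envelope sketch of the sustained network throughput it implies. It assumes a decimal terabyte (10^12 bytes) and a perfectly even transfer across the hour; neither assumption comes from the slides, and `required_gbps` is a hypothetical helper name.

```python
TB = 10**12  # decimal terabyte (assumption; the slide does not say TB vs TiB)


def required_gbps(bytes_per_hour: float) -> float:
    """Sustained throughput, in gigabits per second, to move bytes_per_hour."""
    bits_per_second = bytes_per_hour * 8 / 3600
    return bits_per_second / 1e9


rate = required_gbps(1 * TB)
print(f"~{rate:.2f} Gbit/s sustained for 1 TB per hour")
```

Under these assumptions the pipeline needs a little over 2 Gbit/s around the clock, which is consistent with the slide's emphasis on high bandwidth EC2 instances and a UDP-based transfer tool.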
    • Stage 2: Storage
      (Diagram: Social / 3rd Party Feeds / Cloud, Logs, Amazon S3)
      • Amazon Web Services building block: S3
      • Scalable object store
      • Inherently fault tolerant
      • ~2 TB of compressed logs every day
      • S3 RR option for intermediate outputs
      • Amazon Glacier for archival
    • Stage 3: Analyze
      (Diagram: Social / 3rd Party Feeds / Cloud, Logs, Amazon S3, Elastic MapReduce)
      • Elastic MapReduce service of Amazon
      • Minimal setup time
      • Log analysis
      • ~2000 mappers / 750 reducers @ peak
      • ~250 m1.xlarge task nodes (1000 cores, 3750 GB RAM) @ peak
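The peak capacity quoted on this slide can be checked with simple arithmetic. The sketch below assumes 4 cores and 15 GB of RAM per m1.xlarge node, which is what the slide's totals imply (1000 / 250 and 3750 / 250); `NodeSpec` and `cluster_capacity` are illustrative names, not part of the original tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    cores: int
    ram_gb: int


# Per-node figures implied by the slide totals (assumption for illustration).
M1_XLARGE = NodeSpec(cores=4, ram_gb=15)


def cluster_capacity(spec: NodeSpec, nodes: int) -> tuple:
    """Return (total cores, total RAM in GB) for `nodes` identical nodes."""
    return spec.cores * nodes, spec.ram_gb * nodes


cores, ram_gb = cluster_capacity(M1_XLARGE, 250)
print(f"{cores} cores, {ram_gb} GB RAM")
```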
    • Wait !!!
      • Amazon EMR is great
      • But adding Spot EC2 is super cool
    • What is Amazon Spot ?
      • Time-flexible, interruption-tolerant tasks
      • Bid Price & Spot Price
      • m1.xlarge price comparison:
        • $0.480 per hour - On Demand
        • $0.052 per hour - Spot
      • You will never pay more than your maximum bid price per hour
      • Spot Instance may be interrupted
      • If interrupted you will not be charged for any partial hour of usage. (*Free)
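The two m1.xlarge prices on this slide make the potential discount easy to work out. The sketch below uses only the slide's numbers; the 250-node figure is borrowed from the Stage 3 slide, and the function names are hypothetical. Spot prices fluctuate, so this is a snapshot calculation rather than a realized saving.

```python
ON_DEMAND_PER_HOUR = 0.480  # m1.xlarge On-Demand price quoted on the slide
SPOT_PER_HOUR = 0.052       # m1.xlarge Spot price quoted on the slide


def savings_fraction(on_demand: float, spot: float) -> float:
    """Fraction of the On-Demand price saved by paying the Spot price."""
    return (on_demand - spot) / on_demand


def hourly_cost(price_per_hour: float, nodes: int) -> float:
    """Cost of running `nodes` instances for one hour at a flat price."""
    return price_per_hour * nodes


discount = savings_fraction(ON_DEMAND_PER_HOUR, SPOT_PER_HOUR)
print(f"Spot discount at the quoted prices: {discount:.1%}")
print(f"250 nodes, On-Demand: ${hourly_cost(ON_DEMAND_PER_HOUR, 250):.2f}/hour")
print(f"250 nodes, Spot:      ${hourly_cost(SPOT_PER_HOUR, 250):.2f}/hour")
```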
    • Spot Bidding Strategies
      • Just above Spot price
      • Between Spot price & On-Demand price
      • On-Demand price
      • Above On-Demand price
    • Spot Price Variations - AZ
    • Amazon EMR with Spot Instance
      Project                   Master Instance Group   Core Instance Group   Task Instance Group
      Long-running clusters     on-demand               on-demand             spot
      Cost-driven workloads     spot                    spot                  spot
      Data-critical workloads   on-demand               on-demand             spot
      Application testing       spot                    spot                  spot
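The table on this slide maps a workload type to the purchasing model for each EMR instance group. One way to carry that into automation is a plain lookup table; the sketch below encodes the slide's four rows verbatim, while the dictionary layout and the `markets_for` helper are illustrative assumptions.

```python
# (master, core, task) purchasing model per workload, as listed on the slide.
MARKET_BY_WORKLOAD = {
    "long-running clusters": ("on-demand", "on-demand", "spot"),
    "cost-driven workloads": ("spot", "spot", "spot"),
    "data-critical workloads": ("on-demand", "on-demand", "spot"),
    "application testing": ("spot", "spot", "spot"),
}


def markets_for(workload: str) -> dict:
    """Return the purchasing model for each instance group of a workload."""
    master, core, task = MARKET_BY_WORKLOAD[workload.lower()]
    return {"master": master, "core": core, "task": task}


print(markets_for("Data-critical workloads"))
```

In every row the task group runs on Spot; only the master and core groups, which hold cluster state and HDFS data, are kept On-Demand when losing them would be costly.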
    • Stage 4: Custom EMR Manager
      (Diagram: Social / 3rd Party Feeds, Logs, Amazon S3, Elastic MapReduce, Custom EMR Manager)
      • We created a Custom EMR Manager
      • Choose spot based on:
        • Past price trend intelligence
        • Choose AZ based on current market prices
        • Choose between Large vs Extra Large
      • Spot pricing strategy:
        • Set Spot price = On-Demand price
        • Over board <20% of On-Demand price at times
      • Dynamic sizing of the Core / Task nodes
      • Dynamic EMR cluster creation
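The slides do not show the Custom EMR Manager's code, so the following is only a minimal sketch of two of the decisions listed above: picking the Availability Zone with the lowest current Spot price, and bidding at the On-Demand price with an occasional overshoot. Reading "over board <20%" as "bid up to 20% above On-Demand" is an interpretation, and the zone names, prices and function names are all hypothetical.

```python
from typing import Mapping


def choose_az(current_prices: Mapping[str, float]) -> str:
    """Pick the Availability Zone with the lowest current Spot price."""
    if not current_prices:
        raise ValueError("no Spot prices supplied")
    return min(current_prices, key=current_prices.get)


def bid_price(on_demand: float, overboard: float = 0.0) -> float:
    """Bid at the On-Demand price, optionally overshooting by less than 20%."""
    if not 0.0 <= overboard < 0.20:
        raise ValueError("overboard must be in [0, 0.20)")
    return round(on_demand * (1 + overboard), 4)


# Made-up market snapshot for illustration only.
prices = {"us-east-1a": 0.060, "us-east-1b": 0.052, "us-east-1c": 0.070}
print(choose_az(prices), bid_price(0.480), bid_price(0.480, overboard=0.10))
```

Bidding at the On-Demand price caps the worst case at what the cluster would have cost anyway, while the actual charge follows the lower Spot market price.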
    • Some Spot Use Cases
      • Analytics & Big Data
      • Scientific computing
      • Web crawling
      • Financial model and analysis
      • Testing
      • Image & media encoding
      (Slide callouts: 66 % savings, 50 % savings, 57 % savings)
    • Learning
      • Spot + On-Demand EC2 is a deadly combination for cost savings
      • Every millisecond matters in MR: tune your code
      • Merge files: bigger ones are better for processing
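The "merge files" learning can be illustrated with a small planning step that groups many small log files into batches close to a target size before they are handed to MapReduce. This is a greedy sketch under assumed inputs (a list of name and size pairs, a target size in bytes); it is not the deck's actual merge logic, and `plan_merges` is a hypothetical name.

```python
def plan_merges(files, target_bytes):
    """Greedily group (name, size) pairs, in order, into batches of at most
    target_bytes each. A single file larger than the target gets its own batch."""
    batches, current, current_size = [], [], 0
    for name, size in files:
        # Close the current batch if adding this file would overshoot the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


small_files = [("a.log", 40), ("b.log", 40), ("c.log", 40), ("d.log", 100), ("e.log", 10)]
print(plan_merges(small_files, target_bytes=100))
```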
    • More Learning …
      • Custom Job Manager was designed by us
      • 1 file per mapper was better for our case in AWS
      • Understand the performance constraints of AWS and work with them
      • Compress data: both storage and transit (.LZO & Snappy)
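To show why compressing repetitive log data pays off for both storage and transfer, here is a small sketch. The deck names LZO and Snappy; neither ships with Python's standard library, so this example swaps in gzip purely as a stand-in. The sample log line is invented, and real ratios depend heavily on the data and codec.

```python
import gzip


def compress_log(raw: bytes) -> bytes:
    """Compress a block of log data (gzip used here as a stand-in codec)."""
    return gzip.compress(raw)


# Invented, highly repetitive sample; real logs will compress less well.
sample = b"2013-04-02T10:00:00 GET /ad/123 200 12ms\n" * 10_000
packed = compress_log(sample)

print(f"raw: {len(sample)} bytes, compressed: {len(packed)} bytes")
print(f"ratio: {len(sample) / len(packed):.1f}x")
```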
    • Continues…
      • Keep configuration data in local memory or Amazon DynamoDB
      • Reducers split files suitable for next job mappers
      • Elasticity: increase/decrease Task nodes
      • Elasticity: create new EMR clusters matching the logs (Core + Task)
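The elasticity point, growing and shrinking the Task node count with the incoming log volume, could be approximated by a sizing rule like the one below. The per-node throughput parameter and the bounds are hypothetical tuning knobs; only the 250-node ceiling echoes the peak quoted earlier in the deck.

```python
import math


def task_nodes_needed(input_gb: float, gb_per_node: float,
                      min_nodes: int = 0, max_nodes: int = 250) -> int:
    """Estimate Task nodes for a batch, clamped to [min_nodes, max_nodes].

    gb_per_node is an assumed amount of input one node can process within
    the job's time budget; it would have to be measured for a real workload.
    """
    if gb_per_node <= 0:
        raise ValueError("gb_per_node must be positive")
    wanted = math.ceil(input_gb / gb_per_node)
    return max(min_nodes, min(max_nodes, wanted))


print(task_nodes_needed(1000, gb_per_node=8))   # a typical hour
print(task_nodes_needed(5000, gb_per_node=8))   # a burst, capped at the ceiling
```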
    • Value
      • ~56% cost savings from pure On-Demand model for Core + Task nodes
      • Automation vastly reduced labor cost (initial + ongoing)
      • Customer CXOs were happy
    • About Us
      • AWS Premium Partner
      • Solution experts in:
        • Cloud Computing
        • Big Data
        • Identity Management
    • Shoot your ?
      • Harish@8kmiles.com
      • http://harish11g.blogspot.com
      • @harish11g
      • harishganesan