
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce" by Simone Brunozzi


  1. Making Hadoop Enterprise ready with Amazon Elastic Map/Reduce
      Simone Brunozzi
      Technology Evangelist, Amazon Web Services, APAC
      twitter: @simon
      Blog: www.brunozzi.com
  2. AGENDA
      What is Elastic MapReduce
      Use Cases
      Service Features
      New Feature Announcements
      Elastic MapReduce Ecosystem
  3. What is Amazon Elastic MapReduce
      Enables customers to easily, securely and cost-effectively process vast amounts of data.
      Spin up 10s, 100s or even 1000s of instances
      Process 10s or 100s of terabytes of data
      Hosted Hadoop framework running on the web-scale infrastructure of Amazon.
  4. Launch and monitor job flows
      AWS Management Console
      Command line interface
      REST API
  7. Why use Amazon Elastic MapReduce
      Elastic MapReduce removes MUCK from Big Data processing
      Hard to manage compute clusters
      Hard to tune Hadoop
      Hard to monitor running Job Flows
      Hard to debug Hadoop jobs
      Hadoop issues prevent smooth operation in the cloud
  8. Problems customers solve with Elastic MapReduce
      Data mining and BI
      Log processing, click stream analysis, similarities, advertising
      Data warehousing applications
      Bio-informatics (genome analysis)
      Financial simulation (Monte Carlo simulation)
      File processing (resize JPEGs)
      Web indexing
  9. Web-Scale Data warehousing
  10. ELASTIC MAPREDUCE – SUPPORTED CONFIGURATIONS
      Hadoop 0.20, Pig 0.6, Hive 0.5, Cascading 1.1
      Hadoop 0.18, Pig 0.3, Hive 0.4, Cascading 1.1
  11. ELASTIC MAPREDUCE – HIVE FEATURES
      Apache Hive
      Batch and interactive mode
      Supports Hive steps
      Integration with Elastic MapReduce Client and Management Console
      Load table partitions automatically to/from Amazon S3
      Optimized data writes to Amazon S3
      Reference resources such as streaming scripts located on Amazon S3
      Specify an off-instance metadata store
      Supports variables defined directly in the Hive script
      Supports JDBC and ODBC connections
  12. ELASTIC MAPREDUCE – PIG FEATURES
      Apache Pig
      Batch and interactive mode
      Supports Pig steps
      Integration with Elastic MapReduce Client and Management Console
      Concurrent access to multiple file systems (HDFS, Amazon S3)
      Reference resources in Amazon S3 directly from the Pig script
      Several User Defined Functions in Piggy Bank
  17. Amazon Elastic MapReduce For Enterprise
      Enterprise customers need more flexibility
        Configuring clusters
        Running clusters
        Paying for clusters
      Enterprise customers need more tools
        Application development
        Data analytics
      Enterprise customers need support options
        Forum support is not enough
  18. Amazon Elastic MapReduce features
      Bootstrap actions
      Run arbitrary scripts before the job flow begins
      Run on all nodes before data processing begins
      Used for
        Hadoop configuration (site-conf, Hadoop-conf, etc.)
        Cluster configuration (memory, swap, etc.)
        Application/package installation (apt-get install r-base)
      Several pre-defined bootstrap actions available
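
As a rough illustration of the bootstrap actions on slide 18, the sketch below builds the `BootstrapActions` part of a job-flow request in the dictionary shape that boto3's EMR `run_job_flow` call accepts. This is an assumption about tooling, since the talk itself does not name a client library. The bucket, script path and action name are hypothetical, and the request is only constructed here, not sent.

```python
def bootstrap_action(name, script_path, args=()):
    """Return one bootstrap action as a dict in the shape used by
    boto3's EMR run_job_flow(BootstrapActions=[...]) parameter."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            "Path": script_path,   # script location, typically on Amazon S3
            "Args": list(args),    # arguments passed to the script
        },
    }


# Hypothetical script in a hypothetical bucket. The script itself (not shown)
# would run something like "apt-get install r-base" on every node.
install_r = bootstrap_action(
    "Install R",
    "s3://example-bucket/bootstrap/install-packages.sh",
    ["r-base"],
)

bootstrap_actions = [install_r]
print(bootstrap_actions)
```

The list could then be passed alongside the other job-flow parameters; keeping it as plain data makes it easy to review before anything is launched.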
  21. Amazon Elastic MapReduce - new features
      Preannounce: Expand running clusters
      Increase number of nodes in a running cluster
      Increase processing speed
      Increase HDFS size
  22. PREANNOUNCE – EXPAND/SHRINK CLUSTERS
      Use Case: Increase speed of running job flows
      Speed up job flow execution in response to changing requirements
      Dynamically balance cost versus performance without restarting a job
      Figure: job flow allocated 4 instances (time remaining: 14 hours), expanded to 9 instances (time remaining: 7 hours), expanded to 25 instances (time remaining: 3 hours)
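
The figures on slide 22 (4, 9 and 25 instances; 14, 7 and 3 hours remaining) are consistent with a very simple model: the remaining work is fixed, it divides evenly across instances, and time is counted in whole hours. The slide does not state this model, so the sketch below is only an assumption that happens to reproduce its numbers. Real Hadoop jobs rarely scale this cleanly.

```python
import math


def hours_remaining(instance_hours_left, instances):
    """Whole hours left if the remaining work splits evenly across
    instances (idealized linear scaling, rounded up to a full hour)."""
    return math.ceil(instance_hours_left / instances)


# 4 instances with 14 hours remaining means 56 instance-hours of work left.
work_left = 4 * 14

estimates = {n: hours_remaining(work_left, n) for n in (4, 9, 25)}
for n, hours in estimates.items():
    print(f"{n} instances: {hours} hours remaining")
```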
  23. Amazon Elastic MapReduce - new features
      Shrink running clusters
      Decrease number of nodes in a running job flow
      Different capacity requirements from step to step
      Automatically regulate capacity between steps
  24. EXPAND/SHRINK CLUSTERS
      Use Case: Agile Data Warehouse Cluster
      Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)
      Leverage flexibility to reduce costs and increase cluster utilization
      Figure: data warehouse in steady state (allocate 9 instances), batch processing (expand to 25 instances), steady state again (shrink to 9 instances)
  26. Amazon Elastic MapReduce Price
  27. What is a Spot Instance?
      Way to purchase & consume EC2 instances based on compute value
      Reduce your computing costs
      Bid for unused EC2 capacity
      Control your costs
      Differences from On-Demand Instances:
        Request: maximum price bid
        Spot Price: what you pay
        Termination
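
The three differences listed on slide 27 (a maximum bid, paying the Spot Price, and termination) can be sketched as a small simulation. The hourly prices below are made up for illustration, and the model is deliberately simplified: it assumes one price per hour and that the instance is terminated as soon as the Spot Price exceeds the bid.

```python
def spot_run(max_bid, hourly_spot_prices):
    """Simulate one Spot instance under a simplified model.

    The instance runs while the spot price is at or below max_bid,
    pays the spot price (not the bid) for each hour it runs, and is
    terminated the first time the spot price rises above the bid.
    Returns (hours_run, total_cost)."""
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price > max_bid:
            break  # termination: spot price exceeded the maximum bid
        total_cost += price
        hours_run += 1
    return hours_run, total_cost


# Hypothetical price history in dollars per hour.
prices = [0.25, 0.25, 0.28, 0.35, 0.25]
hours, cost = spot_run(0.30, prices)
print(f"ran {hours} hours, paid ${cost:.2f}")
```

Note that the instance never pays the $0.30 bid itself; the bid only sets the ceiling at which it is willing to keep running.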
  28. m2.xlarge instance pricing history
      Amazon EC2 On-Demand price for the same instance is $0.50 per hour
  29. Amazon Elastic MapReduce – new feature
      Spot pricing support for Elastic MapReduce job flows
      Specify the price you want to pay for instances
      Elastic MapReduce takes care of
        Provisioning
        Node addition and removal to/from the cluster
      Can mix On-Demand and Spot instances in the same cluster
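
One way to picture mixing On-Demand and Spot instances in the same cluster is as a list of instance groups, each with its own market. The sketch below uses the dictionary shape of boto3's EMR `InstanceGroups` parameter, which is an assumption about tooling rather than something the talk shows. The group names, the 1 master plus 3 core split, and the $0.25 bid are all illustrative. Nothing is sent to AWS.

```python
def instance_group(name, role, market, instance_type, count, bid_price=None):
    """Return one instance group as a dict in the shape used by boto3's
    EMR run_job_flow(Instances={"InstanceGroups": [...]})."""
    group = {
        "Name": name,
        "InstanceRole": role,        # "MASTER", "CORE" or "TASK"
        "Market": market,            # "ON_DEMAND" or "SPOT"
        "InstanceType": instance_type,
        "InstanceCount": count,
    }
    if bid_price is not None:
        group["BidPrice"] = bid_price  # maximum price, given as a string
    return group


# Illustrative layout: 4 On-Demand nodes plus 5 Spot nodes.
groups = [
    instance_group("master", "MASTER", "ON_DEMAND", "m2.xlarge", 1),
    instance_group("core", "CORE", "ON_DEMAND", "m2.xlarge", 3),
    instance_group("spot-tasks", "TASK", "SPOT", "m2.xlarge", 5, bid_price="0.25"),
]

on_demand = sum(g["InstanceCount"] for g in groups if g["Market"] == "ON_DEMAND")
spot = sum(g["InstanceCount"] for g in groups if g["Market"] == "SPOT")
print(f"{on_demand} On-Demand + {spot} Spot instances")
```

Putting the Spot capacity in a separate group keeps the On-Demand nodes unaffected if the Spot nodes are terminated.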
  30. Integration with EC2 Spot
      Use Case: Manage cost of running job flows
      Start with 4 On-Demand instances of type m2.xlarge
      Expand the cluster with 5 Spot nodes
      Cost without Spot:
        4 instances * 14 hrs * $0.50 = $28
      Cost with Spot:
        4 instances * 7 hrs * $0.50 = $14 +
        5 instances * 7 hrs * $0.25 = $8.75
        Total = $22.75
      Savings: ~19%
      Figure: job flow allocated 4 instances (time remaining: 14 hours), expanded to 9 instances (time remaining: 7 hours)
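
The cost comparison on slide 30 is easy to check by multiplying out its own quantities. Doing so gives $14.00 for the On-Demand portion of the Spot scenario, $22.75 in total, and savings of just under 19%. The rates and instance counts below are the slide's; the function and variable names are only for illustration.

```python
ON_DEMAND_RATE = 0.50  # dollars per instance-hour (m2.xlarge, from the slide)
SPOT_RATE = 0.25       # dollars per instance-hour (spot price used on the slide)


def cost(instances, hours, rate):
    """Cost of running a number of instances for some hours at an hourly rate."""
    return instances * hours * rate


# Without Spot: 4 On-Demand instances for 14 hours.
without_spot = cost(4, 14, ON_DEMAND_RATE)

# With Spot: 4 On-Demand plus 5 Spot instances, finishing in 7 hours.
with_spot = cost(4, 7, ON_DEMAND_RATE) + cost(5, 7, SPOT_RATE)

savings = (without_spot - with_spot) / without_spot

print(f"without Spot: ${without_spot:.2f}")
print(f"with Spot:    ${with_spot:.2f}")
print(f"savings:      {savings:.2%}")
```

The savings come entirely from halving the run time of the On-Demand nodes; the Spot nodes add cost but less than the On-Demand hours they replace.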
  32. Elastic MapReduce Ecosystem
      Ecosystem is growing
      Integrated development environments for Hadoop
      Tools designed for data analytics
      Broad support for Amazon Elastic MapReduce
  33. Karmasphere
      Big Data Intelligence software
      For developers and analysts to work faster and more easily
      Purpose built for all popular Hadoop distros and versions
      Tightly integrated with Elastic MapReduce (since 2009)
      Built on Karmasphere Application Framework™
      Native Hadoop client-side platform
  34. Karmasphere Studio
      Free version from www.karmasphere.com
      Professional Edition
      Analyst Edition
      Rich graphical environment
      Develop, debug and deploy easily
      Visualize, manipulate & diagnose jobs, clusters & file systems
      Broad and deep Elastic MapReduce support
      Rapid development
      Comprehensive profiling
      Rich debugging
      SQL interface for ad hoc analysis
      Robust Hive implementation
      Syntax checking, diagnostics, schema browser, JDBC4 compliance, multi-threaded and concurrent
      No cluster changes
      Works over proxies and firewalls
      Integrated Hadoop monitoring
  39. Datameer Analytics Solution
      Big data analytics leveraging native Hadoop
      Extreme scale and performance
      Seamless elastic scale on Amazon Elastic MapReduce
      Empowering business users
      UI driven: no programming, no modeling, no schema, no ETL
  40. Figure: data sources (Web Logs, Social Media, CRM, Sales, Excel Files, Customer Data), Datameer Analytics Solution, Amazon Elastic MapReduce
  41. MicroStrategy is a Global Leader in Business Intelligence
      Corporate Overview
      Founded in 1989
      Largest independent public BI vendor (NASDAQ: MSTR)
      Positioned in the Gartner “Leader Quadrant” for BI Platforms
      Over 1 million business users at over 3,000 organizations
      The MicroStrategy 9 business intelligence platform enables mobile apps, dashboards, reporting and analytics with your business data
      Build once, deliver instantly and securely any time, to any device
  42. What can you do with MicroStrategy and Amazon Elastic MapReduce?
      Deliver insights to a broader range of users.
      End users interact with a point-and-click interface to query data without writing HiveQL or MapReduce jobs
      Use cases:
        Mobile Apps: Floor manager accesses order details stored in Amazon Elastic MapReduce through a custom iPhone app
        Dashboards: End user starts with a Dynamic Dashboard populated from a data mart or data warehouse. The user then drills to a detail report that executes in Amazon Elastic MapReduce.
        Reporting: Application developer builds a parameterized HiveQL report, then schedules it to execute. Jobs execute against Amazon Elastic MapReduce and MicroStrategy sends out exception-based alerts via email to end users.
        Analysis: Application developer populates a multidimensional cache in MicroStrategy with the results of a HiveQL query. End user uses MicroStrategy’s web interface to slice-and-dice through results without going back to Hadoop.
  43. How can I learn more?
      Try it!
      Free MicroStrategy software is available at: http://www.microstrategy.com/freereportingsoftware
      Get more information about MicroStrategy solutions for Amazon Elastic MapReduce: http://aws.amazon.com/solutions/solution-providers/microstrategy
  45. Elastic MapReduce - Support