Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce" by Simone Brunozzi

Transcript

  • 1. Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce
      • Simone Brunozzi, Technology Evangelist, Amazon Web Services, APAC
      • Twitter: @simon
      • Blog: www.brunozzi.com
  • 2. AGENDA
      • What is Elastic MapReduce
      • Use Cases
      • Service Features
      • New Feature Announcements
      • Elastic MapReduce Ecosystem
  • 3. What is Amazon Elastic MapReduce
      • Enables customers to easily, securely and cost-effectively process vast amounts of data.
      • Spin up tens, hundreds or even thousands of instances
      • Process tens or hundreds of terabytes of data
      • Hosted Hadoop framework running on the web-scale infrastructure of Amazon.
  • 4. Launch and monitor job flows
  • 5. AWS Management Console
  • 6. Command line interface
  • 7. REST API
  • Why use Amazon Elastic MapReduce
      • Elastic MapReduce removes MUCK from Big Data processing
      • Hard to manage compute clusters
      • Hard to tune Hadoop
      • Hard to monitor running Job Flows
      • Hard to debug Hadoop jobs
      • Hadoop issues prevent smooth operation in the cloud
  • 8. Problems customers solve with Elastic MapReduce
      • Data mining and BI
      • Log processing, click stream analysis, similarities, advertising
      • Data warehousing applications
      • Bioinformatics (genome analysis)
      • Financial simulation (Monte Carlo simulation)
      • File processing (resize JPEGs)
      • Web indexing
  • 9. Web-Scale Data Warehousing
  • 10. ELASTIC MAPREDUCE – SUPPORTED CONFIGURATIONS
      • Hadoop 0.20, Pig 0.6, Hive 0.5, Cascading 1.1
      • Hadoop 0.18, Pig 0.3, Hive 0.4, Cascading 1.1
  • 11. ELASTIC MAPREDUCE – HIVE FEATURES
      • Apache Hive
      • Batch and interactive mode
      • Support Hive Steps
      • Integration with Elastic MapReduce Client and Management Console
      • Load table partitions automatically to/from Amazon S3
      • Optimized data writes to Amazon S3
      • Reference resources such as streaming scripts located on Amazon S3
      • Specify an off-instance metadata store
      • Support variables defined directly in Hive script
      • Supports JDBC and ODBC connections
  • 12. ELASTIC MAPREDUCE – PIG FEATURES
      • Apache Pig
      • Batch and interactive mode
      • Support Pig Steps
      • Integration with Elastic MapReduce Client and Management Console
      • Concurrent access to multiple file systems (HDFS, Amazon S3)
      • Reference resources in Amazon S3 directly from Pig script
      • Several User Defined Functions in Piggy Bank
  • 13.
  • 14.
  • 15.
  • 16.
  • 17. Amazon Elastic MapReduce For Enterprise
      • Enterprise customers need more flexibility: configuring clusters, running clusters, paying for clusters
      • Enterprise customers need more tools: application development, data analytics
      • Enterprise customers need support options: forum support is not enough
  • 18. Amazon Elastic MapReduce features
      • Bootstrap actions
      • Run arbitrary scripts before the job flow begins
      • Run on all nodes before data processing begins
      • Used for:
          • Hadoop configuration (site-conf, Hadoop-conf, etc.)
          • Cluster configuration (memory, swap, etc.)
          • Application/package installation (apt-get install r-base)
      • Several pre-defined bootstrap actions available
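A minimal sketch of how the bootstrap actions on this slide might be declared in code. The dictionary shape is assumed to follow the `BootstrapActions` parameter of boto3's EMR `run_job_flow` call; the bucket, script names and arguments are hypothetical and not from the talk. Nothing is sent to AWS here; the snippet only builds the request fragment.

```python
def bootstrap_action(name, script_path, args=()):
    """Build one bootstrap-action entry.

    The key names are assumed to match boto3's run_job_flow
    BootstrapActions parameter; check the current API reference.
    """
    return {
        "Name": name,
        "ScriptBootstrapAction": {"Path": script_path, "Args": list(args)},
    }


# Hypothetical scripts stored in a hypothetical S3 bucket.
BOOTSTRAP_ACTIONS = [
    # Package installation, e.g. a script that runs "apt-get install r-base".
    bootstrap_action("Install R", "s3://example-bucket/bootstrap/install-r.sh"),
    # Cluster configuration, e.g. a script that adds swap on every node.
    bootstrap_action(
        "Configure swap",
        "s3://example-bucket/bootstrap/configure-swap.sh",
        ["--size-mb", "1024"],
    ),
]

for action in BOOTSTRAP_ACTIONS:
    print(action["Name"], "->", action["ScriptBootstrapAction"]["Path"])
```

Each script would run on every node before data processing begins, which is why installation and tuning steps are a natural fit.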
  • 19.
  • 20. Amazon Elastic MapReduce For Enterprise
      • Enterprise customers need more flexibility: configuring clusters, running clusters, paying for clusters
      • Enterprise customers need more tools: application development, data analytics
      • Enterprise customers need support options: forum support is not enough
  • 21. Amazon Elastic MapReduce - new features
      • Preannounce: Expand running clusters
      • Increase the number of nodes in a running cluster
      • Increase processing speed
      • Increase HDFS size
  • 22. PREANNOUNCE – EXPAND/SHRINK CLUSTERS
      • Use Case: Increase speed of running job flows
      • Speed up job flow execution in response to changing requirements
      • Dynamically balance cost versus performance without restarting a job
      • Diagram (job flow, time remaining):
          • Allocate 4 instances: 14 hours
          • Expand to 9 instances: 7 hours
          • Expand to 25 instances: 3 hours
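The time-remaining figures on this slide (14, 7 and 3 hours for 4, 9 and 25 instances) are roughly what ideal linear scaling would give. The sketch below is an illustration under that assumption, with results rounded up to whole hours; the function name is hypothetical and real job flows rarely scale perfectly linearly.

```python
import math


def hours_remaining(instance_hours_left, instances):
    """Estimate time remaining assuming work divides evenly across nodes.

    Rounds up to whole hours. Ideal linear scaling is an assumption,
    not a guarantee of the service.
    """
    if instances <= 0:
        raise ValueError("instances must be positive")
    return math.ceil(instance_hours_left / instances)


# 4 instances with 14 hours remaining is 56 instance-hours of work.
work_left = 4 * 14

for n in (4, 9, 25):
    print(f"{n} instances: about {hours_remaining(work_left, n)} hours remaining")
```

Under these assumptions the estimates come out to 14, 7 and 3 hours.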
  • 23. Amazon Elastic MapReduce - new features
      • Shrink running clusters
      • Decrease the number of nodes in a running job flow
      • Different capacity requirements from step to step
      • Automatically regulate capacity between steps
  • 24. EXPAND/SHRINK CLUSTERS
      • Use Case: Agile Data Warehouse Cluster
      • Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)
      • Leverage flexibility to reduce costs and increase cluster utilization
      • Diagram:
          • Data Warehouse (Steady State): allocate 9 instances
          • Data Warehouse (Batch Processing): expand to 25 instances
          • Data Warehouse (Steady State): shrink to 9 instances
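One way to picture the day/night pattern on this slide is a small function that picks a target cluster size from the hour of day. This is only a sketch: the 9 and 25 instance counts come from the slide, while the batch window (22:00 to 06:00) and the function name are assumptions made for illustration.

```python
def target_instances(hour, batch_start=22, batch_end=6, steady=9, batch=25):
    """Return the desired cluster size for a given hour (0-23).

    The overnight batch window is a hypothetical choice; the slide only
    says query support runs during the day and batch processing overnight.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    in_batch_window = hour >= batch_start or hour < batch_end
    return batch if in_batch_window else steady


for hour in (12, 23, 3, 6):
    print(f"{hour:02d}:00 -> {target_instances(hour)} instances")
```

A scheduler could call such a function periodically and expand or shrink the running cluster to match.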
  • 25. Amazon Elastic MapReduce For Enterprise
      • Enterprise customers need more flexibility: configuring clusters, running clusters, paying for clusters
      • Enterprise customers need more tools: application development, data analytics
      • Enterprise customers need support options: forum support is not enough
  • 26. Amazon Elastic MapReduce Price
  • 27. What is a Spot Instance?
      • Way to purchase and consume EC2 instances based on compute value
      • Reduce your computing costs
      • Bid for unused EC2 capacity
      • Control your costs
      • Differences from On-Demand Instances:
          • Request: maximum price bid
          • Spot Price: what you pay
          • Termination
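The three differences listed on this slide (a maximum bid, paying the Spot price rather than the bid, and termination) can be illustrated with a toy simulation. The hourly prices below are made up, and the model is deliberately simplified: it assumes whole-hour billing at the Spot price and termination as soon as the Spot price rises above the bid.

```python
def run_spot(bid, hourly_spot_prices):
    """Simulate one Spot instance; return (hours_run, total_cost).

    Simplified model: each hour is billed at that hour's Spot price,
    and the instance is terminated once the Spot price exceeds the bid.
    """
    hours = 0
    cost = 0.0
    for price in hourly_spot_prices:
        if price > bid:
            break  # Spot price above the maximum bid: instance terminated
        cost += price  # pay the Spot price, not the bid
        hours += 1
    return hours, round(cost, 2)


# Hypothetical price history in dollars per hour.
prices = [0.20, 0.22, 0.25, 0.31, 0.24]

hours, cost = run_spot(bid=0.25, hourly_spot_prices=prices)
print(f"ran {hours} hours for ${cost:.2f}")
```

With a higher bid the same instance would survive the fourth hour, at the cost of paying the higher Spot price for it.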
  • 28. m2.xlarge instance pricing history
      • Amazon EC2 On-Demand price for the same instance is $0.50
  • 29. Amazon Elastic MapReduce – new feature
      • Spot pricing support for Elastic MapReduce job flows
      • Specify the price you want to pay for instances
      • Elastic MapReduce takes care of:
          • Provisioning
          • Node addition and removal to/from the cluster
      • Can mix On-Demand and Spot instances in the same cluster
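A sketch of what mixing On-Demand and Spot instances in one cluster could look like as a request fragment. The key names are assumed to follow the `InstanceGroups` shape used by boto3's EMR `run_job_flow`; the split into master, core and task groups, the group names and the bid value are illustrative assumptions rather than anything stated in the talk. The code only builds a data structure and does not call AWS.

```python
def instance_groups(on_demand_nodes, spot_nodes,
                    instance_type="m2.xlarge", bid_price="0.25"):
    """Describe a cluster with On-Demand master/core nodes and Spot task nodes.

    Key names are assumed from boto3's run_job_flow InstanceGroups parameter.
    """
    if on_demand_nodes < 2:
        raise ValueError("need one master plus at least one core node")

    def group(name, role, market, count):
        return {
            "Name": name,
            "InstanceRole": role,
            "Market": market,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    groups = [
        group("master", "MASTER", "ON_DEMAND", 1),
        group("core", "CORE", "ON_DEMAND", on_demand_nodes - 1),
    ]
    if spot_nodes > 0:
        spot = group("task-spot", "TASK", "SPOT", spot_nodes)
        spot["BidPrice"] = bid_price  # maximum price, expressed as a string
        groups.append(spot)
    return groups


for g in instance_groups(on_demand_nodes=4, spot_nodes=5):
    print(g["Name"], g["Market"], g["InstanceCount"])
```

Keeping the Spot capacity in its own group means that losing those nodes to a price spike shrinks the cluster without touching the On-Demand nodes.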
  • 30. Integration with EC2 Spot
      • Use Case: Manage cost of running job flows
      • Start with 4 On-Demand instances of type m2.xlarge
      • Expand the cluster with 5 Spot nodes
      • Cost without Spot: 4 instances × 14 hrs × $0.50 = $28
      • Cost with Spot:
          • 4 instances × 7 hrs × $0.50 = $14
          • 5 instances × 7 hrs × $0.25 = $8.75
          • Total = $22.75
      • Savings: ~19%
      • Diagram (job flow, time remaining):
          • Allocate 4 instances: 14 hours
          • Expand to 9 instances: 7 hours
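Working this slide's cost comparison through in code, using its stated rates ($0.50 per hour On-Demand, $0.25 per hour Spot) and durations: the On-Demand portion of the mixed cluster comes to $14.00, which gives a total of $22.75 and savings of about 19%. The function and variable names are illustrative.

```python
ON_DEMAND_RATE = 0.50  # $/hour for m2.xlarge On-Demand, as stated on the slide
SPOT_RATE = 0.25       # $/hour Spot price used in the slide's example


def cost(instances, hours, rate):
    """Total cost of running `instances` nodes for `hours` at `rate` $/hour."""
    return instances * hours * rate


# Without Spot: 4 On-Demand instances for 14 hours.
without_spot = cost(4, 14, ON_DEMAND_RATE)

# With Spot: 4 On-Demand plus 5 Spot instances, finishing in 7 hours.
with_spot = cost(4, 7, ON_DEMAND_RATE) + cost(5, 7, SPOT_RATE)

savings = (without_spot - with_spot) / without_spot

print(f"without Spot: ${without_spot:.2f}")
print(f"with Spot:    ${with_spot:.2f}")
print(f"savings:      {savings:.1%}")
```

The job also finishes in half the time, so the Spot nodes buy both a lower bill and a faster result in this example.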
  • 31. Amazon Elastic MapReduce For Enterprise
      • Enterprise customers need more flexibility: configuring clusters, running clusters, paying for clusters
      • Enterprise customers need more tools: application development, data analytics
      • Enterprise customers need support options: forum support is not enough
  • 32. Elastic MapReduce Ecosystem
      • Ecosystem is growing
      • Integrated development environments for Hadoop
      • Tools designed for data analytics
      • Broad support for Amazon Elastic MapReduce
  • 33. Karmasphere
      • Big Data Intelligence software
      • For developers and analysts to work faster and easier
      • Purpose built for all popular Hadoop distros and versions
      • Tightly integrated with Elastic MapReduce (since 2009)
      • Built on Karmasphere Application Framework™
      • Native Hadoop client-side platform
  • 34. Karmasphere Studio, Professional Edition, Analyst Edition
      • Free version from www.karmasphere.com
      • Rich graphical environment
      • Develop, debug and deploy easily
      • Visualize, manipulate and diagnose jobs, clusters and file systems
      • Broad and deep Elastic MapReduce support
      • Rapid development
      • Comprehensive profiling
      • Rich debugging
      • SQL interface for ad hoc analysis
  • 35. Robust Hive implementation
  • 36. Syntax checking, diagnostics, schema browser, JDBC4 compliance, multi-threaded and concurrent
  • 37. No cluster changes
  • 38. Works over proxies and firewalls
  • 39. Integrated Hadoop monitoring
  • Datameer Analytics Solution
      • Big data analytics leveraging native Hadoop
      • Extreme scale and performance
      • Seamless elastic scale on Amazon Elastic MapReduce
      • Empowering business users
      • UI driven: no programming, no modeling, no schema, no ETL
  • 40. Datameer Analytics Solution on Amazon Elastic MapReduce
      • Data sources shown: Web Logs, Social Media, CRM, Sales, Excel Files, Customer Data
  • 41. MicroStrategy is a Global Leader in Business Intelligence
      • Corporate Overview
          • Founded in 1989
          • Largest independent public BI vendor (NASDAQ: MSTR)
          • Positioned in the Gartner “Leader Quadrant” for BI Platforms
          • Over 1 million business users at over 3,000 organizations
      • The MicroStrategy 9 business intelligence platform enables mobile apps, dashboards, reporting and analytics with your business data
      • Build once, deliver instantly and securely any time, to any device
  • 42. What can you do with MicroStrategy and Amazon Elastic MapReduce?
      • Deliver insights to a broader range of users.
      • End users interact with a point-and-click interface to query data without writing HiveQL or MapReduce jobs
      • Use cases:
          • Mobile Apps: Floor manager accesses order details stored in Amazon Elastic MapReduce through a custom iPhone app
          • Dashboards: End user starts with a Dynamic Dashboard populated from a data mart or data warehouse. The user then drills to a detail report that executes in Amazon Elastic MapReduce.
          • Reporting: Application developer builds a parameterized HiveQL report, then schedules it to execute. Jobs execute against Amazon Elastic MapReduce and MicroStrategy sends out exception-based alerts via email to end users.
          • Analysis: Application developer populates a multidimensional cache in MicroStrategy with the results of a HiveQL query. End user uses MicroStrategy’s web interface to slice and dice through results without going back to Hadoop.
  • 43. How can I learn more?
      • Try it! Free MicroStrategy software is available at: http://www.microstrategy.com/freereportingsoftware
      • Get more information about MicroStrategy solutions for Amazon Elastic MapReduce: http://aws.amazon.com/solutions/solution-providers/microstrategy
  • 44. Amazon Elastic MapReduce For Enterprise
      • Enterprise customers need more flexibility: configuring clusters, running clusters, paying for clusters
      • Enterprise customers need more tools: application development, data analytics
      • Enterprise customers need more support options: forum support is not enough
  • 45. Elastic MapReduce - Support
  • 46. Amazon Elastic MapReduce For Enterprise
      • Enterprise customers need more flexibility: configuring clusters, running clusters, paying for clusters
      • Enterprise customers need more tools: application development, data analytics
      • Enterprise customers need more support options: forum support is not enough
  • 47. Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce
      • Simone Brunozzi, Technology Evangelist, Amazon Web Services, APAC
      • Twitter: @simon
      • Blog: www.brunozzi.com
