Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce" by Simone Brunozzi
 

Presentation Transcript

    • Making Hadoop Enterprise ready with Amazon Elastic Map/Reduce
      Simone Brunozzi
      Technology Evangelist, Amazon Web Services, APAC
      twitter: @simon
      Blog: www.brunozzi.com
    • AGENDA
      What is Elastic MapReduce
      Use Cases
      Service Features
      New Feature Announcements
      Elastic MapReduce Ecosystem
    • What is Amazon Elastic MapReduce
      Hosted Hadoop framework running on the web-scale infrastructure of Amazon.
      Enables customers to easily, securely and cost-effectively process vast amounts of data.
      Spin up 10s, 100s or even 1000s of instances
      Process 10s or 100s of terabytes of data
      • Launch and monitor job flows
      • AWS Management Console
      • Command line interface
      • REST API
    • Why use Amazon Elastic MapReduce
      Elastic MapReduce removes MUCK from Big Data processing
      Hard to manage compute clusters
      Hard to tune Hadoop
      Hard to monitor running Job Flows
      Hard to debug Hadoop jobs
      Hadoop issues prevent smooth operation in the cloud
    • Problems customers solve with Elastic MapReduce
      Data mining and BI
      Log processing, click stream analysis, similarities, advertising
      Data warehousing applications
      Bio-informatics (Genome analysis)
      Financial simulation (Monte Carlo simulation)
      File processing (resize jpegs)
      Web indexing
    • Web-Scale Data warehousing
    • ELASTIC MAPREDUCE – SUPPORTED CONFIGURATIONS
      Configuration 1: Hadoop 0.20, Pig 0.6, Hive 0.5, Cascading 1.1
      Configuration 2: Hadoop 0.18, Pig 0.3, Hive 0.4, Cascading 1.1
    • ELASTIC MAPREDUCE – HIVE FEATURES
      Apache Hive
      Batch and interactive mode
      Support for Hive steps
      Integration with Elastic MapReduce Client and Management Console
      Load table partitions automatically to/from Amazon S3
      Optimized data writes to Amazon S3
      Reference resources such as streaming scripts located on Amazon S3
      Specify an off-instance metadata store
      Support for variables defined directly in Hive script
      Support for JDBC and ODBC connections
    • ELASTIC MAPREDUCE – PIG FEATURES
      Apache Pig
      Batch and interactive mode
      Support for Pig steps
      Integration with Elastic MapReduce Client and Management Console
      Concurrent access to multiple file systems (HDFS, Amazon S3)
      Reference resources in Amazon S3 directly from Pig script
      Several User Defined Functions in Piggy Bank
    • Amazon Elastic MapReduce For Enterprise
      Enterprise customers need more flexibility
      • Configuring clusters
      • Running clusters
      • Paying for clusters
      Enterprise customers need more tools
      • Application development
      • Data analytics
      Enterprise customers need support options
      • Forum support is not enough
    • Amazon Elastic MapReduce features
      Bootstrap actions
      Run arbitrary scripts before job flow begins
      Run on all nodes before data processing begins
      Used for
      Hadoop configuration (site-conf, Hadoop-conf, etc.)
      Cluster configuration (memory, swap, etc.)
      Application/package installation (apt-get install r-base)
      Several pre-defined bootstrap actions available
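A bootstrap action is just a script that runs on every node before processing starts, so the package-installation case from the slide could be sketched as below. This is a minimal illustration, not an official EMR script: the function names are hypothetical, and it assumes a Debian-style node image where `sudo` and `apt-get` are available.

```python
import subprocess
from typing import List


def build_install_command(packages: List[str]) -> List[str]:
    """Build the apt-get command line for the given packages (hypothetical helper)."""
    if not packages:
        raise ValueError("at least one package name is required")
    # -y answers prompts automatically, since no one is at the terminal on a cluster node.
    return ["sudo", "apt-get", "install", "-y", *packages]


def install_packages(packages: List[str]) -> None:
    """Run the install and raise if apt-get fails, so the failure is visible."""
    subprocess.run(build_install_command(packages), check=True)


# Printing the command only; nothing is installed until install_packages() is called.
print(" ".join(build_install_command(["r-base"])))
```

On a node, the script would end with a call such as `install_packages(["r-base"])`; building the command separately keeps the sketch easy to check without touching the system.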
    • Amazon Elastic MapReduce - new features
      Preannounce: Expand running clusters
      Increase number of nodes in a running cluster
      Increase processing speed
      Increase HDFS size
    • Use Case: Increase speed of running job flows
      Speed up job flow execution in response to changing requirements
      Dynamically balance cost versus performance without restarting a job
      PREANNOUNCE – EXPAND/SHRINK CLUSTERS
      [Diagram: one job flow at three cluster sizes]
      Allocate 4 instances: time remaining 14 hours
      Expand to 9 instances: time remaining 7 hours
      Expand to 25 instances: time remaining 3 hours
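The figures on this slide (14, 7 and 3 hours for 4, 9 and 25 instances) are consistent with a simple model in which the remaining work is a fixed number of instance-hours that divides evenly across the cluster, rounded up to whole hours. The sketch below works that out; the linear-scaling assumption and the function name are illustrative and not stated in the talk, and real Hadoop jobs rarely scale perfectly.

```python
import math


def hours_remaining(instance_hours_left: float, instances: int) -> int:
    """Estimate whole hours left, assuming work splits evenly across instances."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return math.ceil(instance_hours_left / instances)


# Work implied by the slide: 4 instances with 14 hours remaining.
work_left = 4 * 14  # 56 instance-hours

for n in (4, 9, 25):
    print(f"{n} instances: about {hours_remaining(work_left, n)} hours remaining")
```

Rounding up matches how hourly-billed instances are usually reasoned about: a job that needs 6.2 hours still occupies the seventh hour.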
    • Amazon Elastic MapReduce - new features
      Shrink running clusters
      Decrease number of nodes in a running job flow
      Different capacity requirements from step to step
      Automatically regulate capacity between steps
    • Use Case: Agile Data Warehouse Cluster
      Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)
      Leverage flexibility to reduce costs and increase cluster utilization
      EXPAND/SHRINK CLUSTERS
      [Diagram: data warehouse cluster over time]
      Data Warehouse (Steady State): allocate 9 instances
      Data Warehouse (Batch Processing): expand to 25 instances
      Data Warehouse (Steady State): shrink to 9 instances
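The day/night pattern in this use case can be expressed as a small sizing rule. The sketch below uses the 9 and 25 instance counts from the slide; the batch window (22:00 to 06:00) and the function name are assumptions made for illustration, and the code only computes a target size rather than calling any AWS API.

```python
STEADY_STATE_INSTANCES = 9   # from the slide: daytime query support
BATCH_INSTANCES = 25         # from the slide: overnight batch processing
BATCH_START_HOUR = 22        # assumption: batch window opens at 22:00
BATCH_END_HOUR = 6           # assumption: batch window closes at 06:00


def target_instances(hour: int) -> int:
    """Return the desired cluster size for a given hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    # The window wraps past midnight, so it is the union of two ranges.
    in_batch_window = hour >= BATCH_START_HOUR or hour < BATCH_END_HOUR
    return BATCH_INSTANCES if in_batch_window else STEADY_STATE_INSTANCES


for hour in (12, 23, 3, 6):
    print(f"{hour:02d}:00 -> {target_instances(hour)} instances")
```

A scheduler could compare this target with the current cluster size and request an expand or shrink whenever they differ.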
    • Amazon Elastic MapReduce Price
    • What is a Spot Instance?
      Way to purchase & consume EC2 instances based on compute value
      Reduce your computing costs
      Bid for unused EC2 capacity
      Control your costs
      Differences from On-Demand Instances:
      Request – maximum price bid
      Spot Price – what you pay
      Termination
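The three differences listed above (you set a maximum bid, you pay the Spot Price, and the instance can be terminated) can be illustrated with a toy simulation. This is a simplified sketch: the price series is made up, the function name is hypothetical, and real Spot billing has details, such as the handling of partial hours, that are ignored here.

```python
from typing import List, Tuple


def simulate_spot(bid: float, hourly_spot_prices: List[float]) -> Tuple[int, float, bool]:
    """Return (hours_run, total_cost, terminated) for one Spot instance."""
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price > bid:
            # Spot Price rose above the maximum bid: the instance is terminated.
            return hours_run, total_cost, True
        # You pay the current Spot Price, not your bid.
        total_cost += price
        hours_run += 1
    return hours_run, total_cost, False


prices = [0.20, 0.22, 0.25, 0.31, 0.24]  # hypothetical hourly Spot Prices in USD
hours, cost, terminated = simulate_spot(bid=0.30, hourly_spot_prices=prices)
print(f"ran {hours} hours, paid ${cost:.2f}, terminated={terminated}")
```

With a bid of $0.30 the instance runs for the first three hours and is terminated when the price reaches $0.31; a higher bid would have kept it running at the same per-hour cost.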
    • m2.xlarge instance pricing history
      Amazon EC2 On-Demand price for the same instance is $0.50 per hour
    • Amazon Elastic MapReduce – new feature
      Spot pricing support for Elastic MapReduce job flows
      Specify the price you want to pay for instances
      Elastic MapReduce takes care of
      Provisioning
      Node addition and removal to/from the cluster
      Can mix On-Demand and Spot instances in the same cluster
    • Use Case: Manage cost of running job flows
      Integration with EC2 Spot
      Start with 4 On-Demand instances of type m2.xlarge
      Expand the cluster with 5 Spot nodes
      Cost without Spot:
      4 instances * 14 hrs * $0.50 = $28
      Cost with Spot:
      4 instances * 7 hrs * $0.50 = $14 +
      5 instances * 7 hrs * $0.25 = $8.75
      Total = $22.75
      Savings: ~19%
      [Diagram: job flow before and after expansion]
      Allocate 4 instances: time remaining 14 hours
      Expand to 9 instances: time remaining 7 hours
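The cost comparison on this slide can be checked with a few lines of arithmetic. The sketch below uses the slide's inputs ($0.50 per hour On-Demand, $0.25 per hour Spot, 14 hours on 4 instances versus 7 hours on 9); the function name is illustrative. Evaluated directly, the On-Demand share of the mixed cluster is 4 × 7 × $0.50 = $14.

```python
from typing import List, Tuple

ON_DEMAND_RATE = 0.50  # USD per instance-hour, from the slide
SPOT_RATE = 0.25       # USD per instance-hour, from the slide


def cluster_cost(groups: List[Tuple[int, float, float]]) -> float:
    """Sum instances * hours * hourly rate over each instance group."""
    return sum(count * hours * rate for count, hours, rate in groups)


without_spot = cluster_cost([(4, 14, ON_DEMAND_RATE)])
with_spot = cluster_cost([(4, 7, ON_DEMAND_RATE), (5, 7, SPOT_RATE)])
savings = (without_spot - with_spot) / without_spot

print(f"without Spot: ${without_spot:.2f}")
print(f"with Spot:    ${with_spot:.2f}")
print(f"savings:      {savings:.1%}")
```

The job also finishes in half the time, so the saving comes on top of the speed-up rather than in exchange for it.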
    • Elastic MapReduce Ecosystem
      Ecosystem is growing
      Integrated development environments for Hadoop
      Tools designed for data analytics
      Broad support for Amazon Elastic MapReduce
    • Karmasphere
      Big Data Intelligence software
      For developers and analysts to work faster and easier
      Purpose built for all popular Hadoop distros and versions
      Tightly integrated with Elastic MapReduce (since 2009)
      Built on Karmasphere Application Framework™
      Native Hadoop client-side platform
    • Free version from
      www.karmasphere.com
      Karmasphere Studio
      Professional Edition
      Analyst Edition
      Rich graphical environment
      Develop, debug and deploy easily
      Visualize, manipulate & diagnose
      Jobs, clusters & file systems
      Broad and deep Elastic MapReduce support
      Rapid development
      Comprehensive profiling
      Rich debugging
      • SQL interface for ad hoc analysis
      • Robust Hive implementation
      • Syntax checking, diagnostics, schema browser, JDBC4 compliance, multi-threaded and concurrent
      • No cluster changes
      • Works over proxies and firewalls
      • Integrated Hadoop monitoring
    • Datameer Analytics Solution
      Big data analytics leveraging native Hadoop
      Extreme scale and performance
      Seamless elastic scale on Amazon Elastic MapReduce
      Empowering business users
      UI Driven
      no programming, no modeling, no schema, no ETL
    • [Diagram: Datameer Analytics Solution on Amazon Elastic MapReduce]
      Data sources: Web Logs, Social Media, CRM, Sales, Excel Files, Customer Data
    • MicroStrategy is a Global Leader in Business Intelligence
      Corporate Overview
      Founded in 1989
      Largest independent public BI vendor (NASDAQ: MSTR)
      Positioned in the Gartner “Leader Quadrant” for BI Platforms
      Over 1 million business users at over 3,000 organizations
      The MicroStrategy 9 business intelligence platform enables mobile apps, dashboards, reporting and analytics with your business data
      Build once, deliver instantly and securely any time, to any device
    • What can you do with MicroStrategy and Amazon Elastic MapReduce?
      Deliver insights to a broader range of users.
      End users interact with a point-and-click interface to query data without writing HiveQL or MapReduce jobs
      Use cases:
      Mobile Apps: Floor manager accesses order details stored in Amazon Elastic MapReduce through a custom iPhone App
      Dashboards: End user starts with a Dynamic Dashboard populated from data mart or data warehouse. The user then drills to a detail report that executes in Amazon Elastic MapReduce.
      Reporting: Application developer builds a parameterized HiveQL report, then schedules it to execute. Jobs execute against Amazon Elastic MapReduce and MicroStrategy sends out exception based alerts via email to end users.
      Analysis: Application developer populates a multidimensional cache in MicroStrategy with results of a HiveQL query. End user uses MicroStrategy’s web interface to slice-and-dice through results without going back to Hadoop.
    • How can I learn more?
      Try it!
      Free MicroStrategy software is available at: http://www.microstrategy.com/freereportingsoftware
      Get more information about MicroStrategy solutions for Amazon Elastic MapReduce: http://aws.amazon.com/solutions/solution-providers/microstrategy
    • Elastic MapReduce - Support
    • Making Hadoop Enterprise ready with Amazon Elastic Map/Reduce
      Simone Brunozzi
      Technology Evangelist, Amazon Web Services, APAC
      twitter: @simon
      Blog: www.brunozzi.com