Your SlideShare is downloading. ×
0
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Apache Hadoop India Summit 2011 talk "The Next Generation of Hadoop MapReduce" by Sharad Agrawal

2,320

Published on

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,320
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
79
Comments
0
Likes
2
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. The Next Generation of
    Hadoop Map-Reduce
    Sharad Agarwal
    sharadag@yahoo-inc.com
    sharad@apache.org
  • 2. About Me
    Hadoop Committer and PMC member
    Architect at Yahoo!
  • 3. Hadoop Map-Reduce Today
    JobTracker
    Manages cluster resources and job scheduling
    TaskTracker
    Per-node agent
    Manage tasks
  • 4. Current Limitations
    Scalability
    Maximum Cluster size – 4,000 nodes
    Maximum concurrent tasks – 40,000
    Coarse synchronization in JobTracker
    Single point of failure
    Failure kills all queued and running jobs
    Jobs need to be re-submitted by users
    Restart is very tricky due to complex state
    Hard partition of resources into map and reduce slots
  • 5. Current Limitations
    Lacks support for alternate paradigms
    Iterative applications implemented using Map-Reduce are 10x slower.
    Example: K-Means, PageRank
    Lack of wire-compatible protocols
    Client and cluster must be of same version
    Applications and workflows cannot migrate to different clusters
  • 6. Next Generation Map-Reduce Requirements
    Reliability
    Availability
    Scalability - Clusters of 6,000 machines
    Each machine with 16 cores, 48G RAM, 24TB disks
    100,000 concurrent tasks
    10,000 concurrent jobs
    Wire Compatibility
    Agility & Evolution – Ability for customers to control upgrades to the grid software stack.
  • 7. Next Generation Map-Reduce – Design Centre
    Split up the two major functions of JobTracker
    Cluster resource management
    Application life-cycle management
    Map-Reduce becomes user-land library
  • 8. Architecture
  • 9. Architecture
    Resource Manager
    Global resource scheduler
    Hierarchical queues
    Node Manager
    Per-machine agent
    Manages the life-cycle of container
    Container resource monitoring
    Application Master
    Per-application
    Manages application scheduling and task execution
    E.g. Map-Reduce Application Master
  • 10. Improvements vis-à-vis current Map-Reduce
    Scalability
    Application life-cycle management is very expensive
    Partition resource management and application life-cycle management
    Application management is distributed
    Hardware trends - Currently run clusters of 4,000 machines
    6,000 2012 machines > 12,000 2009 machines
    <8 cores, 16G, 4TB> v/s <16+ cores, 48/96G, 24TB>
  • 11. Improvements vis-à-vis current Map-Reduce
    Availability
    Application Master
    Optional failover via application-specific checkpoint
    Map-Reduce applications pick up where they left off
    Resource Manager
    No single point of failure - failover via ZooKeeper
    Application Masters are restarted automatically
  • 12. Improvements vis-à-vis current Map-Reduce
    Wire Compatibility
    Protocols are wire-compatible
    Old clients can talk to new servers
    Rolling upgrades
  • 13. Improvements vis-à-vis current Map-Reduce
    Agility / Evolution
    Map-Reduce now becomes a user-land library
    Multiple versions of Map-Reduce can run in the same cluster (ala Apache Pig)
    Faster deployment cycles for improvements
    Customers upgrade Map-Reduce versions on their schedule
  • 14. Improvements vis-à-vis current Map-Reduce
    Utilization
    Generic resource model
    Memory
    CPU
    Disk b/w
    Network b/w
    Remove fixed partition of map and reduce slots
  • 15. Improvements vis-à-vis current Map-Reduce
    Support for programming paradigms other than Map-Reduce
    MPI
    Master-Worker
    Machine Learning
    Iterative processing
    Enabled by allowing use of paradigm-specific Application Master
    Run all on the same Hadoop cluster
  • 16. Summary
    The next generation of Map-Reduce takes Hadoop to the next level
    Scale-out even further
    High availability
    Cluster Utilization
    Support for paradigms other than Map-Reduce
  • 17. Questions?

×