Hadoop virtualization extensions hadoop world meetup


  1. Hadoop Virtualization Extensions
     Junping Du, Sr. MTS, VMware, Inc.
     © 2009 VMware Inc. All rights reserved
  2. Project HVE (Hadoop Virtualization Extensions)
     Refine Hadoop for running on virtualized infrastructure:
       • Enable multiple-layer network topology
       • Enable resource sharing
       • Enable compute/data node separation without losing locality
     Patches are contributed back to the Apache Hadoop community:
       • http://www.vmware.com/hadoop
       • Umbrella JIRA: HADOOP-8468
       • Sub JIRAs: HADOOP-8469, HADOOP-8470, HADOOP-8817, HDFS-3495, HDFS-3498, HDFS-3461, MAPREDUCE-4660, YARN-18, etc.
  3. Current Network Topology
     Diagram (tree): / at the root, then data centers, racks R1 to R4, and hosts H1 to H12.
       • D = data center
       • R = rack
       • H = host
     However, you have more choices on virtualized infrastructure:
       • C = compute node (TaskTracker)
       • D = data node
  4. High-Level View of HVE Changes
  5. Additional Network Topology Layer for Virtualization Awareness
     Diagram (tree): / at the root, then data centers D1 and D2, racks R1 to R4, node groups NG1 to NG8, and nodes N1 to N13.
       • D = data center
       • R = rack
       • NG = node group
       • N = node
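The four-layer hierarchy above can be pictured as paths of the form /data center/rack/node group/node. The following is a minimal sketch under that assumption; the path format and the names `Location`, `parse_location`, `same_node_group`, and `same_rack` are illustrative and are not taken from the slides or from Hadoop's API.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """One position in the four-layer topology (hypothetical representation)."""
    data_center: str
    rack: str
    node_group: str
    node: str


def parse_location(path: str) -> Location:
    """Split a path such as '/D1/R1/NG1/N1' into its four layers."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 4:
        raise ValueError(f"expected 4 layers, got {len(parts)}: {path!r}")
    return Location(*parts)


def same_node_group(a: Location, b: Location) -> bool:
    """True when both nodes share data center, rack, and node group."""
    return a[:3] == b[:3]


def same_rack(a: Location, b: Location) -> bool:
    """True when both nodes share data center and rack."""
    return a[:2] == b[:2]
```

In this picture a node group would typically stand for the set of virtual machines on one physical host, which is why two nodes in the same node group are treated as closer than two nodes that only share a rack.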
  6. “Virtualization Aware” Replica Placement Policy
     Updated policies:
       • No two replicas are placed on the same node or on nodes under the same node group
       • The 1st replica is on the local node or on one of the nodes under the same node group as the writer
       • The 2nd replica is on a rack remote from the 1st replica
       • The 3rd replica is on the same rack as the 2nd replica
       • Remaining replicas are placed randomly across racks while meeting the minimum restrictions
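The placement rules for the first three replicas can be sketched as follows. This is an illustration only, not Hadoop's implementation: it assumes topology paths of the form /data center/rack/node group/node, and the function names, as well as the fallback to any data node when the writer's node group holds none, are assumptions made for the example.

```python
import random


def rack_of(path):
    """Return the (data center, rack) prefix of a topology path."""
    return tuple(path.strip("/").split("/")[:2])


def group_of(path):
    """Return the (data center, rack, node group) prefix of a topology path."""
    return tuple(path.strip("/").split("/")[:3])


def pick(candidates, rng):
    """Choose one candidate at random, failing loudly if there is none."""
    if not candidates:
        raise ValueError("no data node satisfies the placement constraints")
    return rng.choice(candidates)


def place_three_replicas(writer, data_nodes, rng=None):
    """Pick targets for replicas 1 to 3 following the slide's rules."""
    rng = rng or random.Random()
    # 1st replica: the writer itself, else a data node in the writer's node group.
    if writer in data_nodes:
        first = writer
    else:
        same_group = [n for n in data_nodes if group_of(n) == group_of(writer)]
        # Assumed fallback: any data node if the writer's node group has none.
        first = pick(same_group or list(data_nodes), rng)
    # 2nd replica: a rack remote from the 1st replica.
    second = pick([n for n in data_nodes if rack_of(n) != rack_of(first)], rng)
    # 3rd replica: same rack as the 2nd, but in a different node group.
    third = pick(
        [n for n in data_nodes
         if rack_of(n) == rack_of(second) and group_of(n) != group_of(second)],
        rng,
    )
    return [first, second, third]
```

Because the 3rd replica shares a rack with the 2nd, it is automatically outside the 1st replica's node group, so checking against the 2nd replica's node group is enough to keep all three in distinct node groups.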
  7. “Virtualization Aware” Replica Choosing Policy
     Distances for data locality:
       • Node local (0)
       • Node group local (2)
       • Rack local (4)
       • Off rack (6)
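The distances listed above are consistent with counting the hops from each node up to their closest common ancestor in a four-layer tree. A minimal sketch, assuming paths of the form /data center/rack/node group/node (the function names are hypothetical):

```python
def distance(a: str, b: str) -> int:
    """Hops from a up to the closest common ancestor, plus hops down to b."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def closest_replica(reader: str, replicas: list) -> str:
    """Choose the replica with the smallest distance to the reader."""
    return min(replicas, key=lambda r: distance(reader, r))
```

With a reader at /D1/R1/NG1/N1, a replica on the same node has distance 0, one at /D1/R1/NG1/N2 has distance 2, one at /D1/R1/NG2/N3 has distance 4, and one at /D1/R2/NG3/N5 has distance 6, matching the four values on the slide.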
  8. “Virtualization Aware” Balancer Policy
     The balancer policy makes choices at two levels:
       • Choosing node pairs of source and target, in this order: local node group, local rack, off rack
       • Choosing blocks to move within a node pair: a replica block is not a good candidate if another replica is on the target node or in the same node group as the target node
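The block-level check can be sketched as below. This is an illustration under assumptions: paths take the form /data center/rack/node group/node, the replica on the source node is the one being moved and is therefore ignored, and the function names are made up for the example rather than taken from Hadoop.

```python
def group_of(path):
    """Return the (data center, rack, node group) prefix of a topology path."""
    return tuple(path.strip("/").split("/")[:3])


def is_good_candidate(replica_locations, source, target):
    """Decide whether the replica on `source` may be moved to `target`.

    `replica_locations` lists every node holding a replica of the block,
    including `source`.
    """
    target_group = group_of(target)
    for location in replica_locations:
        if location == source:
            continue  # this is the replica being moved
        if location == target or group_of(location) == target_group:
            return False  # another replica already sits on or next to the target
    return True
```

For a block with replicas on /D1/R1/NG1/N1 and /D1/R2/NG3/N5, moving the first replica to /D1/R2/NG3/N6 is rejected because the second replica is in the target's node group, while moving it to /D1/R1/NG1/N2 is accepted.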
  9. “Virtualization Aware” Task Scheduling Policy
     Pick a task split for a TaskTracker or NodeManager in the following order of preference:
       • Node local
       • Node group local
       • Rack local
       • Off rack
     It works well with:
       • FifoScheduler
       • FairScheduler
       • CapacityScheduler
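The preference order above can be illustrated with a small selection routine. It is a sketch only, not the logic of any Hadoop scheduler: it assumes paths of the form /data center/rack/node group/node, and `locality_level` and `pick_split` are hypothetical names.

```python
LEVELS = ("node local", "node group local", "rack local", "off rack")


def locality_level(worker: str, replica: str) -> int:
    """Index into LEVELS describing how close a replica is to the worker."""
    w = worker.strip("/").split("/")
    r = replica.strip("/").split("/")
    if w == r:
        return 0
    if w[:3] == r[:3]:
        return 1
    if w[:2] == r[:2]:
        return 2
    return 3


def pick_split(worker: str, splits: dict):
    """Return (split id, level name) for the best pending split, or None.

    `splits` maps a split id to the list of nodes holding its input data.
    """
    best_id, best_level = None, len(LEVELS)
    for split_id, replicas in splits.items():
        level = min((locality_level(worker, r) for r in replicas), default=3)
        if level < best_level:
            best_id, best_level = split_id, level
    if best_id is None:
        return None
    return best_id, LEVELS[best_level]
```

The node group level is what keeps locality when compute and data nodes are separate virtual machines: a compute node at /D1/R1/NG1/C1 never holds data itself, but a split stored on /D1/R1/NG1/N1 is still preferred over one that is merely on the same rack.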
  10. HVE Effects on Reliability and Performance
  11. Summary
      Hadoop Virtualization Extensions:
        • Network topology with an additional layer
        • Replica placement/removal/choosing policies extension
        • Balancer policy extension
        • Task scheduling policy extension
      HVE effects:
        • Reliability: multiple DN VMs per host
        • Performance: DN/CN separation case
  12. References
      Hadoop at VMware
        • www.vmware.com/hadoop
      Project Serengeti
        • projectserengeti.org
      Umbrella JIRA for HVE
        • https://issues.apache.org/jira/browse/HADOOP-8468
      Serengeti Hadoop on vSphere
        • Talks @ Hadoop World, Hadoop Summit
        • White Papers
      Spring for Apache Hadoop
        • http://blog.springsource.org/2012/02/29/introducing-spring-hadoop
  13. Q&A
      Thank you!
