Hadoop virtualization extensions hadoop world meetup

Transcript

  • 1. Hadoop Virtualization Extensions
    Junping Du, Sr. MTS, VMware, Inc.
    © 2009 VMware Inc. All rights reserved
  • 2. Project HVE (Hadoop Virtualization Extensions)
    Refine Hadoop for running on virtualized infrastructure
      • Enable multiple-layer network topology
      • Enable resource sharing
      • Enable compute/data node separation without losing locality
    Patches are contributed back to the Apache Hadoop community
      • http://www.vmware.com/hadoop
      • Umbrella JIRA: HADOOP-8468
      • Sub JIRAs: HADOOP-8469, HADOOP-8470, HADOOP-8817, HDFS-3495, HDFS-3498, HDFS-3461, MAPREDUCE-4660, YARN-18, etc.
  • 3. Current Network Topology
      • D = data center
      • R = rack
      • H = host
    [Diagram: tree rooted at /, with data centers at the top level, racks R1-R4 below them, and hosts H1-H12 as leaves]
    However, you have more choices on virtualized infrastructure
      • C = compute node (TaskTracker)
      • D = data node
  • 4. High-Level View of HVE Changes
  • 5. Additional network topology layer for virtualization awareness
      • D = data center
      • R = rack
      • NG = node group
      • N = node
    [Diagram: tree rooted at /, with data centers D1-D2, racks R1-R4, node groups NG1-NG8, and nodes N1-N13 as leaves]
  • 6. “Virtualization Aware” Replica Placement Policy
    Updated policies:
      • No two replicas are placed on the same node or on nodes under the same node group
      • The 1st replica is on the writer's local node or on one of the nodes under the writer's node group
      • The 2nd replica is on a rack remote from the 1st replica
      • The 3rd replica is on the same rack as the 2nd replica
      • Remaining replicas are placed randomly across racks, subject to the minimum restrictions above
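The placement rules on this slide can be sketched as a small selection routine. This is a minimal illustration in Python, not Hadoop's actual block placement code: the `Node` tuple, the `choose_targets` function, and the assumption that node group names are unique across racks are all hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical description of a data node: its name, node group, and rack.
Node = namedtuple("Node", "name group rack")


def choose_targets(writer, nodes, replicas=3, rng=random):
    """Pick replica targets following the slide's rules (illustrative only)."""
    chosen = []

    def pick(candidates):
        # Rule: no two replicas on the same node or in the same node group.
        used_groups = {n.group for n in chosen}
        allowed = [n for n in candidates if n.group not in used_groups]
        if allowed:
            chosen.append(rng.choice(allowed))
            return True
        return False

    # 1st replica: the writer's own node, else a node in the writer's group,
    # else any node (assumed fallback when the writer is outside the cluster).
    same_node = [n for n in nodes if n.name == writer.name]
    same_group = [n for n in nodes if n.group == writer.group]
    for tier in (same_node, same_group, nodes):
        if pick(tier):
            break

    # 2nd replica: a rack other than the first replica's rack.
    if chosen and len(chosen) < replicas:
        pick([n for n in nodes if n.rack != chosen[0].rack])

    # 3rd replica: same rack as the 2nd (the group rule still applies).
    if len(chosen) == 2 and replicas >= 3:
        pick([n for n in nodes if n.rack == chosen[1].rack])

    # Remaining replicas: anywhere the node-group rule still allows.
    while len(chosen) < replicas and pick(nodes):
        pass
    return chosen
```

Note that combining the first and fourth rules forces the 3rd replica into a different node group from the 2nd, even though both share a rack.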
  • 7. “Virtualization Aware” Replica Choosing Policy
    Distances for data locality:
      • Node local (0)
      • Node group local (2)
      • Rack local (4)
      • Off rack (6)
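The distances listed on this slide are consistent with counting tree hops between two leaves of the four-layer topology (data center / rack / node group / node). A minimal sketch in Python under that assumption; the `distance` function and the path strings are illustrative and are not Hadoop's API.

```python
def distance(path_a: str, path_b: str) -> int:
    """Count tree hops between two leaves given as '/d1/r1/ng1/n1' paths.

    Assumes both paths have the same depth, as in the
    data center / rack / node group / node layout.
    """
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    if len(a) != len(b):
        raise ValueError("paths must have the same depth")

    # Length of the shared prefix, i.e. depth of the closest common ancestor.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1

    # Each side climbs (depth - common) levels to reach that ancestor.
    return 2 * (len(a) - common)
```

For example, `distance("/d1/r1/ng1/n1", "/d1/r1/ng1/n2")` gives 2 (node group local), and `distance("/d1/r1/ng1/n1", "/d1/r2/ng3/n5")` gives 6 (off rack).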
  • 8. “Virtualization Aware” Balancer Policy
      • The balancer policy makes choices at two levels:
          - choosing source and target node pairs, in this order: local node group, local rack, off rack
          - choosing blocks to move within a node pair: a replica block is not a good candidate if another replica is on the target node or in the same node group as the target node
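Both levels of the balancer policy can be sketched as two small helpers. This is an illustration in Python under stated assumptions, not the Hadoop Balancer's code: `Node`, `pair_priority`, `order_targets`, and `is_good_candidate` are hypothetical names, and the replica being moved is assumed not to count against its own target.

```python
from collections import namedtuple

# Hypothetical description of a data node: its name, node group, and rack.
Node = namedtuple("Node", "name group rack")


def pair_priority(source, target):
    """Lower value means the pair is tried earlier."""
    if source.group == target.group:
        return 0  # local node group
    if source.rack == target.rack:
        return 1  # local rack
    return 2      # off rack


def order_targets(source, targets):
    """Sort candidate targets for a source in the slide's pairing order."""
    return sorted(targets, key=lambda t: pair_priority(source, t))


def is_good_candidate(replica_nodes, source, target):
    """True if moving the block from source to target keeps replicas apart.

    replica_nodes lists every node currently holding a replica of the block,
    including the source. The source's own replica is ignored because it is
    the one being moved (an assumption of this sketch).
    """
    others = [n for n in replica_nodes if n.name != source.name]
    return all(
        n.name != target.name and n.group != target.group for n in others
    )
```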
  • 9. “Virtualization Aware” Task Scheduling Policy
    Assign task splits to a TaskTracker or NodeManager in the following order:
      • Node local
      • Node group local
      • Rack local
      • Off rack
    It works well with
      • FifoScheduler
      • FairScheduler
      • CapacityScheduler
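The scheduling order above can be sketched as picking, for a given worker, the pending split with the best locality level. This is a minimal Python illustration, not the scheduler code in Hadoop: `Node`, `locality`, and `pick_split` are hypothetical, and compute and data nodes are assumed to be described by the same name/group/rack fields.

```python
from collections import namedtuple

# Hypothetical description of a compute or data node.
Node = namedtuple("Node", "name group rack")

LEVELS = ("node local", "node group local", "rack local", "off rack")


def locality(worker, replica):
    """Index into LEVELS for a worker relative to one replica holder."""
    if worker.name == replica.name:
        return 0
    if worker.group == replica.group:
        return 1
    if worker.rack == replica.rack:
        return 2
    return 3


def pick_split(worker, splits):
    """Return (split_id, level_name) for the best split, or None if empty.

    splits maps a split id to the list of nodes holding its replicas.
    """
    best = None
    for split_id, replicas in splits.items():
        # A split's level is that of its closest replica; off rack if none.
        level = min((locality(worker, r) for r in replicas), default=3)
        if best is None or level < best[1]:
            best = (split_id, level)
    if best is None:
        return None
    return best[0], LEVELS[best[1]]
```

With compute and data nodes separated into different VMs, a compute node never matches a data node by name, so the node group level is what preserves locality in this sketch.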
  • 10. HVE Effects on Reliability and Performance
  • 11. Summary
    Hadoop Virtualization Extensions
      • Network topology with an additional layer
      • Replica placement/removal/choosing policy extensions
      • Balancer policy extension
      • Task scheduling policy extension
    HVE effects
      • Reliability – multiple DN VMs per host
      • Performance – DN/CN separation case
  • 12. References
    Hadoop at VMware
      • www.vmware.com/hadoop
    Project Serengeti
      • projectserengeti.org
    Umbrella JIRA for HVE
      • https://issues.apache.org/jira/browse/HADOOP-8468
    Serengeti Hadoop on vSphere
      • Talks @ Hadoop World, Hadoop Summit
      • White Papers
    Spring for Apache Hadoop
      • http://blog.springsource.org/2012/02/29/introducing-spring-hadoop
  • 13. Q&A
    Thank you!
