Poster for NSDI


Published on

My Poster for the poster session at NSDI 2006

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Poster for NSDI

  1. 1. PlanetenWachHundNetz: Locality-Aware MapReduce for Peer-to-Peer Robust Systems Group Vitus Lorenz-Meyer, Eric Freudenthal University of Texas at El Paso Other Approaches From distributed databases Key-based Tree Routing based on node keys • Children of node n with key k are nodes with keys closest to k with n-1 prefix bits in common with k • Node n is its own child! Aggregation tree Stochastically balanced Motivation Prefix tree • Efficient reduction of data from self- organized (P2P) sources Problem • Dynamic membership • Static tree (for parallel prefix) Non-existent node 001 (A) inappropriate •Not locality aware! •Only 1 global tree. Problems w/ Known Techniques Finger-table based tree • Do not localize network traffic 000 takes the role • Tree edges selected from nodes • Branching factor not controlled of its parent 001 finger tables (towards query root). • State lost when parent nodes fail/leave Paths to 111 Non-optimized • Details on right side of figure fan-out! Message arrives Our Goals at closest node 000 (B) • Exploit locality, minimize remote communication Resulting Aggre- gation tree • Efficient parallelism: Control of Message ends up at self: branching factor No more nodes -> discard • Minimal state-loss when nodes enter/leave • Self-healing continuous queries Implementation •Unique tree for every query •Locality awareness from DHT • Left-tree •but does not force locality • Node role determined by key •Challenge: what to do if nodes • Locality/clustering: Coral or Oasis enter/leave Goal: Efficient Aggregation Tree Our Solution: Key-based MapReduce (KMR) Overview Finding nodes Locality-aware Clustering MapReduce (Google) • Associate aggregation tree with a operation-specific root key. n = key of some node Characteristics t = root-id of aggregation tree Generalized Parallel Prefix Coral (locality-aware DsHT) π = n’s position among leaves (from left) • Rooted at host whose key is closest to root key • Associative & communitive • π=n⊕t Coral partitions its members in • One tree per query, distributes load over all nodes for multiple operations mapped to an key of n’s parent at height h: π / 2h ⊕ t clusters based on connection queries at a time aggregation tree latency • Nodes that are roots of sub-trees are their own left-children all the To find parent, n searches for smallest height such way down to their appropriate place among leaves. that parent is in the tree. • Google‘s map-reduce framework Aggregation tree at each • Each node is leaf and root of the sub-tree at depth d if it has the d’th  Data is mapped to low-dimensional Collecting data bit of root-ID flipped; node may also substitute for missing parents. cluster tuples • Set of nodes and MR-specific root-key fully defines a unique tree (complete • Each node reduces messages from children, sends result(s) to parent •  Tuples are recursively combined Build cluster aggregation trees knowledge not required: lookup parent, sibling, children) • using an associative reduction A single member of a cluster-tree Challenges Approach algorithm that emits a (summary) serves as aggregation tuple representative • Find children DHT lookup • Find parent DHT lookup Example: Challenges • Establishing new tree at exit/departure DHT lookup (key-space determines topology) • Counting occurrence of a particular • Choosing sub-cluster word in a document: Building the Tree representatives  Provide map algorithm that • Constructing aggregation tree for emits the tuple ‘(1)’ for every Send build-message to all siblings down the tree sub-cluster Continuous Queries occurrence Each node forwards build-message to all its siblings all the way down • Avoiding inefficient setup broadcast  Provide a reduce algorithm • Recall that node is its own child: siblings include its children! Maintaining tree consistency under churn that sums these tuples Non-existing parent • Each node periodically checks roles of self and immediate relatives • Closest node fulfills its role • Churn (membership change) must result in tree-node role change • Therefore, search for parent will discover this node • Change propagated by DHT finger tables Non-existing child • If child not in DHT, then node is a leaf Challenges Approach Finding children • Use DHT • What roles to check when periodically check (lookup) self, children, parent only when its assuming role parent → child, old → new • Who notifies who