Vitus Master's Defense

Final defense of my master's thesis at UTEP, 2006

  • Briefly explain how the name came about (Eric has already changed it 5 times). What does “instrumentation” mean?

    1. PlanetenWachHundNetz (PWHN): Instrumentation Infrastructure for PlanetLab, Vitus Lorenz-Meyer
    2. Peer-to-Peer <ul><li>Distributed over the open Internet </li></ul><ul><li>All participants both receive services from and provide services to others </li></ul><ul><li>Not centrally administered </li></ul><ul><li>Membership changes over time (churn) </li></ul><ul><li>Example: file sharing (Napster, Gnutella, …) </li></ul><ul><ul><li>Any node can publish a named file </li></ul></ul><ul><ul><li>Any node can obtain a file from another node that has it </li></ul></ul><ul><ul><li>A range of strategies exists for finding nodes that hold the desired content </li></ul></ul>Vitus Lorenz-Meyer: Thesis defense University of Texas @ El Paso
    3. The Problem <ul><li>P2P systems are hard to tune; tuning requires understanding complex behavior </li></ul><ul><ul><li>Requires instrumentation & analysis </li></ul></ul><ul><li>Many P2P systems are constructed without a scalable instrumentation infrastructure </li></ul><ul><ul><li>Frequently done in an ad-hoc manner </li></ul></ul><ul><ul><ul><li>Data transmitted to a single collection & analysis node </li></ul></ul></ul><ul><ul><ul><li>Inadequate for understanding the behavior of large systems (hundreds to millions of nodes) </li></ul></ul></ul><ul><li>My work: development of a flexible tool to enable scalable instrumentation </li></ul><ul><ul><li>Algorithms, data structures </li></ul></ul>
    4. Related work (1 of 2: cousins) <ul><li>Distributed database management systems </li></ul><ul><ul><li>Select data at the sources </li></ul></ul><ul><ul><li>Optimize joins (run near the sources…) </li></ul></ul><ul><ul><li>Used commercially in non-P2P configurations </li></ul></ul><ul><ul><li>P2P (research): PIER, Sophia </li></ul></ul><ul><li>Sensor networks </li></ul><ul><ul><li>Unmanaged radio-connected nodes </li></ul></ul><ul><ul><ul><li>Provide a “network” of surveillance </li></ul></ul></ul><ul><ul><li>SQL queries compiled into a 3-step process </li></ul></ul><ul><ul><li>Software communicates through the same mechanism </li></ul></ul><ul><ul><li>IrisNet, TAG </li></ul></ul>
    5. Related work (2 of 2: siblings) <ul><li>Aggregation overlays </li></ul><ul><ul><li>Information collection subsystem </li></ul></ul><ul><ul><li>Nodes provide information tuples </li></ul></ul><ul><ul><ul><li>Internal aggregation language </li></ul></ul></ul><ul><ul><ul><li>Computed using a parallel prefix of pre-defined associative/commutative operators </li></ul></ul></ul><ul><ul><li>Astrolabe, SDIMS, SOMO </li></ul></ul><ul><li>Google’s MapReduce </li></ul><ul><ul><li>Data selection & aggregation in a distributed system </li></ul></ul><ul><ul><ul><li>User provides “map” and “reduce” programs </li></ul></ul></ul><ul><ul><li>Not fully P2P (resource-management overlay) </li></ul></ul>
    6. High-level Approach <ul><li>User-specifiable programs, as in MapReduce </li></ul><ul><li>Split data collection into 3 ‘phases’ </li></ul><ul><ul><li>Generate values on all nodes </li></ul></ul><ul><ul><li>Pairwise aggregation throughout the system </li></ul></ul><ul><ul><li>Evaluate results </li></ul></ul><ul><li>Generate: emit measured values as (val, num=1) </li></ul><ul><li>Aggregate: (val1+val2, num1+num2) </li></ul><ul><li>Evaluate (avg): val / num </li></ul><ul><li>Easy to use: user provides 3 programs (scripts) </li></ul>
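As a minimal sketch of the three phases for the average example (assuming Java 16+ for records): the names `AvgPhases`, `Partial`, `generate`, `aggregate`, `evaluate` and `aggregateTree` are hypothetical, and in PWHN the phases are user-supplied scripts rather than Java methods. `aggregateTree` imitates binary aggregation by combining neighbours level by level; because aggregate is associative and commutative, any pairing order gives the same (val, num) pair, up to floating-point rounding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: PWHN's phases are user scripts, not these Java methods.
class AvgPhases {
    /** Partial result: sum of values seen so far and how many values were summed. */
    record Partial(double val, long num) {}

    /** Generate: a node emits its measured value with num = 1. */
    static Partial generate(double measured) {
        return new Partial(measured, 1);
    }

    /** Aggregate: combine two partial results (associative and commutative). */
    static Partial aggregate(Partial a, Partial b) {
        return new Partial(a.val() + b.val(), a.num() + b.num());
    }

    /** Evaluate: turn the final partial result into the average. */
    static double evaluate(Partial p) {
        return p.val() / p.num();
    }

    /** Binary aggregation: combine neighbours pairwise, level by level, until one remains. */
    static Partial aggregateTree(List<Partial> leaves) {
        if (leaves.isEmpty()) throw new IllegalArgumentException("no values");
        List<Partial> level = new ArrayList<>(leaves);
        while (level.size() > 1) {
            List<Partial> next = new ArrayList<>();
            for (int i = 0; i + 1 < level.size(); i += 2) {
                next.add(aggregate(level.get(i), level.get(i + 1)));
            }
            if (level.size() % 2 == 1) {
                next.add(level.get(level.size() - 1)); // odd one out moves up unchanged
            }
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        List<Partial> leaves = new ArrayList<>();
        for (double load : new double[] {0.5, 1.5, 2.0, 4.0, 2.0}) {
            leaves.add(generate(load));
        }
        System.out.println(evaluate(aggregateTree(leaves))); // 10.0 / 5 = 2.0
    }
}
```

Carrying the count alongside the sum is what makes the average decomposable: averaging two averages directly would weight unevenly sized subtrees incorrectly.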
    7. Illustration of Binary Aggregation
    8. Why is this hard in P2P? <ul><li>Problem: membership churn </li></ul><ul><ul><li>Nodes continuously enter & leave the system </li></ul></ul><ul><li>Nobody in charge (P2P) </li></ul><ul><ul><li>Nobody knows the membership list! </li></ul></ul><ul><li>This exposes the following challenges: </li></ul><ul><li>Finding all participating nodes </li></ul><ul><li>Constructing an (approximately) balanced tree </li></ul>
    9. Building Structure Upon Anarchy: Key-Based Routing [figure: circular key space marked at 0 = 2^160, 2^158, 2^159, and 2^159 + 2^158]
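Key-based routing places node identifiers and keys on one circular identifier space. A minimal sketch, assuming SHA-1 hashing into 160-bit keys as in Chord (the class name `KeySpace` and the sample host name are hypothetical): keys live in [0, 2^160), distances are measured clockwise modulo 2^160, and the quarter marks of the ring are 2^158, 2^159 and 2^159 + 2^158.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of a 160-bit circular key space, assuming SHA-1 hashing as in Chord.
class KeySpace {
    static final BigInteger SIZE = BigInteger.ONE.shiftLeft(160); // 2^160 keys

    /** Hash a name (e.g. a host name) to a key in [0, 2^160). */
    static BigInteger key(String name) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: read the 20 bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Clockwise distance from a to b, modulo 2^160. */
    static BigInteger clockwise(BigInteger a, BigInteger b) {
        return b.subtract(a).mod(SIZE);
    }

    public static void main(String[] args) {
        BigInteger quarter = BigInteger.ONE.shiftLeft(158); // 2^158
        BigInteger half = BigInteger.ONE.shiftLeft(159);    // 2^159
        BigInteger threeQuarters = half.add(quarter);       // 2^159 + 2^158
        // Going clockwise from the 3/4 mark to the 1/4 mark wraps past 0: distance 2^159.
        System.out.println(clockwise(threeQuarters, quarter).equals(half)); // true
        System.out.println(key("node1.example.org").toString(16));
    }
}
```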
    10. “Chord” Routing
    11. Chord lookup
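The lookup can be sketched on a small ring, assuming the finger-table rule from the Chord paper (finger i of node n is successor(n + 2^i)) and a static membership with no churn. `ChordSketch` and its methods are hypothetical names, and the 6-bit ring with ten nodes is an illustrative choice. Each hop forwards to the closest finger that precedes the key, so the remaining clockwise distance to the key at least halves per hop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of Chord finger tables and lookup on a small, static ring (no churn).
class ChordSketch {
    final int m;               // identifier bits; the ring has 2^m positions
    final long size;
    final TreeSet<Long> nodes; // ids of live nodes

    ChordSketch(int m, List<Long> ids) {
        this.m = m;
        this.size = 1L << m;
        this.nodes = new TreeSet<>(ids);
    }

    /** First node clockwise from k (inclusive), wrapping past 0. */
    long successor(long k) {
        Long s = nodes.ceiling(Math.floorMod(k, size));
        return s != null ? s : nodes.first();
    }

    /** True if x lies strictly between a and b going clockwise. */
    static boolean between(long x, long a, long b) {
        return a < b ? (a < x && x < b) : (a < x || x < b);
    }

    /** finger[i] = successor(n + 2^i) for i = 0 .. m-1. */
    long[] fingers(long n) {
        long[] f = new long[m];
        for (int i = 0; i < m; i++) {
            f[i] = successor(n + (1L << i));
        }
        return f;
    }

    /** Route from live node start toward key; returns nodes visited, ending at successor(key). */
    List<Long> lookup(long start, long key) {
        key = Math.floorMod(key, size);
        List<Long> path = new ArrayList<>(List.of(start));
        long n = start;
        while (true) {
            long succ = successor(n + 1);
            if (between(key, n, succ) || key == succ) { // key in (n, succ]
                if (succ != n) path.add(succ);
                return path;
            }
            long next = succ; // finger[0]; fallback if no larger finger precedes key
            long[] f = fingers(n);
            for (int i = m - 1; i >= 0; i--) {
                if (between(f[i], n, key)) { next = f[i]; break; } // closest preceding finger
            }
            n = next;
            path.add(n);
        }
    }

    public static void main(String[] args) {
        ChordSketch ring = new ChordSketch(6, List.of(1L, 8L, 14L, 21L, 32L, 38L, 42L, 48L, 51L, 56L));
        System.out.println(ring.lookup(8, 54)); // [8, 42, 51, 56]
    }
}
```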
    12. Building a tree upon KBR [figure: nodes a through i arranged into a tree]
    13. Building a tree: FTT & KBT <ul><li>FTT: finger-based tree </li></ul><ul><li>Operation associated with a “target” node </li></ul><ul><li>Each node sends data to the finger closest to the target </li></ul><ul><li>Ambiguous </li></ul><ul><ul><ul><li>Depends on all nodes’ finger tables </li></ul></ul></ul><ul><li>Tree useful only for aggregation </li></ul><ul><li>KBT: maps the tree onto the key space </li></ul><ul><ul><li>Operation associated w/ target node </li></ul></ul><ul><ul><li>System/tree-node mapping: </li></ul></ul><ul><ul><ul><li>Each tree node is assigned to the system node with the nearest key </li></ul></ul></ul><ul><ul><li>Non-ambiguous </li></ul></ul><ul><ul><li>Tree useful for both dissemination & aggregation </li></ul></ul><ul><ul><li>Single, global tree </li></ul></ul>
    14. Our Structure [figure: aggregation tree over 3-bit key prefixes 000… through 111…] <ul><li>KMR: subset of KBT, rooted at a specific node </li></ul><ul><li>One tree per root </li></ul><ul><ul><li>Better load balancing </li></ul></ul><ul><li>Tree fully determined by the set of active nodes and the root </li></ul>
    15. Implementation details <ul><li>PWHN-Server layered on FreePastry </li></ul><ul><li>PWHN-Client connects to a PWHN-Server and issues a query </li></ul><ul><li>Callee builds the tree, making itself the root </li></ul>[figure: eight PWHN servers (S) and one client (C)]
    16. Our Goal <ul><li>Develop a toolkit for data collection/aggregation in P2P networks </li></ul><ul><ul><li>Useful for the PlanetLab community </li></ul></ul><ul><li>Extend MR’s model to P2P </li></ul><ul><ul><li>K.I.S.S. </li></ul></ul><ul><ul><ul><li>Users provide programs for gen/agg/eval </li></ul></ul></ul><ul><li>Use techniques from P2P </li></ul><ul><ul><li>Construct the aggregation tree upon key-based routing </li></ul></ul>
    17. Example (1) <ul><li>First implementation: </li></ul><ul><ul><li>Flat script version, to test the approach </li></ul></ul><ul><li>Example 1: overall average system load </li></ul><ul><ul><ul><li>Gen emits (1,<1load>,<5load>,<15load>) for each server </li></ul></ul></ul><ul><ul><ul><li>Agg adds the tuples element-wise </li></ul></ul></ul><ul><ul><ul><li>Eval divides the last 3 numbers by the first to get the averages </li></ul></ul></ul>
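The load-average example can be sketched the same way. This assumes each server reads a Linux `/proc/loadavg` line (three load averages, then running/total processes and the last PID); the parsing and the names `LoadAvgExample`, `gen`, `agg`, `eval` are hypothetical, and the flat script version on the slide may gather the numbers differently. The leading 1 in each tuple counts servers, so after element-wise aggregation the first field is the number of servers that reported.

```java
import java.util.Arrays;

// Hypothetical sketch of Gen/Agg/Eval for the overall average system load.
class LoadAvgExample {
    /** Gen: parse a "/proc/loadavg"-style line into (1, load1, load5, load15). */
    static double[] gen(String loadavgLine) {
        String[] f = loadavgLine.trim().split("\\s+");
        return new double[] {1, Double.parseDouble(f[0]), Double.parseDouble(f[1]), Double.parseDouble(f[2])};
    }

    /** Agg: element-wise sum of two tuples. */
    static double[] agg(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Eval: divide the last three fields by the first (the number of servers). */
    static double[] eval(double[] t) {
        return new double[] {t[1] / t[0], t[2] / t[0], t[3] / t[0]};
    }

    public static void main(String[] args) {
        // Sample lines standing in for two servers' /proc/loadavg contents.
        String[] servers = {"0.50 0.40 0.30 1/120 4242", "1.50 1.00 0.90 2/130 4243"};
        double[] acc = gen(servers[0]);
        for (int i = 1; i < servers.length; i++) {
            acc = agg(acc, gen(servers[i]));
        }
        System.out.println(Arrays.toString(eval(acc)));
    }
}
```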
    18. Example (2) <ul><li>PWHN client (Java) </li></ul><ul><ul><li>Can start and stop servers </li></ul></ul><ul><ul><li>Used to specify all programs and parameters (servers, username for flat mode, method) </li></ul></ul><ul><ul><li>Front end for connecting to servers and making queries </li></ul></ul><ul><ul><li>Allows saving and graphically representing results </li></ul></ul>
    19. Example (3) <ul><li>Graphing of query results </li></ul>[figures: bar chart; color bubbles on a world map]
    20. Example (4) <ul><li>Graphing of the tree </li></ul>[figure: graph of the query’s paths]
    21. Evaluation <ul><li>Minimize disruption </li></ul><ul><ul><li>Minimize incoming bytes at the client </li></ul></ul><ul><li>More efficient </li></ul><ul><ul><li>Lower average fan-in of the aggregation tree </li></ul></ul>
    22. Evaluation: Fern <ul><li>Global update latency histogram </li></ul>[figure panels: 10 clients; 701 clients]
    23. Summary <ul><li>PWHN: instrumentation toolkit </li></ul><ul><ul><li>Extends MR’s model to P2P </li></ul></ul><ul><ul><li>Uses P2P techniques (DHTs) </li></ul></ul><ul><ul><li>Combines FTT and KBT to be more efficient </li></ul></ul><ul><ul><li>Conclusion: a useful tool that is more efficient than building the infrastructure into the software itself </li></ul></ul><ul><ul><li>What did I do? </li></ul></ul><ul><ul><ul><li>Survey of systems that provide aggregation in dynamic networks </li></ul></ul></ul><ul><ul><ul><li>Classification and naming of aggregation trees upon DHTs </li></ul></ul></ul><ul><ul><ul><li>Design and implementation of my own tool (KMR/PWHN) </li></ul></ul></ul>
    24. Questions
    25. Related: MR <ul><li>Google’s MapReduce was designed for static networks </li></ul><ul><ul><li>Allows arbitrary programs for aggregation </li></ul></ul><ul><li>We observe that MR’s approach is practical, but it was not designed for P2P </li></ul><ul><li>Example: count the words on a website for an index </li></ul><ul><ul><li>“Map” for each word: emit (<word>,1) </li></ul></ul><ul><ul><li>“Reduce” for [(<word>,1)…]: add the 1s and emit (<word>,<count>) </li></ul></ul>
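Word counting in the MapReduce style, as a minimal single-process sketch: the class and method names are hypothetical, and a real MapReduce distributes map and reduce tasks across machines and performs the grouping (the shuffle) itself. Splitting on non-word characters is a simplifying assumption about tokenization.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of MapReduce word counting.
class WordCount {
    /** Map: emit (word, 1) for each word in the document. */
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : document.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) out.add(new SimpleEntry<>(w, 1));
        }
        return out;
    }

    /** Reduce: for one word, add up its 1s and emit (word, count). */
    static Map.Entry<String, Integer> reduce(String word, List<Integer> ones) {
        int count = 0;
        for (int one : ones) count += one;
        return new SimpleEntry<>(word, count);
    }

    /** Group map output by word (the "shuffle"), then reduce each group. */
    static Map<String, Integer> run(List<String> documents) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String doc : documents) {
            for (Map.Entry<String, Integer> e : map(doc)) {
                groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            Map.Entry<String, Integer> r = reduce(g.getKey(), g.getValue());
            counts.put(r.getKey(), r.getValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat sat", "The dog sat down")));
        // {cat=1, dog=1, down=1, sat=2, the=2}
    }
}
```

Unlike PWHN’s pairwise aggregation, this reduce sees all values for a key at once, which is natural in a static cluster but needs a rendezvous point per key in a P2P setting.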
    26. Example: Coral <ul><li>Load-balancing P2P CDN implemented as an HTTP web proxy </li></ul><ul><li>Democratizes content publication for small-scale servers that can’t afford Akamai </li></ul><ul><ul><li>“Slashdot effect” </li></ul></ul><ul><li>Design did not include monitoring </li></ul><ul><ul><li>Monitoring was later retrofitted onto Coral </li></ul></ul><ul><li>Killer app on PlanetLab, but the centralized approach did not scale </li></ul>
    27. Implementation: KMR <ul><li>Key-based MapReduce </li></ul><ul><ul><li>Physical root node </li></ul></ul><ul><ul><li>One bit of the parent’s key is negated at each level </li></ul></ul><ul><ul><li>‘Left-tree’ </li></ul></ul>
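One possible reading of negating one bit of the parent’s key per level is a binomial tree over the top k bits of the key: the root’s children each negate one bit, and every other node only negates bits below the one that created it. The sketch below is that interpretation under stated assumptions (3-bit prefixes, root 001…, hypothetical names `BitFlipTree`, `parent`, `children`), not the exact KMR construction. In this sketch the root has k children and the depth is k, so with 2^k prefixes the root’s fan-in is log2(2^k) = k messages rather than one message per node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interpretation of "negate one bit per level" as a binomial tree on k-bit prefixes.
class BitFlipTree {
    final int k;    // number of key-prefix bits used for the tree (assumed parameter)
    final int root; // root's k-bit prefix

    BitFlipTree(int k, int root) {
        this.k = k;
        this.root = root;
    }

    /** Parent: undo the last negation, i.e. clear the lowest bit in which x differs from root. */
    int parent(int x) {
        int d = x ^ root;
        if (d == 0) throw new IllegalArgumentException("root has no parent");
        return root ^ (d & (d - 1));
    }

    /** Children: negate each bit below the lowest differing bit (every bit, for the root). */
    List<Integer> children(int x) {
        int d = x ^ root;
        int limit = (d == 0) ? k : Integer.numberOfTrailingZeros(d);
        List<Integer> out = new ArrayList<>();
        for (int j = limit - 1; j >= 0; j--) {
            out.add(x ^ (1 << j)); // most significant remaining bit first
        }
        return out;
    }

    /** Zero-padded k-bit binary string for a prefix. */
    String bits(int x) {
        String s = Integer.toBinaryString(x);
        return "0".repeat(k - s.length()) + s;
    }

    /** Print the subtree under x, indenting one step per level. */
    void print(int x, String indent) {
        System.out.println(indent + bits(x) + "...");
        for (int c : children(x)) print(c, indent + "  ");
    }

    public static void main(String[] args) {
        new BitFlipTree(3, 0b001).print(0b001, "");
    }
}
```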
    28. Impl.: KMR usage <ul><li>“Down”: internal nodes send one message to their sibling </li></ul><ul><li>“Up”: only one message to the parent </li></ul><ul><li>Non-existent nodes </li></ul><ul><ul><li>Messages end up at the closest existing node </li></ul></ul><ul><ul><ul><li>It knows to take over the role of the parent </li></ul></ul></ul>
    29. Wake-up Comic
    30. Problem Detected (last night) <ul><li>FreePastry doesn’t have the expected semantics </li></ul><ul><ul><li>Finds the node with the numerically closest key </li></ul></ul><ul><ul><ul><li>Rather than the most clockwise node less than the key </li></ul></ul></ul><ul><ul><li>Range-based algorithm is therefore inappropriate </li></ul></ul><ul><li>FreePastry finger tables contain nodes with common prefixes of differing lengths </li></ul><ul><ul><li>Useful for finding nodes with a longer common prefix with the requested destination than the destination node </li></ul></ul><ul><ul><li>Permits use of an alternate (preferred) algorithm </li></ul></ul>
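The semantic mismatch can be shown on a toy ring. A minimal sketch, assuming a 64-position ring and three hypothetical node ids: `closest` picks the node with the numerically closest id (the behavior described above; ties go to the smaller id here, an arbitrary choice), while `predecessor` picks the most clockwise node at or before the key, which is what a range-based algorithm expects (treating a key equal to a node id as owned by that node is also an assumption). For keys 35 and 3 the two rules disagree.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch contrasting two "which node owns this key?" rules on a small ring.
class RingSemantics {
    /** Numerically closest node id around the ring (ties: smaller id). */
    static long closest(TreeSet<Long> nodes, long key, long size) {
        long best = -1;
        long bestDist = Long.MAX_VALUE;
        for (long n : nodes) { // ascending order, so the first of two equally close ids wins
            long cw = Math.floorMod(n - key, size);
            long dist = Math.min(cw, size - cw);
            if (dist < bestDist) {
                best = n;
                bestDist = dist;
            }
        }
        return best;
    }

    /** Range-based: most clockwise node at or before key, wrapping past 0. */
    static long predecessor(TreeSet<Long> nodes, long key) {
        Long p = nodes.floor(key);
        return p != null ? p : nodes.last();
    }

    public static void main(String[] args) {
        TreeSet<Long> nodes = new TreeSet<>(List.of(8L, 21L, 42L));
        for (long key : new long[] {30, 35, 3}) {
            System.out.printf("key %d: closest=%d, predecessor=%d%n",
                    key, closest(nodes, key, 64), predecessor(nodes, key));
        }
    }
}
```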
