
Branch-and-bound nearest neighbor searching over unbalanced trie-structured overlays


Master's thesis presentation by Mike Argyriou at the Technical University of Crete on
branch-and-bound nearest neighbor searching over unbalanced trie-structured overlays.

Published in: Technology

  1. Branch-and-bound nearest neighbor searching over unbalanced trie-structured overlays. Master’s Thesis Presentation, Technical University of Crete, 4.2.2013. Author: Michail Argyriou. Supervisor: Ass’t Prof. Vasilis Samoladas.
  2. P2P Evolution. [Figure: timeline from 1998 to 2002 showing centralized, semi-distributed, and fully-distributed systems, with DHTs appearing in 2001.]
  3. Distributed Hash Table (DHT)
  4. DHT Frameworks Evolution. • Rectangular queries support • Peers only on leaves. 2003: PGrid. • High-dimensional queries support with space filling curves • Height-balanced search tree limitation. 2006: VBI. • No height-balanced search tree limitation • Abstract types of data and queries • Data: point, rectangular. 2008: GRaSP. • Queries: point, 3-sided, n-d rectangular
  5. Nearest neighbor search
  6. Given a distributed data set, how can we find the k most similar data items to a query? “k-Nearest Neighbor Search”
  7. Applications: Distributed Databases, GIS, Statistical Classification, Recommendation Systems, Cluster analysis, Similarity Scores
  8. Related Work: 1. Naïve algorithm: a central peer collects the data and performs k-NN searching. 2. k-NN search algorithm over CAN. 3. Distributed quad-based index: each quadtree block is uniquely identified by its centroid; mapped to Chord; k-NN search algorithm.
  9. Contents: GRaSP, k-NN, Evaluation, Conclusions
  10. GRaSP
  11. GRaSP: Building the trie. Hierarchical space partition: 1. Peer p joins. 2. Finds a bootstrapping peer q. 3. Space region s(q) splits into s(q0) and s(q1).
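The join step on this slide can be sketched as a split of the bootstrapping peer's region. This is a minimal sketch under stated assumptions, not the GRaSP implementation: regions are taken to be axis-aligned boxes, the split is volume-balanced (at the midpoint), the split dimension cycles with trie depth, and the names `Peer` and `join` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Peer:
    pid: str    # binary path in the trie, e.g. "01"
    lo: tuple   # lower corner of the region s(pid)
    hi: tuple   # upper corner of the region s(pid)

def join(q: Peer) -> tuple:
    """Split s(q) into s(q0) and s(q1); q becomes q0, the joining peer becomes q1."""
    dim = len(q.pid) % len(q.lo)        # assumption: cycle through dimensions by depth
    mid = (q.lo[dim] + q.hi[dim]) / 2   # assumption: volume-balanced (midpoint) split
    hi0 = q.hi[:dim] + (mid,) + q.hi[dim + 1:]
    lo1 = q.lo[:dim] + (mid,) + q.lo[dim + 1:]
    return Peer(q.pid + "0", q.lo, hi0), Peer(q.pid + "1", lo1, q.hi)

# Two joins starting from a single peer that owns the unit square.
root = Peer("", (0.0, 0.0), (1.0, 1.0))
p0, p1 = join(root)     # first join splits along x
p10, p11 = join(p1)     # second join goes through peer "1" and splits along y
```

Because each join splits only the region of the contacted peer, repeated joins through the same part of the space yield an unbalanced trie, which is the setting the presentation targets.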
  12. GRaSP: Space Partition. [Figure panels: Volume-balanced (Before), Data-balanced (Before).]
  13. GRaSP: Space Partition for a 3-sided query
  14. GRaSP: Space Partition for a 3-sided query
  15. GRaSP: Space Partition for a 3-sided query
  16. GRaSP: Data Insertion. We insert a key k into all peers that own regions containing k.
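The insertion rule can be illustrated with a small owner lookup. This is a sketch under assumptions: regions and keys are axis-aligned boxes (a point is a box with equal corners), "contains" is read as "intersects" for rectangular keys, intervals are closed, and the `regions` table and function names are hypothetical.

```python
def overlaps(lo_a, hi_a, lo_b, hi_b):
    """True if two closed axis-aligned boxes intersect in every dimension."""
    return all(la <= hb and lb <= ha
               for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b))

def owners(regions, key_lo, key_hi):
    """Return the ids of all peers whose region intersects the key."""
    return sorted(pid for pid, (lo, hi) in regions.items()
                  if overlaps(lo, hi, key_lo, key_hi))

# Hypothetical three-peer partition of the unit square.
regions = {
    "0":  ((0.0, 0.0), (0.5, 1.0)),
    "10": ((0.5, 0.0), (1.0, 0.5)),
    "11": ((0.5, 0.5), (1.0, 1.0)),
}
point_owners = owners(regions, (0.2, 0.3), (0.2, 0.3))   # a point key
rect_owners = owners(regions, (0.4, 0.1), (0.6, 0.2))    # a rectangle spanning two regions
```

A point key normally lands in a single region, while a rectangular key is stored at every peer whose region it touches. With closed intervals, a point lying exactly on a split boundary would be reported for both neighbors.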
  17. GRaSP: Routing Tables. Each peer knows a peer in each complementary subtrie. For peer 0100, the complementary subtries are 1, 00, 011, and 0101.
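The complementary subtries of a peer can be derived from its binary id alone: for each bit position, keep the prefix before it and flip that bit. The function name below is hypothetical; the output for peer 0100 matches the prefixes listed on the slide.

```python
def complementary_subtries(pid: str) -> list:
    """Prefixes of the subtries that branch away from the path to `pid`."""
    flip = {"0": "1", "1": "0"}
    return [pid[:i] + flip[bit] for i, bit in enumerate(pid)]

table_prefixes = complementary_subtries("0100")
# table_prefixes == ['1', '00', '011', '0101']
```

A peer at depth d therefore keeps d routing entries, one per prefix.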
  18. GRaSP: Routing. “In order to route a message from peer p to peer q, the message is forwarded from p to a neighbor peer [r] included in a known subtrie closer to peer q. From r it is recursively forwarded to q.”
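The routing rule quoted above can be simulated on a toy overlay. This is a sketch, not the GRaSP protocol: the five-peer trie, the choice of the lexicographically smallest peer as the known contact per subtrie, and the names `build_tables` and `route` are all assumptions made for illustration.

```python
def build_tables(peers):
    """For every peer, store one known peer per complementary subtrie."""
    flip = {"0": "1", "1": "0"}
    tables = {}
    for p in peers:
        tables[p] = {}
        for i, bit in enumerate(p):
            prefix = p[:i] + flip[bit]
            members = sorted(x for x in peers if x.startswith(prefix))
            if members:
                tables[p][prefix] = members[0]  # assumption: any one member will do
    return tables

def route(tables, src, dst):
    """Forward hop by hop into the subtrie that shares a longer prefix with dst."""
    path, cur = [src], src
    while cur != dst:
        # First bit where the current peer and the destination disagree.
        i = next(j for j, (a, b) in enumerate(zip(cur, dst)) if a != b)
        cur = tables[cur][dst[:i + 1]]
        path.append(cur)
    return path

peers = ["00", "0100", "0101", "011", "1"]   # leaves of an unbalanced trie
tables = build_tables(peers)
path = route(tables, "1", "0101")
# path == ['1', '00', '0100', '0101']
```

Each hop extends the prefix shared with the destination by at least one bit, so the number of hops is bounded by the depth of the destination in the trie.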
  19. Contents: GRaSP, k-NN, Evaluation, Conclusions
  20. Searching Algorithm: Branch-and-bound algorithm. Priority queue PQ of candidate peers holding an answer better than the k-th answer found so far (the fringe). 1. Branch Step: expand PQ. 2. Bound Step: prune PQ.
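The branch and bound steps can be sketched as a centralized simulation over a space-partitioning trie. This is not the distributed algorithm from the thesis: it assumes axis-aligned box regions and Euclidean distance, keeps the whole trie in one dictionary, and uses hypothetical names (`mindist`, `knn`, `trie`, `data`).

```python
import heapq
import math

def mindist(q, lo, hi):
    """Distance from point q to the closest point of the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def knn(trie, data, q, k):
    """trie: prefix -> (lo, hi) for every node; data: leaf prefix -> list of points."""
    best = []                              # k best answers so far (max-heap via negation)
    pq = [(mindist(q, *trie[""]), "")]     # the fringe: candidate subtries by distance
    visited = []
    while pq:
        d, node = heapq.heappop(pq)
        if len(best) == k and d >= -best[0][0]:
            break                          # bound step: no remaining candidate can improve
        if node in data:                   # leaf: examine the points stored there
            visited.append(node)
            for p in data[node]:
                heapq.heappush(best, (-math.dist(q, p), p))
                if len(best) > k:
                    heapq.heappop(best)    # drop the current worst answer
        else:                              # branch step: expand the fringe
            for child in (node + "0", node + "1"):
                heapq.heappush(pq, (mindist(q, *trie[child]), child))
    return sorted((-d, p) for d, p in best), visited

trie = {
    "":   ((0.0, 0.0), (1.0, 1.0)),
    "0":  ((0.0, 0.0), (0.5, 1.0)),
    "1":  ((0.5, 0.0), (1.0, 1.0)),
    "10": ((0.5, 0.0), (1.0, 0.5)),
    "11": ((0.5, 0.5), (1.0, 1.0)),
}
data = {"0": [(0.1, 0.1), (0.4, 0.9)], "10": [(0.9, 0.1)], "11": [(0.6, 0.6), (0.9, 0.9)]}
answers, visited = knn(trie, data, (0.7, 0.7), k=1)
```

In this example only leaf "11" is examined: once its best point is known, the remaining fringe entries are farther away than the current answer and are pruned.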
  21. Searching Algorithm: Parallel Searching vs Iterative Searching. Parallel searching requires huge message state! Iterative searching prunes larger regions of the data space!
  22. Searching Algorithm
  23. Searching Algorithm: Branch-and-bound algorithm. 1? d(q,s(1)) < d(q,a). 00? d(q,s(1)) > d(q,a). 011? d(q,s(1)) > d(q,a). 0101? d(q,s(1)) < d(q,a).
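The comparison on this slide, distance from the query q to a subtrie's region against distance to the current answer a, can be sketched as a filter over one peer's routing entries. The geometry below is hypothetical (a 2-D unit square split alternately along x and y), so the kept and pruned subtries differ from the outcome shown on the slide; `mindist` and `prune` are illustrative names.

```python
import math

def mindist(q, lo, hi):
    """Distance from point q to the closest point of the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def prune(q, a, subtries):
    """Keep only subtries whose region could hold a point closer to q than a."""
    bound = math.dist(q, a)
    return [pid for pid, (lo, hi) in subtries.items() if mindist(q, lo, hi) < bound]

# Complementary subtries of peer 0100 with assumed regions.
subtries = {
    "1":    ((0.5, 0.0), (1.0, 1.0)),
    "00":   ((0.0, 0.0), (0.5, 0.5)),
    "011":  ((0.25, 0.5), (0.5, 1.0)),
    "0101": ((0.0, 0.75), (0.25, 1.0)),
}
q = (0.2, 0.7)   # query point inside the region of peer 0100
a = (0.1, 0.6)   # best answer found locally so far
candidates = prune(q, a, subtries)
# candidates == ['011', '0101']
```

Subtries that fail the test are dropped from the fringe without sending any message into them.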
  24. Latency Complexity Theorem: Latency = |T| · O(log n). Support Set T:
  25. Latency Complexity Theorem: Proof. Peers visited: the peers in T, i.e. |T| peers. Find peer in the complementary subtrie: O(log n).
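The two proof ingredients combine as follows. This is a sketch of the argument as the slides state it, assuming iterative searching (the peers of T are visited one after another) and latency counted in routing hops; the symbol h(t) for the hop count needed to reach peer t is introduced here for illustration only.

```latex
\[
\text{Latency}
  \;=\; \sum_{t \in T} h(t)
  \;\le\; |T| \cdot \max_{t \in T} h(t)
  \;=\; |T| \cdot O(\log n),
\qquad \text{since } h(t) = O(\log n) \text{ for every } t \in T .
\]
```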
  26. Contents: GRaSP, k-NN, Evaluation, Conclusions
  27. Performance Evaluation: Taking into account the number of dimensions: Low, Medium, High
  28. Performance Evaluation: Metrics. • Data Fairness Index • Latency • Max Throughput • Fringe Size (mean, max)
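The slides do not define the Data Fairness Index. One common way to measure how evenly data is spread over peers is Jain's fairness index, (Σx)² / (n · Σx²); the sketch below assumes that definition, which may differ from the one used in the thesis. The function name and the sample loads are hypothetical.

```python
def jain_fairness(loads):
    """Jain's fairness index: 1.0 for a perfectly even load, 1/n for one loaded peer."""
    total = sum(loads)
    squares = sum(x * x for x in loads)
    if squares == 0:
        return 1.0   # no data anywhere: treat as trivially fair
    return (total * total) / (len(loads) * squares)

balanced = jain_fairness([10, 10, 10, 10])   # every peer stores the same amount
skewed = jain_fairness([40, 0, 0, 0])        # one peer stores everything
```

Under this definition a data-balanced partition would be expected to score closer to 1 than a volume-balanced partition on skewed data.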
  29. Low dimensions: Low, Medium, High
  30. Low dimensions: Workloads. Datasets: • Greece, data-balanced partition, k=1/10/100 • Greece, volume-balanced partition, k=1. Querysets: • Synthetic queries • For a network size of n peers we asked n/3 queries
  31. Low dimensions: Which space partition is the best? Volume-balanced, Data-balanced
  32. Low dimensions: Data FI vs Space Partition. Which space partition is the best? Greece. [Figure panels: Data-balanced partition, Volume-balanced partition.]
  33. Low dimensions: Latency vs Space Partition. Which space partition is the best? Greece, k=1. [Figure panels: Data-balanced partition, Volume-balanced partition.]
  34. Low dimensions: Fringe Size vs Space Partition. Which space partition is the best? Greece, k=1. [Figure panels: Data-balanced partition, Volume-balanced partition.]
  35. Low dimensions: Max Throughput vs Space Partition. Which space partition is the best? Greece, k=1. [Figure panels: Data-balanced partition, Volume-balanced partition.]
  36. Low dimensions: Which space partition is the best? Volume-balanced, Data-balanced
  37. Low dimensions: k?
  38. Low dimensions: Fringe Size vs k. How is the size of the fringe affected? Greece, data-balanced partition. [Figure panels: k=1, k=10, k=100.]
  39. Low dimensions: Latency vs k. How is the latency affected? Greece, data-balanced partition. [Figure panels: k=1, k=10, k=100.]
  40. Low dimensions: Max Throughput vs k. How is the Max. Throughput affected? Greece, data-balanced partition. [Figure panels: k=1, k=10, k=100.]
  41. Low dimensions … efficient routing!
  42. Medium dimensions: Low, Medium, High
  43. Medium dimensions: Workloads. Datasets: • Uniform, volume-balanced partition, k=1 • ColorMoments, data-balanced partition, k=1. Querysets: • Synthetic queries • For a network size of n peers we asked n/3 queries
  44. Medium dimensions: How is the size of the fringe affected?
  45. Medium dimensions: How is the size of the fringe affected? ColorMoments, data-balanced, k=1
  46. Medium dimensions: How is the size of the fringe affected? Uniform, volume-balanced, k=1. [Figure panels: Mean Fringe Size, Max. Fringe Size.]
  47. Medium dimensions: Data Fairness Index
  48. Medium dimensions: Data Fairness Index. ColorMoments, data-balanced, k=1
  49. Medium dimensions: Data Fairness Index. Uniform, volume-balanced, k=1
  50. Medium dimensions: Latency
  51. Medium dimensions: Latency. ColorMoments, data-balanced, k=1
  52. Medium dimensions: Latency. Uniform, volume-balanced, k=1
  53. Medium dimensions: Latency. Latency is high but near the optimum!
  54. Medium dimensions: Max. Throughput
  55. Medium dimensions: Max. Throughput. ColorMoments, data-balanced, k=1
  56. Medium dimensions: Max. Throughput. Uniform, volume-balanced, k=1
  57. Medium dimensions … not efficient routing but near optimum! It's still good enough for practical applications!
  58. High dimensions: Low, Medium, High
  59. High dimensions: Curse of dimensionality. “When the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse.”
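One way to see the effect described in the quote is to compute how much of a unit hypercube is covered by the largest ball that fits inside it. This sketch uses the standard volume formula for a d-dimensional ball, π^(d/2) · r^d / Γ(d/2 + 1) with r = 0.5; the function name is hypothetical and the example is illustrative, not taken from the thesis.

```python
import math

def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the unit hypercube's volume inside the inscribed ball (radius 0.5)."""
    return math.pi ** (d / 2) * 0.5 ** d / math.gamma(d / 2 + 1)

fractions = {d: inscribed_ball_fraction(d) for d in (2, 5, 10, 20)}
for d, f in fractions.items():
    print(f"d={d:2d}  fraction={f:.2e}")
```

The fraction shrinks rapidly with d, so a fixed-radius neighborhood around a query captures almost none of the space, and distance-based pruning loses most of its power.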
  60. Contents: GRaSP, k-NN, Evaluation, Conclusions
  61. Conclusions: API. [Diagram components: Searching (k-NN), Data Ins/Rem, Trie, Query Types, Data Types, Space Partition, Metric Space.]
  62. Future Work: Approximate k-NN searching for high dimensions; Redundancy
  63. THANK YOU. QUESTIONS?
