Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

On the Internet Delay Space Dimensionality


Published on

We investigate the dimensionality properties of the Internet delay space, i.e., the matrix of measured round-trip latencies between Internet hosts. Previous work on network coordinates has indicated that this matrix can be embedded,
with reasonably low distortion, into a 4- to 9-dimensional Euclidean space. The application of Principal Component Analysis (PCA) reveals the same dimensionality values. Our work addresses the question: to what extent is the dimensionality an intrinsic property of the delay space, defined without reference to a host metric such as Euclidean space? Is the intrinsic dimensionality of the Internet delay space
approximately equal to the dimension determined using embedding techniques or PCA? If not, what explains the discrepancy? What properties of the network contribute to
its overall dimensionality? Using datasets obtained via the King [14] method, we study different measures of dimensionality to establish the following conclusions. First, based on its power-law behavior, the structure of the delay space can be better characterized by fractal measures. Second, the intrinsic dimension is significantly smaller than the value
predicted by the previous studies; in fact by our measures it is less than 2. Third, we demonstrate a particular way in which the AS topology is reflected in the delay space; subnetworks composed of hosts which share an upstream Tier-1 autonomous system in common possess lower dimensionality than the combined delay space. Finally, we observe that
fractal measures, due to their sensitivity to non-linear structures, display higher precision for measuring the influence of subtle features of the delay space geometry.

Published in: Education, Technology, Business
  • Be the first to comment

  • Be the first to like this

On the Internet Delay Space Dimensionality

  1. 1. InetDim: Characterizing the Internet Delay Space Dimensionality Bruno Abrahao Robert Kleinberg Cornell University
  2. 2. <ul><li>Develop a geometric framework for exploring properties of networks </li></ul><ul><li>Understanding allows us to make predictions on the consequences of the growth and evolution of the network and guide the design of distributed systems. </li></ul>InetDim Project
  3. 3. <ul><li>Concerned with properties of the Internet that contribute to the overall dimensionality of its latency space. </li></ul><ul><li>B. Abrahao and R. Kleinberg, On the Internet Delay Space Dimensionality , In Proc. of ACM/SIGCOMM Internet Measurement Conference (IMC 2008) </li></ul>Current Investigation
  4. 4. <ul><li>Internet Delay Space </li></ul><ul><ul><li>matrix of all-pairs round trip times between Internet hosts </li></ul></ul><ul><li>Dimensionality </li></ul><ul><ul><li>We ’ll consider several definitions in this talk </li></ul></ul><ul><ul><li>Value which abstracts notion of network complexity </li></ul></ul>Definitions
  5. 5. <ul><li>Why to study the Internet delay space dimensionality? </li></ul>Question
  6. 6. <ul><li>Network embedding (GNP, Vivaldi) </li></ul><ul><ul><li>Models the network as contained in a vector space </li></ul></ul><ul><ul><li>Estimate real distances with geometric distances </li></ul></ul><ul><li>Elegant, compact, relative success </li></ul><ul><li>BUT they suffer from </li></ul><ul><ul><li>inherent embedding distortion </li></ul></ul><ul><ul><li>disappointing accuracy [Lua et al ., IMC ’05][Ledlie-Gardner-Seltzer, NSDI’07] </li></ul></ul>Coordinate-based positioning systems
  7. 7. <ul><li>Meridian [Wong et al. SIGCOMM ’04] </li></ul><ul><li>Iplane [Madhyastha et al. OSDI ’06] </li></ul><ul><li>Relatively more accurate </li></ul><ul><li>BUT </li></ul><ul><ul><li>Measurement intensive </li></ul></ul><ul><ul><li>Strong scaling assumptions </li></ul></ul>Measurement based positioning systems
  8. 8. <ul><li>Dimensionality of target space is a tunable parameter </li></ul><ul><ul><li>Critical parameter influencing accuracy, convergence, stability [Dabek et al ., SIGCOMM ’04] [Ledlie-Gardner-Seltzer, NSDI’07] </li></ul></ul><ul><li>Question : What is the optimal value? </li></ul><ul><ul><li>too low: high distortion </li></ul></ul><ul><ul><li>too high: inefficiency </li></ul></ul>Motivation 1: Coordinate-based positioning systems
  9. 9. <ul><ul><li>Prior work : delay space can be embedded into an 5- to 9-dimensional Euclidean space with “reasonably” low distortion [Ng-Zhang’01; Tang-Crovella’03] </li></ul></ul><ul><ul><li>Is this value optimal? </li></ul></ul><ul><li>Invariant with scaling? </li></ul><ul><ul><li>For worst-case metric space, dimensionality increases logarithmic with the cardinality of the metric [Bourgain 75] </li></ul></ul>Motivation 1: Coordinate-based positioning systems
  10. 10. <ul><li>Scaling assumptions underlying Meridian </li></ul><ul><ul><li>Latency space is a metric of bounded doubling dimension D </li></ul></ul>Motivation 2: Measurement-based positining systems
  11. 11. <ul><li>Question 1 : What properties of the Internet contribute to its dimensionality? And to what extent? </li></ul><ul><li>Question 2 : Is the Internet delay space dimensionality homogeneous or it is made up of many lower-dimensional pieces? </li></ul><ul><ul><li>Hierarchical embedding [Zhang ‘06] </li></ul></ul>Motivation 3: Internet properties
  12. 12. <ul><li>How to generate synthetic realistic delay spaces? </li></ul><ul><li>Previous work [Zhang, IMC ’06] </li></ul><ul><ul><li>Statistical properties preserved </li></ul></ul><ul><li>Generative models? </li></ul>Motivation 4: Synthetic delay space generation
  13. 13. <ul><li>Part I : Studies different methods for characterizing the Internet delay space geometry </li></ul><ul><li>Part II : Demonstrates how to use these tools for predicting dimensionality shifts cause by structural properties </li></ul>Outline
  14. 14. <ul><li>Uses raw measurements collected via the King method by Meridian (5200 hosts) and P2PSim (1953 hosts) </li></ul><ul><ul><li>Filter out pairs with < 10 measurements total in both directions </li></ul></ul><ul><ul><li>Latency = median of measurements in both directions </li></ul></ul><ul><ul><li>Eliminate ambiguity which arises from missing values </li></ul></ul><ul><ul><li>Approximate the largest clique with the remaining pairs </li></ul></ul><ul><ul><li>After filtering </li></ul></ul><ul><ul><ul><li>Meridian : 2385 hosts </li></ul></ul></ul><ul><ul><ul><li>P2PSim : 298 hosts </li></ul></ul></ul>Datasets 1/2
  15. 15. <ul><li>Major limitation is availability of datasets </li></ul><ul><li>Other publicly available datasets: Dimes, Planetlab, DS2, King (Harvard) , … </li></ul><ul><ul><li>Small cliques, collected over long periods of time </li></ul></ul><ul><li>Combined with geolocation obtained by querying </li></ul>Datasets 2/2
  16. 16. Meridian geolocation visualization
  17. 17. <ul><li>Question </li></ul><ul><li>How to estimate the Internet Delay Space dimensionality? </li></ul><ul><li>What methods were applied in prior work? What are their assumptions? </li></ul><ul><ul><li>data can be approximately embedded into a low-dimensional Euclidean space </li></ul></ul><ul><ul><li>the distance matrix can be accurately approximated by a low-rank matrix </li></ul></ul>Part I: Dimensionality measures
  18. 18. Embedding Dimension <ul><li>Benefit : recovers coordinates of points </li></ul><ul><li>Suggests a dimensionality value between 4 and 7 </li></ul><ul><li>The dimensionality can be estimated using an embedding algorithm, such as Vivaldi </li></ul>Meridian
  19. 19. Embedding Dimension Shortcomings (1/2) <ul><li>Embeddings using 8D and beyond are worse than 7D! </li></ul><ul><li>Curse of dimensionality : embedding algorithm is overwhelmed with so many degrees of freedom </li></ul>Meridian
  20. 20. <ul><li>Existing network embedding algorithms are suboptimal and slow to converge! </li></ul><ul><li>Finding an embedding that minimizes distortion is intractable [Matoušek-Sidiropoulos ‘08] </li></ul><ul><li>Fails if the measured distances reflect a metric other than Euclidean distance </li></ul><ul><li>Algorithm often produces lots of empty space , unnecessarily inflating the estimate of dimensionality </li></ul>Embedding Dimension Shortcomings (2/2)
  21. 21. <ul><li>Approximates the distance matrix by a low-rank matrix </li></ul><ul><li>Rotates the axes so that data variance is better captured by the components </li></ul>Principal Component Analysis Meridian
  22. 22. <ul><li>Assumes that the dimensions are independent </li></ul><ul><ul><li>e.g. surface of a sphere is reported as a 3D object </li></ul></ul><ul><li>The dimensionality cut-off is non-obvious to determine </li></ul>Principal Component Analysis ? ? ? Meridian
  23. 23. <ul><li>Question : How to avoid assuming a specific host metric space or assuming linearity? </li></ul><ul><li>If a dataset exhibits power-law behavior in its statistical or structural properties, one can model and measure its dimensionality using Fractal Measures </li></ul>Intrinsic notion of dimensionality
  24. 24. <ul><li>Power-law behavior over two decimal orders of magnitude </li></ul>Intrinsic dimensionality metrics <ul><ul><li>Correlation Fractal Dimension (pair-count plot) </li></ul></ul><ul><ul><li>Fractal measures indicate dimensionality values less than 2 </li></ul></ul>Meridian P2PSim Includes almost all intra-continental distances (usec)
  25. 25. Correlation Dimension <ul><li># samples within distance r is proportional to r </li></ul>x r
  26. 26. Correlation Dimension <ul><li># samples within distance r is proportional to </li></ul>x r
  27. 27. Sampling from a unit square
  28. 28. <ul><li>There is an infinite family of fractal dimensions, parameterized by q </li></ul><ul><li>Some practical dimensions are </li></ul><ul><ul><li>Hausdorff Dimension </li></ul></ul><ul><ul><li>Shannon ’s Entropy </li></ul></ul><ul><ul><li>Correlation Dimension </li></ul></ul><ul><li>For manifolds, i.e., spaces locally modeled on , the fractal dimension coincides with d </li></ul>Fractal Dimension Family Easy to measure Sensitive to non-linear behavior
  29. 29. <ul><li>How much does the embedded structure deviate from the original space? </li></ul>Correlation Dimension of Embedded Matrix
  30. 30. <ul><li>Question : What doe the fractal behavior implies about the Internet? Is it self-similar? Recursive? </li></ul><ul><ul><li>[Li, Alderson, Willinger, Doyle, 2004] </li></ul></ul><ul><ul><li>[Mitzenmacher, 2004] </li></ul></ul><ul><li>Fractal behavior may also arise from non-recursive structures </li></ul><ul><ul><li>e.g., snowflakes, coastlines, surface of the human brain, etc. </li></ul></ul>Fractals and Internet Models
  31. 31. <ul><li>Questions </li></ul><ul><li>What features contribute to the delay space behavior? To what extent? </li></ul><ul><li>Why is it useful to study the Internet through the lens of the fractal measures? </li></ul>Part II: Dimensionality Components
  32. 32. <ul><li>Power-law extending over two orders of magnitude. </li></ul><ul><li>Geolocation does not account for the whole delay space dimensionality, although it is a strong component </li></ul>Geographic Location
  33. 33. <ul><li>Transit links between Tier-1 AS ’es contained on a significant fraction of Internet routes </li></ul><ul><li>Decomposition into overlapping subsets, each rooted at a Tier-1 network </li></ul>Dimensionality Reducing Decomposition <ul><li>Notice: not a partition, due to multihomed networks </li></ul><ul><li>Superposition of these pieces may inflate the dimensionality </li></ul>
  34. 34. <ul><li>Approximation to ground truth decomposition using a snapshot of the inferred topology [Oliveira et al., SIGMETRICS ’08] </li></ul>Dimensionality Reducing Decomposition
  35. 35. <ul><li>We measured each piece separately using </li></ul><ul><ul><li>Embedding dimension </li></ul></ul><ul><ul><li>PCA </li></ul></ul><ul><ul><li>Correlation dimension </li></ul></ul><ul><ul><li>Hausdorff dimension (see paper) </li></ul></ul>Dimensionality Reducing Decomposition
  36. 36. Dimensionality Reducing Decomposition Results <ul><li>Embedding dimension and PCA were insensitive to decomposition </li></ul>
  37. 37. Dimensionality Reducing Decomposition Results <ul><li>Fractal measures capture the structural change and report a dimensionality reduction </li></ul><ul><li>Power-law is preserved over the same range for subsets, however, with different exponents </li></ul>
  38. 38. <ul><li>Is the reduction in dimensionality due to other side effects of the decomposition? </li></ul><ul><ul><li>Pieces of smaller diameter? </li></ul></ul><ul><ul><ul><li>Decompose the network according to geographical and latency-based clustering </li></ul></ul></ul>Sanity Check 1
  39. 39. <ul><li>Diverts the set from the power-law behavior </li></ul>Distance-based Clustering <ul><li>Not comparable to topology-based decomposition </li></ul>
  40. 40. <ul><li>Is the reduction in dimensionality due to other side effects of the decomposition? </li></ul><ul><ul><li>Pieces of smaller cardinality [Bourgain ’ 75] ? </li></ul></ul><ul><ul><ul><li>Generated 139 random subnetworks with smaller cardinality (i.e., the median of Tier-1 subnetwork the sizes) </li></ul></ul></ul>Sanity Check 2
  41. 41. Random subsets with lower cardinality <ul><li>Random subsets of lower cardinality do not explain the dimensionality reduction </li></ul>
  42. 42. <ul><li>Fractal tool reports a dimensionality shift which was not captured by Embedding Dimension or PCA </li></ul><ul><li>What explain the discrepancy? </li></ul>Dimensionality Paradox
  43. 43. <ul><li>Dimensionality reducing decomposition is real and not an artifact of the fractal methodology </li></ul>Isomap Space is rich in non-linearity
  44. 44. <ul><li>Dimensionality is critical aspect influencing performance of algorithms </li></ul><ul><li>Fractal nature and non-linear behavior: Internet delay space dimensionality better characterized by fractal measures </li></ul><ul><ul><li>Lightweight </li></ul></ul><ul><ul><li>Sensitive to Internet features and structural changes </li></ul></ul><ul><li>Properties influencing dimensionality </li></ul><ul><li>Internet delay space dimensionality is not homogeneous </li></ul>Conclusion
  45. 45. <ul><li>Inetdim Project Website </li></ul><ul><ul><li>King datasets annotated with IP addresses, code, more info </li></ul></ul><ul><ul><li> </li></ul></ul>Further Info