
Stationarity is the new speed


The goal of this presentation is to share exemplars of important broadband Internet access performance phenomena. In particular, we highlight the critical role of stationarity.

When they have non-stationarity, networks are useless for most applications. We show real-world examples of both stationarity and non-stationarity, and discuss the implications for broadband stakeholders.

These phenomena are only visible when using state-of-the-art high-fidelity metrics and measures that capture instantaneous flow.

Published in: Internet

  1. Stationarity is the new speed. The network lights are green, but the users are screaming… Why? Non-stationarity!
  2. About Martin Geddes. 18 September 2017. © Martin Geddes Consulting Ltd. I am a computer scientist, telecoms expert, writer and consultant. I collaborate with other leading practitioners in the communications industry. Together we create game-changing new ideas, technologies and businesses. martingedd.es
  3. The purpose of this presentation
     • Our goal is to share exemplars of important broadband Internet access performance phenomena.
     • These exemplars are for learning and training purposes only. They are not meant to be exhaustive or even representative of Internet performance issues.
     • In particular, we highlight the critical role of stationarity. Networks that exhibit non-stationarity are useless for most applications.
     • We show real-world examples of both stationarity and non-stationarity, and discuss the implications for broadband stakeholders.
     • These phenomena are only visible when using state-of-the-art high-fidelity metrics and measures that capture instantaneous flow.
  4. Stationarity: the most important networking term you’ve never heard of
     • All network and distributed application control protocols depend on the statistical stability of the network, i.e. stationarity.
     • ‘Stationarity’ is a standard term in statistics: a process is stationary when its statistical properties (such as mean and variance) do not change over time.
     • When the past and future are strongly related, the protocols can successfully predict the future from the past and act ‘sensibly’.
     • When the past and the future become unrelated, the protocols make too many ‘bad guesses’.
     • This non-stationarity causes congestion control systems and codecs to act ‘stupidly’ and break down.
     • There is no clever protocol or application ‘fix’ for non-stationarity.
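As a rough illustration of what statistical stability means in practice, the Python sketch below splits a delay series into consecutive windows and checks whether each window's mean stays close to the overall mean. The function names, the window size and the tolerance are assumptions chosen for illustration. This is not the metric used for the measurements in this presentation, and formal stationarity tests are considerably more involved.

```python
from statistics import mean, pstdev


def window_stats(delays, window):
    """Split a delay series into consecutive windows; return (mean, stdev) per window."""
    stats = []
    for start in range(0, len(delays) - window + 1, window):
        chunk = delays[start:start + window]
        stats.append((mean(chunk), pstdev(chunk)))
    return stats


def looks_stationary(delays, window=50, tolerance=0.25):
    """Crude heuristic (an assumption, not a formal test): every window mean
    must lie within `tolerance` (relative) of the overall mean."""
    overall = mean(delays)
    return all(
        abs(window_mean - overall) <= tolerance * overall
        for window_mean, _ in window_stats(delays, window)
    )


# Hypothetical delay samples in milliseconds.
steady = [20 + (i % 5) * 0.2 for i in range(200)]
# Same samples, but the second half sits 30 ms higher: the past no longer predicts the future.
jumpy = steady[:100] + [d + 30 for d in steady[100:]]

print(looks_stationary(steady), looks_stationary(jumpy))
```

A protocol that tuned itself on the first half of `jumpy` would be badly wrong about the second half, which is the 'bad guess' problem described above.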
  5. Initial example: Satellite in Asia. Stationary: good for web browsing. Non-stationary: poor for web browsing. Same satellite, same location, similar time, different services.
  6. Where do these examples of Internet non-stationarity come from? The world’s only network performance science company: www.pnsol.com
  7. An important caveat
     • Typical ‘best effort’ broadband Internet access services lack intentional semantics for performance; that is to say, it is emergent.
     • That means they legitimately can do anything! (And they often will!)
     • Hence these phenomena are not (necessarily) ‘faults’, but are simply “how the Internet works” (or doesn’t, as the case may be).
     • These types of phenomena are widespread, and are the result of systemic issues with network architecture, protocols and operation.
     • As a consequence, no blame or negative publicity should be attached to the specific countries, operators or bearer technologies concerned.
  8. Key terminology used: G, S and V. The components of packet delay:
     • G: Geographic delay
     • S: Size of packet delay
     • V: Variable delay due to load
     [Chart: one-way delay plotted against packet size]
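One way to picture the split into G, S and V is the sketch below. It assumes, purely for illustration, that the lowest delay seen at each packet size carries no load-related delay, fits a straight line through those minima, reads G as the intercept and S as the per-byte slope, and treats whatever is left over as V. The function names and sample values are hypothetical, and this is not presented as the ∆Q method itself.

```python
def fit_g_and_s(samples):
    """samples: list of (packet_size_bytes, one_way_delay_ms).
    Returns (g_ms, s_ms_per_byte) from a least-squares line through the
    minimum delay observed at each packet size."""
    floor = {}
    for size, delay in samples:
        floor[size] = min(delay, floor.get(size, delay))
    sizes = list(floor)
    delays = [floor[size] for size in sizes]
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(delays) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, delays))
    slope = sxy / sxx          # needs at least two distinct packet sizes
    g = mean_y - slope * mean_x
    return g, slope


def variable_delay(samples, g, slope):
    """V for each sample: the delay not explained by G plus size-related delay."""
    return [delay - (g + slope * size) for size, delay in samples]


# Hypothetical data: 10 ms geographic delay, 0.01 ms per byte, plus some load delay.
samples = [(size, 10 + 0.01 * size + extra)
           for size in (100, 500, 1000, 1500)
           for extra in (0.0, 2.0, 5.0)]
g, s = fit_g_and_s(samples)
v = variable_delay(samples, g, s)
print(round(g, 3), round(s, 5), round(max(v), 3))
```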
  9. Examples of non-stationarity: single experiment runs of 5 minutes each
  10. VDSL: ‘Service holiday’ in upstream
  11. Help me to understand…
      • What it shows: There are sudden big ‘jumps’ in the delay where the upstream takes a ‘service holiday’ and stops processing packets for ~10 seconds. As a result a queue builds up: the diagonal slope is the queue emptying. This ‘jump’ is non-stationarity.
      • What it means: We don’t know the reason, as it could be the customer premises equipment (CPE) or the line. Our suspicion is that it is the former, with the CPE becoming preoccupied with some task other than packet processing. But it could be DSLAM cross-talk between copper lines affecting the transmission protocol. The service doesn’t have intentional semantics, so this is not strictly a ‘fault’.
      • What to do about it: The industry needs to gain the operational capability to see this happening. That means adopting high-fidelity metrics, and working together to mature and scale these state-of-the-art measurement systems.
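The jump-then-diagonal shape can be reproduced with a toy model. The sketch below simulates a single first-in-first-out server that stops serving for a fixed interval; every parameter value is a hypothetical choice for illustration and none is taken from the measured VDSL line.

```python
def simulate_holiday(n_packets, send_interval, service_time,
                     holiday_start, holiday_end):
    """Per-packet delay (seconds) through one FIFO server that serves nothing
    during [holiday_start, holiday_end). Simplification: a packet already in
    service when the holiday begins is allowed to finish."""
    delays = []
    free_at = 0.0                      # time at which the server is next idle
    for i in range(n_packets):
        arrival = i * send_interval
        start = max(arrival, free_at)  # wait for the packets queued ahead
        if holiday_start <= start < holiday_end:
            start = holiday_end        # server is on 'holiday': wait it out
        departure = start + service_time
        free_at = departure
        delays.append(departure - arrival)
    return delays


# Probe every 100 ms, 50 ms service time, 10 s stall starting at t = 10 s.
delays = simulate_holiday(400, 0.1, 0.05, 10.0, 20.0)
peak = delays.index(max(delays))
print(peak, round(delays[peak], 2), round(delays[-1], 2))
```

In this model the delay jumps by the length of the stall and then falls by a fixed amount per packet while the backlog drains, which gives the diagonal slope described in the slide.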
  12. VDSL: Anomalies in upstream
  13. Help me to understand…
      • What it shows: Short bursts of ‘weirdness’ in the upstream. These sudden transitions are examples of non-stationarity.
      • What it means: This general kind of “weird stuff happens!” pattern shows up in ~1% of the experiment runs, across all technologies we have tested. A ‘speed test’ wouldn’t be affected by this, but interactive applications would be. This kind of anomaly is the result of unpredictable performance of TCP/IP and current flow control protocols and scheduling algorithms.
      • What to do about it: This kind of data would be lost in the usual averaged performance metrics. Operators and regulators need not only to increase the fidelity of their metrics, but also to be able to isolate which direction the issue is happening in.
  14. VDSL: 100 second outage
  15. Help me to understand…
      • What it shows: We see a burst of activity and then a 100 second outage. Obviously the outage affects any application’s ability to function. The transitions from “working” to “not working” are non-stationarity that will cause adaptive protocols to malfunction.
      • What it means: When you buy “best effort” this is what you get: no lower bound on quality whatsoever. You would have no real grounds for complaint that this service is not what you paid for.
      • What to do about it: The industry needs to move from a network-centric viewpoint to a user-centric one. What matters is fitness-for-purpose, and the metrics and measures need to reflect the user perspective.
  16. DSL: Downstream extreme non-stationarity
  17. Help me to understand…
      • What it shows: Here we see extreme levels of downstream non-stationarity. There is no obvious structure. The upstream and downstream are very different (not shown).
      • What it means: This is an ISP service whose performance has essentially collapsed. It is not usable for most interactive applications. However, a speed test might well return a perfectly acceptable result (certainly in one direction, and possibly both)!
      • What to do about it: The regulatory system needs to differentiate between a service that is available in the network’s terms (packets are flowing) and the user’s terms (usable for desired applications).
  18. High-speed access: Queues form (‘bufferbloat’)
  19. Help me to understand…
      • What it shows: This is a ‘classic’ pattern of the network being overdriven, and a queue suddenly forming. The ‘upslope’ is very steep, and the dissipation of the queue in the ‘downslope’ is much slower. The sudden variability is non-stationarity.
      • What it means: This is what is colloquially known as ‘bufferbloat’. It is the result of poor choices of scheduling and resource control in networks. This problem is extremely widespread.
      • What to do about it: The solution to this problem is to schedule traffic better and avoid over-saturation of resources. Unfortunately, the chosen means of scheduling by the broadband industry (Active Queue Management) merely shifts the symptoms around, rather than truly addressing its root engineering causes.
  20. 4G via tethered WiFi: Huge queues and service breaks
  21. Help me to understand…
      • What it shows: This is a more extreme version of the previous example, with large queues forming and slowly draining, creating very high non-stationarity. Some of the gaps indicate that many of the test packets are being lost. (The losses are being recorded, just not shown.)
      • What it means: When we compose different access media (in this case two different wireless systems), their performance interacts in ways that may not be desirable or under the control of the ISP.
      • What to do about it: The reality is that methods like tethering are frequently used by end users. Both the network architects and the regulatory system need to take account of these modes of use in their design and operation.
  22. DSL: Overload driving periodic delay spikes
  23. Help me to understand…
      • What it shows: Every 60 seconds there is a sudden non-stationary burst in the upstream of around 0.5 seconds, with a corresponding burst in the downstream (not shown) of around 0.2 seconds.
      • What it means: This would appear to be some load-related issue, since it is bidirectional and shows up as V. Some application is presumably waking up every minute and suddenly applying a heavy brief load that over-saturates the link resources.
      • What to do about it: There is a lack of performance isolation of applications on both the customer network and the ISP service. The measurement system needs to be able to differentiate between these causes with probes at the hand-off point.
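A periodic pattern like this can be picked out of a delay trace with very little code. The sketch below marks the moments where delay first crosses a threshold and reports the median gap between those onsets. The threshold, sampling rate and synthetic trace are assumptions for illustration only, and a real analysis would need to cope with noise and packet loss.

```python
from statistics import median


def spike_period(times, delays, threshold):
    """Median time between spike onsets, where an onset is the first sample
    above `threshold` after a sample at or below it. Returns None if fewer
    than two onsets are found."""
    onsets = []
    above = False
    for t, d in zip(times, delays):
        if d > threshold and not above:
            onsets.append(t)
        above = d > threshold
    gaps = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    return median(gaps) if gaps else None


# Hypothetical 5-minute trace sampled every 100 ms: 5 ms baseline delay,
# with a 0.5 s burst of 500 ms delay once a minute starting at t = 30 s.
times = [i / 10 for i in range(3000)]
delays = [500.0 if i >= 300 and (i - 300) % 600 < 5 else 5.0
          for i in range(3000)]
print(spike_period(times, delays, threshold=100.0))
```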
  24. WiFi + DSL: Burst of activity every 10 seconds
  25. Help me to understand…
      • What it shows: Every 10 seconds there is a short non-stationary burst of delay, in both directions, seen as vertical ‘stacks’.
      • What it means: This is a normal part of the operation of Wi-Fi networks, which typically scan every 10 seconds. When the radio is busy doing one thing (scanning), it cannot be doing another thing (transmitting).
      • What to do about it: Plug directly into your router over a fixed Ethernet cable if it’s a problem. For regulators, separating out the contribution of the home network from the ISP network requires suitable boundary probe measures.
  26. VDSL: Stationary but high delay
  27. Help me to understand…
      • What it shows: In this case the network is highly stationary. However, it also has a high base delay of over 25ms.
      • What it means: The high base delay means that a significant proportion of the ‘quality budget’ for things like long-distance VoIP is already used up; your Skype calls to New Zealand from US/Europe may not work. Stationarity is necessary, but not sufficient, for applications to perform to the standard required.
      • What to do about it: We can only begin to optimise networks once we have a baseline of stationarity. Otherwise, we have no stable properties from which to determine cause and effect. ISP services need to define their stable ‘quality floor’, which is a proxy for their fitness-for-purpose.
  28. Downstream packet fragmentation
  29. Help me to understand…
      • What it shows: In this chart, we have plotted the delay against the packet size, rather than against time. We can see a sudden jump at around 1300 bytes. This is another form of non-stationarity in the distribution of performance.
      • What it means: This is an example of packet fragmentation happening. It would affect the performance of many interactive and real-time applications. Generic speed tests are hopelessly poor at detecting these phenomena.
      • What to do about it: Performance is sensitive to packet size, but not all performance tests take account of this. We have seen major operators use a single packet size for all their tests. The industry needs to adopt rigorous scientific management of performance.
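To show why a single test packet size would miss this, the sketch below looks for the largest step in minimum delay between adjacent packet sizes. The function name and the synthetic numbers are hypothetical. They are chosen so the step falls just above 1300 bytes, loosely echoing the chart, and are not derived from the measured data.

```python
def largest_step(samples):
    """samples: list of (packet_size_bytes, delay_ms).
    Returns (size_before, size_after, jump_ms) for the biggest increase in
    minimum delay between neighbouring packet sizes."""
    floor = {}
    for size, delay in samples:
        floor[size] = min(delay, floor.get(size, delay))
    sizes = sorted(floor)
    best = None
    for smaller, larger in zip(sizes, sizes[1:]):
        jump = floor[larger] - floor[smaller]
        if best is None or jump > best[2]:
            best = (smaller, larger, jump)
    return best


# Hypothetical sweep: delay grows gently with size, plus an extra 3 ms once
# packets exceed 1300 bytes and have to be split in two.
samples = [(size, 10 + 0.002 * size + (3.0 if size > 1300 else 0.0))
           for size in range(100, 1501, 100)]
print(largest_step(samples))
```

A test that only ever sent 1000-byte packets would report the gentle slope and never see the step.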
  30. Measurement clock drift
  31. Help me to understand…
      • What it shows: The variable portion of delay (V) in the upstream direction is slowly declining over the period of the experiment. This initially looks like some kind of non-stationarity.
      • What it means: This one is us playing a trick on you! The network is in fact stationary. What this data actually shows is ‘clock drift’ in the measurement system, due to using low-cost apparatus with less stable internal clocks. It is a sign that our data is ‘real’ and ‘honest’. We can correct for this kind of issue, but have chosen not to here.
      • What to do about it: First you need to know whether what you are looking at is ‘real’ or an artefact of the measurement process. To capture high-quality data you must identify and quantify the ‘junk and infidelity’ of your metrics, measures and performance models.
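The slide does not say how the drift correction is done. One simple possibility, shown here only as an assumed sketch, is to fit a straight line to delay against time and subtract the fitted trend. The function name and data are hypothetical, and this approach would also remove any genuine linear trend in the network, so it is only safe when the drift is known to come from the clocks.

```python
def remove_linear_drift(times, delays):
    """Fit delay = a + slope * time by least squares and subtract the slope
    term, so the corrected series keeps the level it had at times[0].
    Returns (slope, corrected_delays)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(delays) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    std = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, delays))
    slope = std / stt
    corrected = [d - slope * (t - times[0]) for t, d in zip(times, delays)]
    return slope, corrected


# Hypothetical 5-minute run: a truly constant 20 ms delay that appears to
# fall by 0.01 ms per second because the two clocks run at different rates.
times = list(range(300))
delays = [20 - 0.01 * t for t in times]
slope, corrected = remove_linear_drift(times, delays)
print(round(slope, 4), round(min(corrected), 3), round(max(corrected), 3))
```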
  32. Examples of non-stationarity over days or weeks: longitudinal data, broken into G and S
  33. 5G isn’t happening on UK “superfast” infrastructure!
  34. Help me to understand…
      • What it shows: The time taken to serialise and deserialise packets over network links (S, shown in green) is stable. However, the base delay of the network is constantly rising each day, before resetting once it reaches some maximum. This is non-stationarity of G (shown in blue).
      • What it means: The VDSL system is self-optimising in some way that means ‘G’ is not stationary. However, small cells assume stationary ‘G’ in order for their timing systems to work.
      • What to do about it: Nobody can blame the operator or regulator for this, since stationarity of G was never a policy or engineering requirement. However, the assumption in 5G business plans is that cheap and ubiquitous backhaul will be available. This highlights the need for forward planning in core infrastructure.
  35. VDSL doesn’t deliver stable performance: evidence of non-stationarity in G and S
  36. Help me to understand…
      • What it shows: This data shows that neither the geographic delay (here, in green) nor the size-related delay (in blue) has stable properties. The gap is where no measurements were recorded; it is not an outage.
      • What it means: This non-stationarity is like a building with wet rot in the basement, and dry rot in the windows. It means adaptive and learning protocols cannot operate well over the long run.
      • What to do about it: The regulatory system needs a performance management upgrade to ensure that national infrastructure is fit-for-purpose over the long run.
  37. GPON looking good! Shows how the Internet has ‘weather’
  38. Help me to understand…
      • What it shows: This shows the geographic delay (in green) moving within tight bounds, whilst the size-related delay (in blue) is consistent. The scale makes it look like there is higher variability than there really is.
      • What it means: This is essentially a good service with relatively stationary properties. What it does illustrate is how the Internet has ‘weather’, with frequent variability.
      • What to do about it: Vendors, operators and regulators need to get a grip on their ‘meteorology’ and ‘climatology’. There is a need for ‘geoengineering’ of these systems to deliver the ‘weather’ properties we desire. This requires ISPs to define their intentional semantics and manage to that requirement.
  39. GPON access to AWS in other country: notice load-balancing effects
  40. Help me to understand…
      • What it shows: With G (in green) we see banded striations, where there are two distinct levels of delay at the same time. There are also outliers of both G and S (in blue). This is non-stationarity since there is high variation in the delay of G and S.
      • What it means: The G effect is the result of load balancing, with packets taking different routes. This is fine if you take one route today and another tomorrow. However, if you hash IP/port, there can be massive inconsistency, with video going one way, audio another, and being out of sync. Hence this can be a significant QoE impact.
      • What to do about it: Is this behaviour within specification or not? Given there typically is no performance specification, it must all be acceptable. The industry needs to consider the stationarity of G, S and V both independently and collectively. Mere round-trip times, jitter and average loss rates are not enough!
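The IP/port hashing effect can be illustrated with a toy path selector. In the sketch below, SHA-256 is only a stand-in for whatever hash a real router uses (those are vendor-specific), and the addresses, ports and per-path delays are hypothetical values for illustration.

```python
import hashlib


def pick_path(src_ip, src_port, dst_ip, dst_port, n_paths):
    """Choose a path index by hashing the flow's addresses and ports.
    The same flow always maps to the same path; different flows may not."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# Hypothetical geographic delay (ms) of two equal-cost routes.
path_delay_ms = [18.0, 31.0]

# Audio and video of one call: same hosts, different (hypothetical) ports.
for name, port in (("audio", 50000), ("video", 50002)):
    path = pick_path("192.0.2.10", port, "198.51.100.7", 443, len(path_delay_ms))
    print(name, "-> path", path, "G =", path_delay_ms[path], "ms")
```

Whether the two streams land on the same path depends entirely on the hash, so two halves of one application session can see different values of G at the same time. That matches the two simultaneous bands in the chart.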
  41. Structural analysis: cluster plot of G vs S for a core Internet path in Europe
  42. Even the core Internet links aren’t stable! [Cluster plot; axes: Geographic delay (G) (ms) and Size delay (S) (ms); regions labelled Stationary and Non-stationary]
  43. Help me to understand…
      • What it shows: This is an analysis of longitudinal data of a portion of the core Internet backbone over a 10Gbit/sec path. It is a cluster diagram of G vs S.
      • What it means: The S is small, and tightly managed. However, we see some “normal outliers” (big abnormal ones have been excluded). We would expect S to be constant with respect to G, but it isn’t.
      • What to do about it: The widespread assumption that core networks are dead stable is false: there is non-stationarity. We need to move to digital supply chain management to be able to manage these performance phenomena over multiple technical and management boundaries.
  44. Appendix: Technical details of the metrics and measurements
  45. About these high-fidelity measurements of network non-stationarity
      • They are selected from a variety of projects, done both for private clients and as publicly funded research.
      • The measurements are done in the upstream and downstream, capturing the loss and delay of individual test packets, and their resulting probability distribution.
      • There has been no attempt to replicate these specific phenomena, analyse their temporal frequency or spatial distribution, or isolate their root causes.
      • We have not addressed packet loss in these teaching examples, but it is fully incorporated into the measurement and modelling methodology.
  46. How are these extremely precise network quality measurements made?
      • Packet flow “wind tunnel”: test traffic with special statistical properties
      • Packet flow “functional MRI scan”: high-resolution space and time observations
      • Quality attenuation science: new ∆Q mathematics and methods for data analysis
  47. These are not ‘speed tests’: we are measuring quality, not quantity
      • A ‘speed test’ is like asking if your electricity supply can power an overnight storage heater.
      • A ‘stationarity test’ is like asking if the power supply is of sufficient stability to drive a motor at a constant speed.
      • The latter contains far more information than the former.
      • For more on the inherent limitations of broadband speed tests see here.
  48. Additional reading about the measurement techniques and ∆Q
      • For the core science and mathematics of ∆Q see qualityattenuation.science or the PhD of Dr Dave Reeve.
      • How to X-ray a telecoms network shows our measurement method and tools.
      • Fundamentals of network performance engineering for G, S & V.
      • What is ‘stationarity’, and why does it matter?
      • Examples of using high-fidelity ∆Q metrics at CERN (video at 40 million frames/sec) and Kent Public Service Network
      • The properties and mathematics of data transport quality
      • Network performance optimisation using high-fidelity measures
  49. To learn more…
      • Engineered experiences for broadband: www.justright.network
      • Bespoke measurement and modelling: www.pnsol.com
      • Educational services and consultancy: www.martingeddes.com
