Network performance - skilled craft to hard science

This document describes the technical and business journey for network operators wanting to turn network performance from a skilled craft into hard science.



Network Performance Management
The journey from skilled craft to hard science
© Martin Geddes Consulting Ltd 2015

Summary

Networks supply both connectivity and performance to external users' distributed computing applications. Internally, networks can also be considered as large distributed supercomputers. There is an underlying physical and technical reality to network operation, both externally and internally. Networks' ability to deliver performance-based value is subject to constraints imposed by that reality: physics, maths, deployed technology, economics and regulation. For physics, there is an established science that models that one aspect of reality. The new science of network performance relates the performance elements of all of these aspects.

What we offer is insight into this science, and knowledge transfer of key skills. We also offer expertise, honed by practical experience, in applying the science at various scales in the real world. This is backed by a practical and proven toolset. Together these enable a step-change in customer experience and cost structure.

That step change is possible because network performance science is a paradigm change. Broadband and the Internet have the counter-intuitive and paradoxical aspects of quantum physics, because packet-based statistical multiplexing is stochastic in nature. This contrasts with the familiar "classical" world of circuits. Current broadband network design, economics and operations are all unconsciously still tied to a circuit paradigm.

Engaging with this paradigm change poses both an intellectual and a practical challenge. Taking radically new technology and inserting it into the existing paradigm of people and processes does not deliver the hoped-for outcome. The science can only be absorbed through cycles of action-oriented learning that evolve at the same speed as the people and processes.
There is no magic "knowledge pill" or network mechanism that can short-circuit this. Failure to engage with this paradigm change represents a failure to engage with technical reality, which poses a serious (indeed existential) threat to operators. All technological inputs of the delivery chain are becoming commoditised: Intel/ARM CPUs, fibre, standard packet and SDN software, and so on. Furthermore, the present construction of some of these is in conflict with the external constraints imposed by reality, notably the mathematics of statistical multiplexing. This forms an unsustainable technical and economic business model.

The alternative is to align with the technical reality, and fully exploit the opportunities it offers. Mastery of this will result in valuable intellectual property: how to integrate all the elements, and how to embed the resulting service into profitable digital supply chains. This cannot come from your equipment suppliers; by definition, anything they offer is a non-differentiating commodity. We call this the "ultracomputing" challenge (see Appendix A).
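The claim that statistical multiplexing is stochastic rather than "classical" can be made concrete with a toy simulation. This is only an illustrative sketch of a single M/M/1 queue with invented parameters, not the toolset described in this document: as offered load approaches saturation, the tail of the waiting-time distribution stretches far beyond the median.

```python
import random

def mm1_delays(utilisation, n=50_000, seed=1):
    """Simulate waiting times in a single statistically multiplexed
    queue (Poisson arrivals, exponential unit-mean service: M/M/1)
    using Lindley's recurrence W' = max(0, W + S - A)."""
    rng = random.Random(seed)
    wait, waits = 0.0, []
    for _ in range(n):
        inter = rng.expovariate(utilisation)  # inter-arrival time (arrival rate = load)
        service = rng.expovariate(1.0)        # service time (rate = 1)
        wait = max(0.0, wait + service - inter)
        waits.append(wait)
    return waits

for rho in (0.5, 0.8, 0.95):
    w = sorted(mm1_delays(rho))
    print(f"load {rho}: median {w[len(w)//2]:.2f}, 99th pct {w[int(len(w)*0.99)]:.2f}")
```

On a dedicated circuit the delay would be a constant; under statistical sharing the high percentiles grow without bound as load nears capacity, which is the counter-intuitive behaviour the text refers to.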
Understanding the problem

To get a grasp of the underlying issue, you need to zoom out to the highest level. There are three key conceptual lenses through which we can view the network performance issue:

• Timescales: The network is a resource allocation system at all timescales (10^-9 to 10^9 seconds). Networks perform "trades" to match supply to demand.
• Skill sets: The three core skills are to measure, model and manipulate performance.
• Business processes: All OSS/BSS processes fall into one of three buckets: concept-to-market (C2M), lead-to-cash (L2C) and trouble-to-resolution (T2R). A detailed list is offered in Appendix B.

Success comes from using the right skill sets to configure the right business processes to perform the "right" trades at the right timescales. There is a complex mapping and inter-dependency, which varies from operator to operator. The challenge is to acquire the capabilities in the appropriate order. So where to begin?

• Timescales: It is easier to work with longer timescales than shorter ones.
• Skill sets: Until you can measure the right thing, and understand what it means, scientific modelling and manipulation are impossible.
• Business processes: In-life management is typically the greatest source of cost pain, and you can't deploy advanced assured services if you can't do their in-life management, so trouble-to-resolution is the place to start.

Thus, rather than trying to bite off the whole problem of a paradigm shift, there are some clear corner cases that are the best candidates with which to begin the capability transformation: fault isolation and (problematic) capacity planning.

The journey

Step zero is to select a "problem" network (ideally with a "must retain" client account that is coming up for contract renewal). We prefer to work on a B2B (or telco-to-telco) case, such as a large corporation outsourcing its IT.

1. People first.
Engage in network performance science education. Begin with one day of fundamentals, plus one day of practical training in measurement. (This forms the start of the curriculum shown in Appendix C.)

2. Process next. We work on skills transfer for fault isolation, demonstrating ROI from the science-led approach that could not otherwise have been obtained.

3. Technology last. Create your own measurement system that can reproduce this at scale. We can supply the tools. (You could build your own, but it would delay things by years.)

Once this is in place, further cycles are possible with expanded scope: capacity planning and supply chain management for more sophisticated products (e.g. VPN, UC).
The ultracomputing prize

The set of enablers for ultracomputing is listed in Appendix D. If you have these, what can you expect?

In the business domain, you can be like a FedEx of the digital logistics world, controlling complete supply chains. The value is in the coordination of the trading spaces to match supply and demand, not in capex-heavy asset ownership.

In the network domain, you can be like a Maersk of networks, with massive increasing returns to scale from being able to aggregate heterogeneous demand and multiplex that traffic.

In the science and technology domain, you can be like a GE or Rolls-Royce, with "power plants" for information translocation and distributed computing whose "ultracomputing" performance envelope greatly exceeds that of rivals.
Appendix A: The ultracomputing opportunity

The nature of future on-demand enterprise service delivery is a qualitative change in technical difficulty compared to the past. This dictates that a whole new skill set must be learnt, which we call ultracomputing. It takes the essence of supercomputing, but scales it up to a highly distributed and virtualised environment. Those who master ultracomputing skills will achieve cost and performance for cloud-based services that far exceed their rivals', with much lower risk. This appendix describes the challenge, the opportunity, and how we can help you seize the ultracomputing prize.

Key understanding: all networks are large-scale distributed parallel supercomputers

In supercomputing, you have a large number of interconnected nodes involved in computation and communication. They simultaneously work on many inter-dependent problems. The system must remain stable at all loads, produce outputs within bounded timeframes, and be resilient to component or process failure.

The same requirements are being placed on telecoms networks. However, in telecoms networks the relative costs of the component computation and communications technologies continually vary. Furthermore, the interconnection between these functions can no longer be assumed to be carried over dedicated circuits, as all traffic is now carried over a common statistically shared transmission medium. The cost structure and performance of the transmission can vary from one territory to the next, as well as dynamically over time. As a result, the optimal location of each function in the distributed architecture can also change. Performance is specific to each network configuration, rather than being generic protocol behaviour.

Ultracomputing thus demands a new discipline: the performance engineering of complete large-scale dynamic distributed architectures. Critically, this is distinct from the engineering of any of the sub-components.
The challenge: find the optimal resource trade-offs at all timescales

The optimal location of any function in an ultracomputing environment depends upon both the desired customer experience and the total cost of operation. The customer experience depends on the quality of experience (QoE) performance hazards; the total cost of ownership depends on the cost of mitigating or addressing those hazards, and the level of financial predictability that results. The ultracomputer has to enable the appropriate resource trade-offs using a distributed resource allocation model. This plays out differently for each part of the service ecosystem:

• For network operators: where to place caches, radio controllers, or internet breakout.
• For content distributors: where to place delivery systems, when/whether to use multicast, and where to place transcoders (from centrally down to every set-top box).
• For cloud-service providers: where to place the application functionality – how much local, and how much remote, given that functional splitting increases implementation complexity.

In the ultracomputing world, the design space is now large, irregular, and involves interactions of sub-systems from many vendors. The current virtualisation trend has magnified the issue of how best to allocate resources. Once a function can be located in many places, the total number of combinations becomes too high to test and validate empirically before deployment. Ultracomputing therefore demands new skills to model and manage distributed systems at scale, and the trade-offs involved at all timescales, from design to configuration to operation.

The ultracomputing skill set

In ultracomputing you need to be able to perform the following design and engineering activities:

• Reason ex-ante about complete systems, and the interaction of all their sub-components.
• Understand and model the predictable regions of operation and their failure modes under load, so as not to cause localised or widespread failure.
• Understand how finite communication and computation resources are constrained by both capacity and schedulability factors, and model the complex range of interactions against these two constraints.
• Know whether demand can be scheduled to get the resources it wants in the timescales it requires.
• Manage both the resources of the external user processes and the internal communication and coordination resources, which are all multiplexed together.
• Allocate resources for all of the above using a coherent distributed resource management system.

Regrettably, the telecoms and IT industries have both yet to grasp these issues, conceptually and practically.
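To make the placement trade-off and its combinatorial growth concrete, here is a deliberately tiny sketch. All sites, costs and latencies are invented for illustration; a real exercise would model ΔQ and hazards rather than brute-force a toy table, precisely because the search space explodes as functions and sites multiply.

```python
from itertools import product

# Hypothetical candidate sites for two functions, each with an
# illustrative placement cost and added latency (ms). These numbers
# are assumptions, not measurements.
SITES = {
    "core":  {"cost": 1.0, "latency_ms": 40.0},
    "metro": {"cost": 2.5, "latency_ms": 15.0},
    "edge":  {"cost": 6.0, "latency_ms": 3.0},
}
FUNCTIONS = ["cache", "transcoder"]
LATENCY_BUDGET_MS = 50.0  # assumed end-to-end QoE bound

def best_placement():
    """Exhaustively search all function-to-site assignments and keep
    the cheapest one whose summed latency stays within budget."""
    best = None
    for assignment in product(SITES, repeat=len(FUNCTIONS)):
        cost = sum(SITES[s]["cost"] for s in assignment)
        latency = sum(SITES[s]["latency_ms"] for s in assignment)
        if latency <= LATENCY_BUDGET_MS and (best is None or cost < best[0]):
            best = (cost, dict(zip(FUNCTIONS, assignment)))
    return best

print(best_placement())
```

Even this toy has 3^2 = 9 combinations; with tens of functions and sites, exhaustive validation before deployment becomes impossible, which is the point the text makes.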
Yet the application of known mathematical and performance engineering techniques can resolve the technical problems: how to decompose the system, understand the trade-offs being made, optimise for specific cost or user experience outcomes, and operate these complex systems with a high degree of predictability, even in overload.

How we can help you

We have made fundamental conceptual and practical breakthroughs in network performance science. We (uniquely) know how to measure, model and manipulate systems at the ultracomputing scale.

• Measure: we can perform network "X-rays" to get high-resolution "pictures" of the performance of network elements, and how they (de)compose in both space and time.
• Model: we have a robust calculus that lets us predict the performance of systems before they are built or re-configured. This is the essence of service orchestration in ultracomputing.
• Manipulate: we have proven algorithms that can safely drive these systems into saturation, where they are most profitable, whilst still delivering assured outcomes.

These techniques have been proven at clients such as Boeing, BT, CERN and a tier 1 mobile operator.
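As one illustration of how a predictive calculus can compose per-element behaviour, the sketch below assumes delay is represented as a discrete probability mass function over 1 ms bins, and that independent hops in series compose by convolution. This is a simplified stand-in, not the calculus this document describes, and the per-hop numbers are invented.

```python
def convolve(p, q):
    """Convolve two discrete delay PMFs (index = delay in ms) to get
    the delay PMF of the two hops in series."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def percentile(pmf, pct):
    """Smallest delay bin whose cumulative probability reaches pct."""
    cum = 0.0
    for delay, mass in enumerate(pmf):
        cum += mass
        if cum >= pct:
            return delay
    return len(pmf) - 1

hop_a = [0.7, 0.2, 0.1]        # P(delay = 0, 1, 2 ms) - invented
hop_b = [0.5, 0.3, 0.1, 0.1]   # P(delay = 0, 1, 2, 3 ms) - invented
end_to_end = convolve(hop_a, hop_b)
print("P99 end-to-end delay:", percentile(end_to_end, 0.99), "ms")
```

The value of this style of model is that the end-to-end delay distribution, and hence any percentile-based service bound, can be predicted before the path is built or re-configured.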
Appendix B: Performance-related business processes

Concept-to-market
• Profitability metric space + cost/value metrics
• M&A performance due diligence
• Performance-aware service design and capture of operational performance invariants
• Performance-aware failure modes effects analysis (both business and civil contingencies)
• Deployability analysis (from lab to real world, in varying environments)
• Scalability rewards and risks (managing all levels of variation + effects on scalability)
• Product development pipeline & product portfolio management
• Service performance architecture (beyond functional correctness to include non-functional characteristics; turns performance into a first-class entity, not an afterthought)
• Performance-centric design (ex-ante performance engineering of outcomes + resource costs)
• Quality arbitrage management: defence; offence; partner management
• Supply chain performance management (horizontal along the chain; vertical to suppliers; how to structure the performance aspects of the contracts).

Lead-to-cash
• Performance fraud management
• SLA/QTA compliance
• Per-customer service trending
• Performance invariant assurance (evidencing SLA compliance)
• Service performance resource accounting (cost of supporting the service; opportunity cost of running the service) – a performance equivalent of the "bill of materials".
• Service cost pricing: options pricing, time-volume pricing

Trouble-to-resolution
• Root cause "fault" isolation (as many "faults" are design issues)
• Litigation + liability management
• Performance variation management
• Regulatory conformance of performance; explicit and implicit reputation management; evidence of delivery
Appendix C: Sample curriculum for network performance engineering

Framework for customer experience and service quality performance management
• Basic technical understanding of network performance
• The relationship between customer experience and service quality
• Relationship of organisational roles in delivering optimal business outcomes

Design aspects
• Technical design
  - How to relate cross-sectional capacities of edge and core to performance
  - How to manage performance risk within and between management domains
• Service design
  - How to correctly quantify the costs of a service, and of service growth, on common infrastructure
  - How best to aggregate users and services in order to achieve cost reduction

Contracting
• How to construct supply chains that deliver consistent end-to-end performance
• How to assure service delivery from a contractual perspective

Deployment and provisioning
• How to manage the end-to-end performance aspects of turn-up & acceptance testing
• How to incorporate performance into your initial deployment processes

Ongoing service assurance
• How to tie high-level performance outcomes to low-level network performance metrics
• How to manage service quality and performance trade-offs
• How to distinguish between demand-side and supply-side performance issues

Fault isolation
• How to determine the root cause of performance issues
• How to predict performance hazards by trending low-level metrics for hazard arming
Appendix D: The enablers you need for ultracomputing

Thinkware: capability primitives
• ΔQ. A metric and algebra that captures and measures outcomes at every level of abstraction (from network to end user).
• Quantitative Translocation Agreements (QTAs). These capture the relationship of supply and demand, at every level and timescale.
• Predictable Region of Operation (PRO). The system must be managed within this region, especially where it has performance "turning points".
• Performance hazard hierarchy and analysis. This describes how to think about performance hazards and their mitigation at various levels of size and abstraction; it is central to the management problem, and to making rational design and operational decisions.
• ΔQ QTA aggregation. This is an input from service planning into capacity planning, telling you how to convolve services and customers into an aggregate requirement. This then tells you how much of the resource you need at different physical locations to achieve the outcome for a given level of performance.
• ΔQ resource models. Serial-parallel flow graphs and application performance: an algebra and calculus for combining process behaviour and network behaviour into outcomes and costs.

Wetware
• Stochasticians
• Modellers
• Business process design & operational management
• Performance accounting function:
  - Quality czar (Chief Quality Officer)
  - Performance hazard modelling

Software
• QTA library (for different applications and bearer combinations; distributing ΔQ budgets and performance hazard arming)
• Measurement repository & analysis
• QTA-based analytics
• Multipoint measurement data processing
• QTA-based breach hazard warnings

Hardware
• Probes & test stream data generators
• QTA-aware SDN orchestration
• Computation resources for measurement & modelling
• Communication resources for passing data & control plane traffic
• New approaches and mechanisms for short-timescale ΔQ trading
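To illustrate what a "QTA-based breach hazard warning" might look like in software, here is a minimal sketch. It assumes a QTA can be reduced to a set of (percentile, max delay) bounds and a measurement to a list of observed one-way delays; real QTAs in this framework are richer than that, and the bounds and samples below are invented.

```python
def qta_breached(delays_ms, qta):
    """Return the (percentile, bound) clauses of an assumed QTA that
    the observed delay sample violates."""
    d = sorted(delays_ms)
    breaches = []
    for pct, bound in qta:
        idx = min(len(d) - 1, int(len(d) * pct))  # empirical percentile
        if d[idx] > bound:
            breaches.append((pct, bound))
    return breaches

# Invented QTA: median delay under 20 ms, 99th percentile under 60 ms.
qta = [(0.50, 20.0), (0.99, 60.0)]
sample = [5, 8, 12, 15, 18, 22, 30, 45, 70, 90]  # invented observations
print(qta_breached(sample, qta))
```

A production system would trend such clauses over time to arm hazard warnings before a contractual breach occurs, rather than merely report one after the fact.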
