CARE initiative technical prospectus

When we get water, electricity, or gas delivered to our home or place of work we expect it to have predictable quality. Why isn't this also true of broadband? The answer is we don't (yet) have the "glue" to integrate performance in digital supply chains.

Published in: Technology

  1. Cloud Access Reliability Engineering Initiative: Interoperable SLAs for digital supply chains. Technical Prospectus, 4th November 2018
  2. Overview of the problem: digital supply chains exist to support distributed applications. [Diagram: the digital supply chain, spanning cloud service (SaaS/PaaS), telco and cloud access, customer premises, device, and application software.] Users have portfolios of cloud applications that they wish to use. These have a mixture of availability demands. The telecoms-cloud industry is building supply using a variety of technologies, such as 5G slicing, SDN/NFV, SD-WAN, serverless computing, and WiFi. That supply is becoming ever more dynamic (higher speeds, more statistical resource sharing, more wireless) and more distributed (e.g. network function virtualisation, edge apps). Technologies like cloud also have a start-up latency for the container versus “bare metal” computing, as well as having to wait for shared resources (as with packet data). This increasing variability is making it harder to reason about performance and to engineer reliability to meet that demand. We have to decide where to put the compute, and when to communicate. The number of design and configuration degrees of freedom is rising: location, capacity, scheduling, loss vs delay. Questions about too much demand or not enough supply have become hard to answer. Safety margins have become opaque. The trade-offs of cost and performance lack predictable outcomes. The resulting unreliability is driving user frustration (the “motherbuffer”), which in turn creates costly workarounds. The main one has been “spend money on bandwidth”, but this doesn’t solve growing variability. The answer is the software industry equivalent of “containerisation”: what the “rack” does for cloud hardware, we need for cloud services. But how?
  3. Digital supply chains are systems of supply and demand… just like any other industry. [Diagram: user device and experience (presentation layer runtime), LAN, access network, long-haul network, SD-WAN, VPN, and data centre (cloud application container), spanning customer premises, telco and cloud access, and application provider; labelled with demand for copying data downstream and upstream, and supply of information downstream and upstream.] Industries typically have a standard unit of supply and demand that meaningfully “adds up” and interoperates: natural gas (BTU), electricity (MW), water (litre), corn (bushel), sugar (pound), oil (barrel), ethanol (gallon), copper (ton), wool (kilogram), gold (ounce), shipping (40ft container). Cloud application access = ?
  4. Digital supply chains involve complex interactions between multiple technology stacks. [Diagram: technology stacks across customer premises, telco and cloud access, and application provider. Distributed application software: VDI, web browser, UC/VoIP, AppTV. Local and access networks: Wi-Fi, Ethernet, xDSL, cable, FTTx, 2G/3G/4G/5G. Wide area: public Internet, private cloud global access, MPLS/Carrier Ethernet/etc. Hosting: hosted app, public or private cloud, serverless functions. Options: SD-WAN or no SD-WAN, VPN or no VPN. Axes: horizontal interoperability, vertical interoperability.] ICT suppliers want to manage “vertical” interoperability: How do I deliver enough reliability end-to-end to meet the customer’s need? How can I optimise cost for my sub-path without sacrificing that reliability? End users demand “horizontal” interoperability: How do I know what on-premise capabilities I need? How do I select a service provider and network access technology and know my applications will work?
  5. “Vertical” interoperability needs standardised and (de)composable “availability SLAs”. [Diagram: the technology-stack diagram from slide 4, annotated with the end-to-end quality requirement.] The end-to-end requirement is for a bounded probability of packet latency and loss. The concept of “quality attenuation” unifies latency and loss into a single mathematical object, analogous to how complex numbers bring together real and imaginary numbers. “Quality attenuation” can be expressed as composable “availability SLAs” (when using ∆Q metrics) that “add up” along the end-to-end path.
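
One way to make the “single mathematical object” concrete, consistent with the improper cumulative distributions mentioned on slide 14, is sketched below. The symbol $\ell$ (loss probability) and the subscripts are notation introduced here for illustration, and the composition rule assumes the two segments behave independently.

```latex
% Quality attenuation as an improper CDF: latency and loss in one object.
\Delta Q(t) = \Pr[\text{packet is delivered with delay} \le t],
\qquad \lim_{t \to \infty} \Delta Q(t) = 1 - \ell \le 1

% "Adding up" two independent segments A and B is a convolution:
\Delta Q_{A+B}(t) = \int_{0}^{t} \Delta Q_{A}(t - s)\, \mathrm{d}\Delta Q_{B}(s),
\qquad 1 - \ell_{A+B} = (1 - \ell_{A})(1 - \ell_{B})
```

The missing probability mass $\ell$ is what makes the distribution “improper”: a lost packet is treated as one that never arrives.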
  6. “Horizontal” interoperability decouples operational implementation from the availability SLA. [Diagram: the technology-stack diagram from slide 4, annotated with horizontal interoperability.] Availability is something you can only lose: the baseline is “always available”, and every element can only detract from that standard of perfection! We can use the “availability SLAs” to create a “budget” for quality attenuation for each sub-system and element of the system. “Horizontal” interoperability means we can safely make independent operational choices over how to meet that “budget”. As long as we are “under budget” (to an agreed probability) then we know the end-to-end availability requirement will still be met.
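
As an illustration of being “under budget”, the sketch below checks measured one-way delays against a budget. The budget format (pairs of delay bound and minimum delivered fraction), the function names, and the sample figures are all assumptions made for this example; they are not taken from the prospectus.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical budget form: each (bound_ms, min_fraction) pair requires that at
# least min_fraction of all packets sent arrive within bound_ms.
Budget = List[Tuple[float, float]]


def fraction_within(samples: Sequence[Optional[float]], bound_ms: float) -> float:
    """Fraction of sent packets delivered within bound_ms (None marks a lost packet)."""
    if not samples:
        raise ValueError("need at least one sample")
    in_time = sum(1 for s in samples if s is not None and s <= bound_ms)
    return in_time / len(samples)


def under_budget(samples: Sequence[Optional[float]], budget: Budget) -> bool:
    """True if every point of the budget is met by the measured samples."""
    return all(fraction_within(samples, bound) >= need for bound, need in budget)


# Illustrative measurements: 90 packets at 5 ms, 8 at 12 ms, 2 lost.
samples = [5.0] * 90 + [12.0] * 8 + [None] * 2
access_budget = [(10.0, 0.85), (20.0, 0.97)]
print(under_budget(samples, access_budget))
```

Because lost packets count against every bound, latency and loss are judged together rather than as separate metrics.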
  7. Our industry challenge: to develop a supply chain quality management system. [Diagram: the technology-stack diagram from slide 4, with horizontal and vertical interoperability axes.] Relating “vertical” to “horizontal” needs a common language of metrics (for SLAs) and measures (for operations): standardised metrics; standardised SLAs; standardised operational measurement methods; standardised service lifecycle management processes; standardised network quality assurance mechanisms.
  8. We have to create the performance integration framework (“glue”) for interoperable SLAs. [Diagram: distributed computing elements spread across customer premises, telco and cloud access, and application provider.] Other industries have solved this reliability engineering integration problem. For instance, the oil industry can relate “upstream” extraction and refining activities to “downstream” distribution. Telecoms had this capability during the telephony era with erlangs as a rational unit of supply and demand. We lost this when we moved to packet-based statistical multiplexing. ∆Q-based “availability SLAs” are “cloud access erlangs”. We can now performance engineer digital supply chains, create new assurance revenue and reduce the cost of workarounds and failure.
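
For readers unfamiliar with erlangs, the sketch below shows how the unit relates demand (offered load) to supply (circuits) in circuit-switched telephony, using the standard Erlang B recurrence. The function names and the example load are illustrative choices, not part of the prospectus.

```python
def erlang_b(offered_load_erlangs: float, circuits: int) -> float:
    """Blocking probability for a given offered load and number of circuits (Erlang B)."""
    blocking = 1.0  # with zero circuits every call is blocked
    for n in range(1, circuits + 1):
        blocking = (offered_load_erlangs * blocking) / (n + offered_load_erlangs * blocking)
    return blocking


def circuits_needed(offered_load_erlangs: float, max_blocking: float) -> int:
    """Smallest number of circuits that keeps blocking at or below max_blocking."""
    circuits = 0
    while erlang_b(offered_load_erlangs, circuits) > max_blocking:
        circuits += 1
    return circuits


# Illustrative demand: 2 erlangs of offered traffic, at most 1% of calls blocked.
print(circuits_needed(2.0, 0.01))
```

The point of the analogy is that a single unit lets a planner go from a demand figure and a quality target straight to a quantity of supply.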
  9. Solving these reliability engineering problems needs interoperable metrics and measures.
      • Requirements: Define availability SLA …in the user/customer’s own terms. Decompose SLA …‘vertically’ in supply chains to specify SLA requirement on each subsystem and sub-subsystem. Market differentiated SLA …to win the customer’s business in ‘horizontal’ competitive market. Operationally deliver SLA …to meet our service availability promises (with optimised cost and risk trade-offs for profitability).
      • Challenges: Availability SLA metrics …don’t sufficiently reflect QoE and aren’t properly composable/causal. SLA operational measures …aren’t calibrated against testing and inspection reference standards. SLA management models …don’t fully incorporate established and proven management theory (e.g. 6-sigma, TOC, lean, Vanguard). SLA assurance methods …require improved or new business processes across service lifecycle.
      • Opportunities: Capture the basic science …as ∆Q calculus is the only known scientifically sound approach. Create standard metrics …that accurately reflect availability in the customer’s own terms. Calibrate measures …to manage error bounds, done based on shared cost and IPR. Construct assurance SLAs …that have the essential “horizontal” and “vertical” interoperability properties.
  10. We have a strong baseline of ∆Q research and technology that now requires industrialisation.
      • Opportunities: Capture the basic science …as ∆Q performance calculus is the only scientifically sound approach. Create standard metrics …that accurately reflect availability in the customer’s own terms. Calibrate measures …to manage error bounds, done based on shared cost and IPR. Construct assurance SLAs …that have the essential “horizontal” and “vertical” interoperability properties.
      • Baseline: Fully developed ∆Q theoretical framework. Quality attenuation theory training materials available. “Wind tunnel” for cloud apps to establish ∆Q-based SLA. Demonstrated ability to “budget” performance using ∆Q. Mature first generation ∆Q measurement system. Industrialisation and scaling (e.g. TWAMP) in progress. Contention Management technology developed and demonstrated to assure ∆Q SLA even in overload. Operational trial platform available for shared use (collaboration with Just Right Networks Ltd & SureTec Ltd).
      • 15+ years of development by a leading team of reliability engineering and distributed systems experts. Proven in application at tier 1 network operators as well as in many boutique and exotic applications.
  11. The next steps are clear… and interoperability by its nature is a collaborative activity.
      • Opportunities: Capture the basic science …as ∆Q performance calculus is the only scientifically sound approach. Create standard metrics …that accurately reflect availability in the customer’s own terms. Calibrate measures …to manage error bounds, done based on shared cost and IPR. Construct assurance SLAs …that have the essential “horizontal” and “vertical” interoperability properties.
      • Next steps: Develop ex ante reliability engineering curriculum from existing materials: ‘train the trainer’ and disseminate to industry. Create reference ∆Q-based SLAs for common application types. Construct software libraries to persist and manipulate the SLAs. Frictionless deployment of packet observation end points. Use high-fidelity measures to audit lower-fidelity measures. Establish the ‘state of the possible’ (i.e. highest yield of QoE at lowest cost and risk) as a technical reference study that feeds into the business case for industry investment in the initiative.
      • The time has come to ‘up our game’ and collectively engage with the problem of interoperable metrics. This requires an industry effort that no single player can deliver, as every process and system is affected.
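
To suggest what a library that can “persist and manipulate the SLAs” might look like, here is a minimal sketch. The class name `DeltaQSLA`, its fields, the JSON layout, and the example figures are all hypothetical; the prospectus does not define a data format.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeltaQSLA:
    """Hypothetical SLA record: points are (delay bound in ms, minimum delivered fraction)."""
    application: str
    points: Tuple[Tuple[float, float], ...]

    def to_json(self) -> str:
        """Persist the SLA as a JSON string."""
        return json.dumps({"application": self.application,
                           "points": [list(p) for p in self.points]})

    @classmethod
    def from_json(cls, text: str) -> "DeltaQSLA":
        """Load an SLA previously written by to_json."""
        raw = json.loads(text)
        return cls(raw["application"],
                   tuple((float(b), float(f)) for b, f in raw["points"]))

    def is_at_least_as_strict_as(self, other: "DeltaQSLA") -> bool:
        """True if meeting this SLA guarantees that the other SLA is met too."""
        return all(
            any(b <= bound and f >= fraction for b, f in self.points)
            for bound, fraction in other.points
        )


# Illustrative figures only, not reference SLAs.
voip = DeltaQSLA("voip", ((50.0, 0.95), (100.0, 0.99)))
relaxed = DeltaQSLA("voip", ((60.0, 0.90), (150.0, 0.99)))
print(DeltaQSLA.from_json(voip.to_json()) == voip)
print(voip.is_at_least_as_strict_as(relaxed))
```

The strictness check relies on the delivered fraction being non-decreasing in the delay bound, so a tighter bound with a higher fraction implies every looser requirement.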
  12. Purpose of the Cloud Access Reliability Engineering Initiative. Cloud access reliability engineering… performance integration technology; management methods and processes; skills, people and relationships. We help you to apply these reliability engineering breakthroughs to solve the cloud access interoperability and integration problem for distributed applications in complex digital supply chains. We feed specialist skills into the development of existing industry initiatives and projects. Examples: 5G, SDN/NFV, Zero Touch Automation, UCaaS, distributed apps (inc. blockchain), machine learning. We adapt established quality management methods from other industries, and drive the adoption of new scientific metrics and measures that are suitable for our own. We develop methods, standards and tools by a process of action learning. We target the full service lifecycle (product development, marketing and sales, in-life service and support).
  13. We know the ‘interoperable unit’ answer: the ∆Q calculus (see http://qualityattenuation.science/). [Diagram labels: network demand, network supply, QoE, ‘SLAZARD’, high-fidelity network measures.]
  14. ∆Q metrics have an algebra for engineering predictable performance (& nothing else does!). [Diagram: suppliers A, B and C each contribute ∆Q(A), ∆Q(B), ∆Q(C), and each ∆Q has three components: ∆Q|G (geographic delay), ∆Q|S (delay due to packet size) and ∆Q|V (variable delay due to load), shown on a plot of one-way delay against packet size.] The components add up along the chain: $V_A + V_B + V_C = \sum V$, $S_A + S_B + S_C = \sum S$, $G_A + G_B + G_C = \sum G$, giving ∆Q(∑ A+B+C). G/S/V are independent probability functions using improper random variables or improper cumulative distributions. These can be (de)convolved and “budgeted” along the supply chain using (de)composable “quality SLAs”.
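
The convolution along the supply chain can be sketched in discrete form as follows. This is a simplified illustration, not the authors' implementation: it assumes the suppliers are independent, treats each supplier's ∆Q as a single distribution rather than separate G/S/V components, and uses made-up probabilities. The type alias and function names are introduced here for the example.

```python
from typing import Dict

# Discretised improper delay distribution: delay in ms -> probability.
# The probabilities may sum to less than 1; the shortfall is the loss probability.
ImproperPMF = Dict[int, float]


def convolve(a: ImproperPMF, b: ImproperPMF) -> ImproperPMF:
    """Compose two independent segments in series: delays add, probabilities multiply."""
    out: ImproperPMF = {}
    for delay_a, p_a in a.items():
        for delay_b, p_b in b.items():
            total = delay_a + delay_b
            out[total] = out.get(total, 0.0) + p_a * p_b
    return out


def loss(pmf: ImproperPMF) -> float:
    """Probability that a packet is never delivered."""
    return 1.0 - sum(pmf.values())


def delivered_within(pmf: ImproperPMF, bound_ms: int) -> float:
    """Probability that a packet is delivered with delay at most bound_ms."""
    return sum(p for delay, p in pmf.items() if delay <= bound_ms)


# Illustrative per-supplier figures (assumed, not measured).
supplier_a = {2: 0.70, 5: 0.29}     # 1% loss
supplier_b = {10: 0.50, 20: 0.48}   # 2% loss
supplier_c = {1: 0.999}             # 0.1% loss

end_to_end = convolve(convolve(supplier_a, supplier_b), supplier_c)
print(f"end-to-end loss: {loss(end_to_end):.4f}")
print(f"delivered within 16 ms: {delivered_within(end_to_end, 16):.4f}")
```

Under these assumptions the same operation could be applied to each of the G, S and V components separately, and a budget for one supplier follows from what remains after the others have been accounted for.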
  15. There are other metrics and measures, some of which meet some needs, but none meet all. Each requirement is followed by the assessment of other metrics and measures, then of ∆Q-based metrics and measures:
      • Be a strong proxy for QoE. Other: Yes (e.g. effective bandwidth, Actual Experience). ∆Q-based: Yes.
      • Isolate problems in supply chains. Other: Partial (correlation, but not strong causation). ∆Q-based: Yes.
      • Offer an auditable evidence chain. Other: No (would not stand up in court as standard of proof). ∆Q-based: Yes.
      • Be non-intrusive. Other: Some (only for passively observed single point average metrics; others more like DoS attacks!). ∆Q-based: Yes.
      • Work for all types of bearer. Other: Some (separate worlds of cable, 5G, SDN/NFV all doing own thing; no user-centric end-to-end view). ∆Q-based: Yes.
      • Be cheap to gather and operate. Other: Some (but high fidelity remains expensive). ∆Q-based: Yes.
      • Be non-proprietary. Other: Some (cheap to gather data is of low fidelity). ∆Q-based: Yes.
      • Have a scientific basis. Other: Partial (have scientific basis, but limited generality). ∆Q-based: Yes.
      • Able to define ‘safety margin’. Other: Partial (only weak proxies for safety margin). ∆Q-based: Yes.
      • Can engineer spatial (location of compute, routing of data) as well as temporal (scheduling of resources). Other: Partial (some ability to separate static from dynamic but no formal algebra). ∆Q-based: Yes.
      • SLAs are (de)composable. Other: No (composition not a meaningful operation). ∆Q-based: Yes.
  16. Martin Geddes, mail@martingeddes.com
