PEnDAR webinar 2 with notes


Second webinar from the PEnDAR project, going into more detail about validating quantified performance requirements with a worked example.


  1. PEnDAR Focus Group: Performance Ensurance by Design, Assuring Requirements. Second Webinar. (© Predictable Network Solutions Ltd 2017, www.pnsol.com)
  2. Recap
     • Previous webinar recording: https://attendee.gotowebinar.com/register/1615742612853821699
     • Also on SlideShare: https://www.slideshare.net/pnsol-slides
  3. Goals
     • Consider how to enable Validation & Verification of cost and performance
       • For distributed and hierarchical systems
       • Via sophisticated but easy-to-use tools
       • Supporting both initial and ongoing incremental development
     • Provide early visibility of cost/performance hazards
       • Avoiding costly failures
       • Maximising the chances of successful in-budget delivery of acceptable end-user outcomes
     [PEnDAR project partner and supporter logos]
  4. Role of the focus group
     Our offer to you. Learn what we know about the laws of the distributed world:
     • Where the realm of the possible lies;
     • How to push the envelope in a way that benefits you most;
     • How to innovate without getting your fingers burnt.
     Get an insight into the future:
     • Understand more about achieving sustainability and viability;
     • See where competition may lie in the future.
     Our 'ask'. Help us to find out:
     • In which markets/sectors/application domains will the benefits outweigh the costs?
     • Which are the major benefits? Which are just 'nice to have'?
     • How could this methodology be consumed? Tools? Services?
  5. Outline of a coherent methodology
  6. [Diagram relating outcomes delivered to resources consumed, covering variability (exception/failure: externally created, mitigation, system impact, propagation) and scale (schedulability, capacity, distance, number, time, space, density)]
  7. Performance/resource analysis
     [Diagram: subsystems connected by ∆Q, drawing on shared resources]
     Starting with a functional decomposition:
     • Take each subsystem in isolation
       • Analyse performance, modelling the remainder of the system as ∆Q
       • Quantify resource consumption, which may be dependent on the ∆Q
     • Examine resource sharing
       • Within the system: quantify resource costs
       • Between systems: quantify opportunity cost
     • Successive refinements
       • Consider couplings; iterate to a fixed point
     (Quantitative Timeliness Agreement: not covered today)
  8. Recap of ∆Q
     • ∆Q is a measure of the 'quality impairment' of an outcome
       • The extent of deviation from 'instantaneous and infallible'
       • Nothing in the real world is perfect, so ∆Q always exists
     • ∆Q is conserved
       • A delayed outcome can't be 'undelayed'
       • A failed outcome can't be 'unfailed'
     • ∆Q can be traded
       • E.g. accept more delay in return for more certainty of completion
     • ∆Q can be represented as an improper random variable (see the sketch below)
       • Combines continuous and discrete probabilities
       • Thus encompasses normal behaviour and exceptions/failures in one model
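Note: to make "improper random variable" concrete, here is a minimal Python sketch of ∆Q as an empirical delay distribution whose total probability mass may be less than 1. The class and method names, and the sampling-based composition, are illustrative assumptions, not part of the PEnDAR tooling.

```python
import random

class DeltaQ:
    """∆Q as an improper random variable: a bag of delay samples
    (conditional on the outcome happening at all) plus the probability
    that it does happen.  The 'missing' mass 1 - success is the
    discrete probability of exception/failure."""

    def __init__(self, delays, success=1.0):
        self.delays = list(delays)    # continuous part (seconds)
        self.success = success        # tangible probability mass

    def seq(self, other, n=100_000):
        """Sequential composition (written ⊕ on slide 16):
        delays add, success probabilities multiply."""
        delays = [random.choice(self.delays) + random.choice(other.delays)
                  for _ in range(n)]
        return DeltaQ(delays, self.success * other.success)

    def cdf(self, t):
        """P(outcome delivered within time t), including the failure mass."""
        within = sum(d <= t for d in self.delays) / len(self.delays)
        return self.success * within

# Example: one network traversal with 95ms minimum delay, up to 50ms
# variability (uniform assumed) and 1.5% loss, as in the worked example later.
hop = DeltaQ([0.095 + random.random() * 0.050 for _ in range(10_000)],
             success=0.985)
print(f"P(arrives within 120ms) = {hop.cdf(0.120):.3f}")
```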
  9. Validating performance requirements
     • Decompose the performance requirement following the system structure
       • Using engineering judgment/best practice/cosmic constraints
       • This creates initial subsystem requirements
     • Validate the decomposition by re-combining via the behaviour
       • Formally and automatically checkable; can be part of continuous integration
       • Captures interactions and couplings
     • Necessary and sufficient:
       • IF all subsystems function correctly and integrate properly
       • AND all subsystems satisfy their performance requirements
       • THEN the overall system will meet its performance requirement
     • Apply this hierarchically until:
       • The behaviour is trivially provable, OR
       • There is a complete set of testable subsystem verification/acceptance criteria
  10. Quantitative worked example
  11. Generic RPC
      [Diagram: transaction flow across front end, network and back end, with numbered states: 0 Idle, 1 Construct transaction, 2 Send request, 3 Receive request, 4 Process request, 5 Commit results, 6 Transmit response, 7 Receive response, 8 Complete transaction]
  12. Generic RPC – observables
      [The same diagram annotated with observation points A–F and the quality attenuations between them: ∆Q(A⇝B), ∆Q(B⇝C), ∆Q(C⇝D), ∆Q(D⇝E), ∆Q(E⇝F)]
  13. Generic RPC – abstract performance
      [First abstraction step: the back end (4 Process request, 5 Commit results) is replaced by a single attenuation ∆Q(C⇝D); the front-end and network states remain explicit]
  14. Generic RPC – abstract performance
      [Second abstraction step: the network is absorbed as well, leaving only the front end (0 Idle, 1 Construct transaction, 8 Complete transaction) and a single attenuation ∆Q(B⇝E) between sending the request and receiving the response]
  15. Generic RPC – abstract performance
      [Final abstraction step: the whole transaction collapses to a single attenuation ∆Q(A⇝F) between leaving and re-entering 0 Idle]
  16. Performance requirement
      • The user performance requirement is a bound on ∆Q(A⇝F), e.g.:
        • 50% within 500ms
        • 95% within 750ms
        • etc.
      • Decompose this as ∆Q(A⇝F) = ∆Q(A⇝B) ⊕ ∆Q(B⇝E) ⊕ ∆Q(E⇝F)
      • How these terms combine depends on the behaviour (see the sketch below)
      • We will examine the interaction of performance and behaviour next
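Note: continuing the DeltaQ sketch above, the decomposition and the requirement bound can be checked mechanically. The local front-end figures for ∆Q(A⇝B) and ∆Q(E⇝F) are invented for illustration, ∆Q(B⇝E) is a plain stand-in, and this simple sequential composition deliberately ignores the retry behaviour introduced on the next slide.

```python
# Hypothetical local processing costs for the front end (not from the
# slides); ∆Q(B⇝E) here is a placeholder before retries are modelled.
a_to_b = DeltaQ([0.005 + random.random() * 0.010 for _ in range(10_000)])
b_to_e = DeltaQ([0.215 + random.random() * 0.135 for _ in range(10_000)],
                success=0.97)
e_to_f = DeltaQ([0.002 + random.random() * 0.005 for _ in range(10_000)])

a_to_f = a_to_b.seq(b_to_e).seq(e_to_f)   # ∆Q(A⇝B) ⊕ ∆Q(B⇝E) ⊕ ∆Q(E⇝F)

# The requirement is a bound on the CDF of ∆Q(A⇝F):
for bound, fraction in [(0.500, 0.50), (0.750, 0.95)]:
    actual = a_to_f.cdf(bound)
    verdict = "meets" if actual >= fraction else "misses"
    print(f"{actual:.3f} delivered within {bound*1000:.0f}ms "
          f"({verdict} the {fraction:.0%} requirement)")
```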
  17. Front-end behaviour (1)
      • Since the request or its acknowledgement may be lost, the process retries after a 333ms timeout
        • Failure mitigation
      • After three attempts the transaction is deemed to have failed
        • Failure propagation
      • Can unroll the loop for a clearer picture
      [Diagram: the front-end state machine between A/B and E/F, with the retry loop added]
  18. Front-end behaviour (2)
      [Diagram: the retry loop unrolled into three successive attempts]
  19. Performance analysis (1)
      • There is a non-preemptive first-to-finish synchronisation between receiving the response and the timeout
      • Which execution path is taken is a probabilistic choice depending on ∆Q(B⇝E)
      • We combine the ∆Qs of the different paths with these probabilities (see the sketch below)
      Assumptions, for an initial customer deployment over UK broadband with the server on the US East Coast:
      • ∆Q(B⇝C) and ∆Q(D⇝E) are the same:
        • Minimum delay 95ms
        • 50ms variability
        • 1-2% loss
      • ∆Q(C⇝D) distributed between 25ms and 60ms
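Note: the retry behaviour is awkward to compose by hand, so here is a small Monte Carlo sketch of it using the slide's figures (333ms timeout, three attempts, 95ms plus up to 50ms per traversal, 25-60ms server time). Uniform distributions and a 1.5% per-traversal loss rate are assumptions; a response arriving after the timeout counts as a timeout, which models the non-preemptive first-to-finish synchronisation.

```python
import random

TIMEOUT, MAX_ATTEMPTS, LOSS = 0.333, 3, 0.015   # timeout/attempts from slides

def one_way():
    """∆Q(B⇝C) or ∆Q(D⇝E): lost with probability LOSS, otherwise 95ms
    minimum plus up to 50ms variability (uniform assumed)."""
    return None if random.random() < LOSS else 0.095 + random.random() * 0.050

def server():
    """∆Q(C⇝D): 25-60ms, uniform assumed."""
    return 0.025 + random.random() * 0.035

def transaction():
    """Delay from first send (B) to response receipt (E), or None if all
    three attempts fail.  Each attempt races the response against the
    timeout (first-to-finish)."""
    for attempt in range(MAX_ATTEMPTS):
        legs = (one_way(), server(), one_way())
        rtt = None if None in legs else sum(legs)
        if rtt is not None and rtt <= TIMEOUT:
            return attempt * TIMEOUT + rtt   # each earlier attempt cost a full timeout
    return None

samples = [transaction() for _ in range(100_000)]
delivered = [s for s in samples if s is not None]
print(f"completes: {len(delivered) / len(samples):.4f}")
for bound in (0.500, 0.750):
    frac = sum(s <= bound for s in delivered) / len(samples)
    print(f"within {bound*1000:.0f}ms: {frac:.4f}")
```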
  20. Performance analysis (2)
      [Graph: the delay distributions ∆Q(C⇝D), ∆Q(B⇝E) and the resulting ∆Q(A⇝F), plotted against the requirement]
  21. Verification
      • So far, the verification task seems straightforward:
        • Provided the network and server performance are within the given bounds, the front end will deliver the required end-user experience
      • However, this seems to have plenty of slack
        • Can we validate a different set of requirements?
        • How far can we relax them?
      • With this model we can explore (see the sketch below):
        • Varying the network performance
        • Varying the server performance (e.g. as a result of load)
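Note: for instance, re-running the Monte Carlo sketch above while sweeping the assumed loss rate shows how much network degradation the requirement can absorb. The sweep values are arbitrary; this snippet reuses `transaction` and `LOSS` from the previous sketch.

```python
def completion_within(bound, loss, n=100_000):
    """Fraction of transactions delivered within `bound` seconds,
    at the given per-traversal loss rate."""
    global LOSS
    LOSS = loss
    samples = [transaction() for _ in range(n)]
    return sum(s is not None and s <= bound for s in samples) / n

for loss in (0.01, 0.02, 0.05, 0.10):
    print(f"loss {loss:.0%}: P(within 750ms) ≈ {completion_within(0.750, loss):.3f}")
```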
  22. Performance analysis (3)
      [Graph: ∆Q(C⇝D), ∆Q(B⇝E) and ∆Q(A⇝F) against the requirement, with the varied network/server performance]
  23. Server resource consumption
      • A virtualised server has a limit on the rate of I/O operations
        • So that the underlying infrastructure can be reasonably shared
        • Such parameters affect the OPEX
      • Typically access is controlled via a token-bucket shaper (see the sketch below)
        • Allows small bursts; limits long-term average use
        • This constitutes a QTA, and both sides of it need to be verified
      • Delivered performance of the virtualised I/O thus depends on recent usage history
      • Load is a function of the number of front ends accessing the service and the proportion of retries
        • Retries depend on ∆Q(B⇝E), which in turn depends on the performance of the I/O system!
        • This coupling has been modelled here via a probability of taking a 'slow path' in the server
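Note: to make the token-bucket mechanism concrete, here is a minimal shaper sketch. The rate and burst figures, and the offered load, are invented rather than taken from the example.

```python
class TokenBucket:
    """Token-bucket shaper: tokens refill at `rate` per second up to a
    capacity of `burst`; each admitted I/O spends one token."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), 0.0

    def admit(self, now):
        """Refill for the elapsed time, then admit the I/O if a token is
        available; otherwise it takes the throttled 'slow path'."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 100 IOPS sustained with bursts of 20, offered 500 requests/s for 2s:
bucket = TokenBucket(rate=100, burst=20)
slow = sum(not bucket.admit(i / 500) for i in range(1000))
print(f"slow-path fraction ≈ {slow / 1000:.2f}")   # most requests throttled
```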
  24. Performance analysis (4)
      [Graph: ∆Q(C⇝D), ∆Q(B⇝E) and ∆Q(A⇝F) against the requirement, now including the effect of server load]
  25. [The slide-6 diagram revisited, annotated with the couplings found in the worked example: the impact of retry behaviour on delivered ∆Q and on server load, the effect of server loading on delivered ∆Q, the server I/O constraint triggered by load, and the resulting coupling of retries, load and response delay]
  26. Relation to challenges in ICT: e.g. virtualisation
  27. Virtualisation
      Creates:
      • New digital supply chain hazards
      • Loss of mitigation opportunities
        • Aspects of performance are no longer under direct control
      • Additional ∆Q
        • As seen in the example
        • This will be 'traded' to extract value; who benefits, the user or the infrastructure provider?
      Now need to consider:
      • Ongoing resource costs
        • OPEX rather than (sunk) CAPEX
      • Opportunity costs
        • Amazon Web Services' pricing of compute instances, SQS and Lambda micro-services shows engagement with opportunity-cost issues
  28. General ICT application
      Need quantitative intent:
      • At the very least, some indication of what is really not acceptable
        • Comparison with something else (that is measurable) will do
        • But not "the best possible" or "whatever will sell"
      Can approximate the best possible ∆Q (see the sketch below):
      • Use it to quickly check the degree of feasibility
        • Add up minimum times; means and variances also add
      • Puts bounds on future improvement
      • Gives early visibility of performance hazards, and hence early mitigation
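Note: the "add up minimum times" feasibility check is simple enough to sketch directly. For independent stages composed in sequence, minima, means and variances all add. The network and server figures below follow the slide-19 assumptions (two uniform 95-145ms traversals; a uniform 25-60ms server stage); the front-end stage is invented.

```python
# (min, mean, variance) in seconds / seconds² for each sequential stage.
stages = {
    "front end": (0.005, 0.010, 0.000002),   # invented placeholder
    "network":   (0.190, 0.240, 0.000417),   # two traversals of U(95ms, 145ms)
    "server":    (0.025, 0.0425, 0.000102),  # U(25ms, 60ms)
}
best   = sum(s[0] for s in stages.values())
mean   = sum(s[1] for s in stages.values())
stddev = sum(s[2] for s in stages.values()) ** 0.5
print(f"best possible: {best*1000:.0f}ms; "
      f"mean {mean*1000:.0f}ms, std dev {stddev*1000:.0f}ms")
# If a requirement percentile sits below `best`, it is infeasible no
# matter how good the implementation: an early warning of a hazard.
```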
  29. Summary
      • System performance validation consists of:
        • Analysing the interaction of system behaviour and subsystem performance requirements
        • Showing that this 'adds up' to meet the quantified user requirements (QTA)
      • System performance verification consists of showing that subsystems meet their QTAs
        • By analysis and/or measurement of the ∆Q of the subsystems' observable behaviour
        • This provides acceptance and/or contractual criteria for third-party subsystems or services
      • Together these substantially reduce performance integration risks, and hence re-work
  30. What now?
  31. The questions
      Given the right inputs, this sort of analysis takes relatively little time and effort; however:
      • Quantitative intent is fundamental: where can it be found?
      • Which parts of the system decomposition are forced?
        • E.g. use of existing infrastructure, such as a network
      • How should subsystems with undocumented characteristics be handled?
        • This can be mitigated with in-life measurement of ∆Q, driven by the validation
        • Forewarned is forearmed: inexpensive, focused data rather than a wide-angle 'big data' approach
  32. Thank you!
