Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)

Keynote talk at the 2nd India Software Engineering Conference (ISEC 2009), Pune, India, 25 February 2009.

Published in: Technology
  • As in any kind of analysis, you are trying to answer a question. We represent this question in terms of preferences and utility functions, which I’ll explain later. As I have mentioned, scalability always has to do with the scaling, or variation, of application-domain or machine-design characteristics. We call these independent variables: variables that can be manipulated in the analysis. Not all of them will vary, so we further subdivide them into scaling and non-scaling variables. Other variables may affect scalability, but we have no control over them; we call them nuisance variables. The framework is a form of experimental design intended to reveal the causal relationship between factors and dependent variables. Analysing the dependent variables in the presence of variation in certain factors turns an ordinary quality analysis into a scalability analysis. It is therefore vague to refer simply to “the scalability of a system”; instead one must refer to “the scalability with respect to throughput”, or “the scalability with respect to latency and memory consumption”. Scalability analysis should unveil this relationship in an explicit and continuous form. Any system analysis conducted with respect to variation over a range of environmental or design qualities is a scalability analysis, whether the quality is performance, reliability, availability, security, etc.
  • The Surrogate Key Server is a critical subsystem.
  • This is a retrospective study. Call attention to the multi-criteria trade-off: memory vs throughput.
  • In hindsight, the file-based design may appear to be obviously superior to the memory-based design, but this was not at all obvious when the memory-based design was first developed. In fact, if the designs had been compared only in terms of the load at the time the memory-based system was first being developed, then the memory-based design would have been selected instead of the file-based design. Only by doing a proper analysis over the full range of the scaling dimensions are we able to select the most scalable design.

    1. Software System Scalability: Concepts and Techniques
       David S. Rosenblum, University College London, United Kingdom
       http://www.cs.ucl.ac.uk/staff/D.Rosenblum/
    2. Acknowledgments
       • Letícia Duboc
       • Tony Wicks
       • Emmanuel Letier
    3. The Concept of Scalability
    4. Scalability: A Widely Used Term
       • The technical literature has many uses of the term
         – Product brochures
         – Research papers
         – Design documents
         – Standards specifications
       • But there are very few precise definitions
    5. A Typical Example: SAP Specification
       Mark Handley, Colin Perkins and Edmund Whelan, Session Announcement Protocol, RFC 2974, October 2000.
       • 5500 words, including 3 occurrences of ‘scalability’:
         – Abstract: ‘This document describes version 2 of the multicast session directory announcement protocol, Session Announcement Protocol (SAP), and the related issues affecting security and scalability that should be taken into account by implementors.’
         – Section on Terminology: ‘A SAP announcer periodically multicasts an announcement packet to a well known multicast address and port. The announcement is multicast with the same scope as the session it is announcing, ensuring that the recipients of the announcement are within the scope of the session the announcement describes (bandwidth and other such constraints permitting). This is also important for the scalability of the protocol, as it keeps local session announcements local.’
         – Section heading: ‘Scalability and Caching’
    6. The Problem
       ‘I examined aspects of scalability, but did not find a useful, rigorous definition of it. Without such a definition, I assert that calling a system “scalable” is about as useful as calling it “modern”. I encourage the technical community to either rigorously define scalability or stop using it to describe systems.’
       [Mark D. Hill, ‘What is Scalability?’, ACM SIGARCH Computer Architecture News, vol. 18, no. 4, Dec. 1990, pp. 18-21.]
    7. Does This Lack of Rigour Matter?
       [Chart: publications with the word scalable or scalability in the title, by year, 1965 to 2010; vertical axis: publications, 0 to 2500; annotated points: 1980: Computer Architecture, 1988: Neural Networks; source: Engineering Village 2]
    8. Why Does It Matter to Software Engineering?
       • Scalability is an important multi-dimensional concern
         – And engineers have difficulty reasoning about multi-dimensionality!
       • The dimensions exhibit highly unpredictable trends
         – And engineers have difficulty anticipating these trends!
           • Growth in users, growth in capacity requirements
           • New deployment needs (mergers, miniaturisation)
       • Software engineers have only primitive, ad hoc techniques to address scalability concerns
    9. Some Typical Notions of Scalability
       • Performance
         – High throughput, low latency
       • Parallel speedup
       • Tractability of algorithms
         – Polynomial versus Exponential
       • Testing versus Verification
       • State spaces in model checking
       • Linear growth in resource usage
         – What about quicksort???
    10. Scalability of What?
        • The running system?
        • The software design?
        • The number of users?
        • Something else?
        • All of the above???
    11. Characterising and Analysing Scalability
    12. So What Is Scalability?
        • Scalability is a quality of a software system characterising its ability …
          – to satisfy its quality goals …
          – to levels that are acceptable to its stakeholders …
          – when characteristics of the execution environment …
          – and the system design …
          – vary over expected ranges.
        Scalability is thus a meta-quality of other system qualities
    13. A Scalability Framework: As a Form of Experimental Design
        [Diagram: independent variables (environment and design characteristics, each either scaling or non-scaling) govern system execution; system behaviour determines the dependent variables (system qualities)]
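The experimental-design view in slides 12 and 13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Observation`, `scalability_sweep`, `acceptable_over_range`, and the toy latency model are hypothetical names and numbers that do not come from the talk. The sketch varies one scaling variable over its expected range, holds the non-scaling variables fixed, records the dependent variables, and checks them against stakeholder goals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Observation:
    """Dependent variables measured at one value of the scaling variable."""
    scaling_value: float
    qualities: Dict[str, float]


def scalability_sweep(
    measure: Callable[[float, Dict[str, float]], Dict[str, float]],
    scaling_values: Iterable[float],
    non_scaling: Dict[str, float],
) -> List[Observation]:
    """Vary the scaling variable; keep the non-scaling variables fixed."""
    return [Observation(v, measure(v, non_scaling)) for v in scaling_values]


def acceptable_over_range(
    observations: List[Observation],
    goals: Dict[str, Callable[[float], bool]],
) -> bool:
    """True if every quality goal holds at every point of the sweep."""
    return all(
        goal(obs.qualities[name])
        for obs in observations
        for name, goal in goals.items()
    )


def toy_measure(load: float, fixed: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical system model (an assumption, not a real measurement)."""
    return {"latency_ms": 10 + load / fixed["machines"]}


goals = {"latency_ms": lambda ms: ms <= 50}
narrow = scalability_sweep(toy_measure, [100, 200, 400], {"machines": 10})
wide = scalability_sweep(toy_measure, [100, 200, 400, 800], {"machines": 10})
print(acceptable_over_range(narrow, goals))  # goals hold up to load 400
print(acceptable_over_range(wide, goals))    # goals fail at load 800
```

The same toy system is judged acceptable or not depending on the range swept, which is why a scalability claim has to name both the dependent variable and the expected range of the scaling variable.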
    14. Example: Google Search Engine
        • Most people would agree that Google is scalable
          – Dramatic growth in the size of the Web
          – Dramatic growth in the rate of queries to Google
          – Yet a virtually constant response time for users
        • It’s a naturally parallelisable problem
          – Implemented as a cluster of commodity PCs
          – Cluster increased as Web and query load increase
    15. The Scalability Framework: As Exemplified by Google
        [Diagram: scaling environment characteristics: size of Web, queries per second; non-scaling environment characteristics: network latency, available bandwidth; scaling design characteristic: cluster size; non-scaling design characteristics: choice of algorithms, price per performance; system qualities: response time, I/O usage]
        Google is scalable with respect to response time because it maintains a constant response time as the number of queries per second and the number of Web pages scale over time, by increasing the number of machines in the cluster
    16. A Real-World Case Study
    17. Case Study: Fortent Data Analysis System
        • Intelligent Enterprise Framework (IEF)
          – Overnight analysis of transactional data to identify unusual and possibly fraudulent patterns of bank and credit card transactions
          – Java, 1,556 classes, 326,293 lines of code
        • Surrogate Key Server (SKS) Component
          [Diagram: batches of transactions on business entities (BE) pass through the SKS, which replaces business entity identifiers with injected surrogate keys (SK) using an entity-key mapping]
    18. Case Study: SKS Implementation Details
        • Scalability problem: support a growing number of business entities in overnight batches, while maintaining throughput and memory usage within acceptable levels
        • First Generation Design (year 2000)
          – In-memory cache
          – High storage overhead, eventually crashing system
        • Second Generation Design (year 2003)
          – Disk-based cache for high-volume business entities
          – In-memory cache for low-volume business entities
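The second-generation idea from slide 18 (disk-based cache for high-volume business entities, in-memory cache for low-volume ones) can be sketched as follows. This is not Fortent's implementation, which was written in Java and is not shown in the talk; the class name, the `high_volume_types` parameter, the use of SQLite as the disk store, and the key-numbering scheme are all assumptions made for illustration.

```python
import itertools
import sqlite3


class SurrogateKeyServer:
    """Hypothetical two-tier entity-to-key mapping (illustrative only)."""

    def __init__(self, high_volume_types, db_path=":memory:"):
        # Assumption: the caller knows which entity types are high volume.
        self._high_volume = set(high_volume_types)
        self._memory = {}  # low-volume entities stay in the process heap
        # Pass a file path as db_path to keep this tier on disk.
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS keys ("
            "etype TEXT, bid TEXT, skey INTEGER, PRIMARY KEY (etype, bid))"
        )
        # Keys are numbered per run; persistence across runs is out of scope.
        self._next_key = itertools.count(1)

    def key_for(self, etype, business_id):
        """Return the surrogate key for an entity, assigning one if new."""
        if etype in self._high_volume:
            row = self._db.execute(
                "SELECT skey FROM keys WHERE etype = ? AND bid = ?",
                (etype, business_id),
            ).fetchone()
            if row is not None:
                return row[0]
            skey = next(self._next_key)
            self._db.execute(
                "INSERT INTO keys VALUES (?, ?, ?)", (etype, business_id, skey)
            )
            self._db.commit()
            return skey
        entity = (etype, business_id)
        if entity not in self._memory:
            self._memory[entity] = next(self._next_key)
        return self._memory[entity]


sks = SurrogateKeyServer(high_volume_types={"account"})
k_account = sks.key_for("account", "A-1")  # stored in the SQLite tier
k_branch = sks.key_for("branch", "B-1")    # stored in the in-memory dict
```

The trade-off the case study measures is visible in the structure: the dictionary is fast but grows with the number of entities, while the SQLite tier bounds heap usage at the cost of a lookup per transaction.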
    19. Scalability of IEF’s SKS: Characterisation
        [Diagram: environment characteristic: number of business entities; design characteristics: memory cache vs disk cache, number of threads, JVM heap size; system qualities (dependent variables): average throughput, memory usage, disk usage]
    20. Scalability of IEF’s SKS: Analysis in Terms of Microeconomics
        [Diagram: the characterisation from the previous slide, annotated with the analysis steps: manipulate the independent variables over ranges; compare distinct behaviours (new prototype vs old implementation); measure raw data for the dependent variables; apply preference functions t(), m(), d() to obtain preference values; combine them with the utility function 10t()+10m()+d() for the design comparison]
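    21. Case Study: Preferences and Utility
        • Throughput preference
          \[ \hat{t}(x) = \begin{cases} -1, & \text{if } x < 100 \\ \dfrac{x - 100}{400 - 100}, & \text{otherwise} \end{cases} \]
        • Heap usage preference
          \[ \hat{h}(y) = \begin{cases} -1, & \text{if } y > 500 \\ \dfrac{500 - y}{500 - 0}, & \text{otherwise} \end{cases} \]
        • Disk usage preference
          \[ \hat{d}(z) = \begin{cases} -1, & \text{if } z > 24 \\ \dfrac{24 - z}{24 - 0}, & \text{otherwise} \end{cases} \]
        • System utility
          \[ U(x, y, z) = 10\,\hat{t}(x) + 10\,\hat{h}(y) + \hat{d}(z) \]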
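The preference and utility functions on slide 21 translate directly into code. The sketch below uses the thresholds and weights from the slide; the function names are chosen here for illustration, and the slide does not state units for x, y, and z, so none are assumed. As on the slide, preference values are not clamped, so a measurement better than the target (for example, a throughput above 400) yields a preference above 1.

```python
def throughput_pref(x: float) -> float:
    """-1 below the minimum of 100, else linear: 0 at 100, 1 at 400."""
    if x < 100:
        return -1.0
    return (x - 100) / (400 - 100)


def heap_pref(y: float) -> float:
    """-1 above the maximum of 500, else linear: 1 at 0, 0 at 500."""
    if y > 500:
        return -1.0
    return (500 - y) / (500 - 0)


def disk_pref(z: float) -> float:
    """-1 above the maximum of 24, else linear: 1 at 0, 0 at 24."""
    if z > 24:
        return -1.0
    return (24 - z) / (24 - 0)


def utility(x: float, y: float, z: float) -> float:
    """Weighted sum from the slide: throughput and heap count ten times disk."""
    return 10 * throughput_pref(x) + 10 * heap_pref(y) + disk_pref(z)


print(utility(400, 0, 0))     # best case within the stated bounds: 21.0
print(utility(250, 250, 12))  # midpoint of every range: 10.5
print(utility(100, 500, 24))  # every variable exactly at its limit: 0.0
```

Evaluating `utility` for each candidate design at each point of the scaling range gives the curves that a design comparison needs; a design that wins at one load may lose at another.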
    22. Scalability of IEF’s SKS: Analysis Results
    23. Scalability Requirements
    24. Where Do the Variables and Preferences and Utilities Come From?
        • They must come from system stakeholders
          – Are able to identify important scalability variables
          – But like to think in terms of simple bounds
            • Rather than the underlying functions that relate them
          – And are usually poor at estimating those bounds
            • Typically underestimate system load and system lifetime
        • Goal-Oriented Requirements Engineering can be used to elicit Scalability Requirements
          – KAOS Method [van Lamsweerde, Letier]
    25. The Scalability Framework: In the Context of Requirements Engineering
        [Diagram: scalability goals identify and bound the independent variables (scaling and non-scaling characteristics of design and environment) and the dependent variables of the scalability framework]
    26. Goal-Oriented Requirements Engineering: As Exemplified by IEF
        [Diagram: KAOS goal model. Legend: Goal, AND-Refinement, Sub-Goal, Obstacle, Obstacle Refinement, Sub-Obstacle, Requirement, Expectation, Agent. Node labels: Fraudulent Transactions Handled; Fraudulent Transactions Detected Quickly; Fraudulent Transactions Acted Upon; Fraudulent Transactions Not Acted Upon; Batch Processed Overnight; Too Many Alerts for IT Team. Agents: IEF, Alert Generator, Bank IT Team]
    27. Scalability Requirements
        • A scaling assumption is a goal specifying how some quantity in the application domain is assumed to vary over time or system variants
        • A scalability goal is a goal specifying the required levels of satisfaction under variations specified in associated scaling assumptions
        • A scalability obstacle is a condition where the load imposed by a goal exceeds the capacity of the agent assigned to the goal
        We can use goal-obstacle analysis to elicit these
    28. Goal-Obstacle Analysis of IEF
        [Diagram: goal-obstacle model for the goal Batch Processed Overnight. Legend: Scalability Obstacle, Scaling Assumption, Scalability Requirement. Node labels: Batch Size Is Unbounded; Number of transactions exceeds Alert Generator processing speed; Batch Processed Overnight for Expected Batch Size Variation; Expected Batch Size Variation; Adapt Alert Generator Processing Speed at Runtime; Accurate Batch Size Prediction; Alert Generator Processing Speed Above Maximum Predicted Batch Size. Agents: IEF, Alert Generator, Fortent, Bank IT Team]
        • Assumption: Expected Batch Size Variation
          – Instance of: scaling assumption
          – Definition: Over the next three years, daily batches for all customers are expected to have between 50,000 and 300 million transactions
        • Resolution Tactic: Introduce scaling assumption
        • Resolution Tactic: Dynamically adapt agent capacity (mitigates the obstacle)
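A scaling assumption and the obstacle check it enables can be written down as data plus one comparison. The bounds of 50,000 and 300 million transactions per daily batch come from slide 28; everything else in this sketch (the class and function names, the 8-hour overnight window, and the two processing rates) is a hypothetical assumption used only to make the example runnable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingAssumption:
    """Assumed range of a domain quantity over the period of interest."""
    quantity: str
    low: int
    high: int


def overnight_capacity(rate_per_second: int, window_hours: int) -> int:
    """Transactions an agent can process in the overnight window."""
    return rate_per_second * window_hours * 3600


def has_scalability_obstacle(assumption: ScalingAssumption, capacity: int) -> bool:
    """An obstacle exists if load can exceed capacity within the assumed range."""
    return assumption.high > capacity


batch_size = ScalingAssumption(
    quantity="transactions per daily batch",
    low=50_000,
    high=300_000_000,
)

# Hypothetical Alert Generator rates; the talk gives no processing speeds.
slow = overnight_capacity(rate_per_second=5_000, window_hours=8)
fast = overnight_capacity(rate_per_second=12_000, window_hours=8)
print(has_scalability_obstacle(batch_size, slow))  # capacity 144,000,000
print(has_scalability_obstacle(batch_size, fast))  # capacity 345,600,000
```

Without the scaling assumption the batch size is unbounded and no finite capacity can be shown to suffice; bounding the quantity is what makes the obstacle checkable.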
    29. Goal-Obstacle Analysis Summary
        • Can now elicit scalability requirements for Goal-Oriented Requirements Engineering
          – Identify the key independent and dependent variables
          – Identify scalability obstacles
          – Resolve scalability obstacles
          – All precisely and quantitatively
        • What’s Missing?
          – Agent Load Feasibility Analysis
          – Cost/Benefit Analysis of Obstacle Resolutions
          – Testing Scalability Requirements
    30. Conclusion
    31. Summary
        • Scalability is an important software quality
        • But it has been poorly understood
          – And it’s not just about performance!
        • A proper characterisation of a system’s scalability must be qualified with reference to relevant independent and dependent variables
        • And these should be derived through a precise elicitation of scalability requirements
    32. Thank you!
        http://www.cs.ucl.ac.uk/staff/D.Rosenblum/
