Non-functional testing


1. Part III: Execution-Based Verification and Validation
   Katerina Goseva-Popstojanova
   Lane Department of Computer Science and Electrical Engineering
   West Virginia University, Morgantown, WV
   [email_address]
   www.csee.wvu.edu/~katerina
2. Outline
   • Introduction
     – Definitions, objectives, and limitations
     – Testing principles
     – Testing criteria
   • Testing techniques
     – Black box testing
     – White box testing
     – Fault-based testing
       · Fault injection
       · Mutation testing
3. Outline
   • Testing levels
     – Unit testing
     – Integration testing
       · Top-down
       · Bottom-up
       · Sandwich
   • Regression testing
   • Validation testing
     – Acceptance testing
     – Alpha and beta testing
   • Non-functional testing
     – Configuration testing
     – Recovery testing
     – Security testing
     – Stress testing
     – Performance testing
4. Configuration testing
   • Many programs must work under a wide range of hardware configurations and operating environments
   • Configuration testing is concerned with checking the program's compatibility with as many configurations of hardware and system software as possible
5. Configuration testing steps
   • Analyze the market
     – Which devices (printers, video cards, etc.) must the program work with? How can you get them?
   • Analyze the device
     – How does it work? How will this affect your testing? Which of its features does the program use?
   • Analyze the way the software can drive the device
     – How can you identify a group of devices that share the same characteristics?
   • Does this type of device interact with other devices?
     – Test the device with a small sample of other devices
6. Configuration testing steps
   • Save time
     – Test only one device per group until you eliminate the errors; then test each device in the group
   • Improve efficiency
     – Consider automation; organize the lab effectively; plan precisely and keep records
   • Share your experience
     – Organize and share your test results so the next project can plan and test more efficiently
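To make the one-device-per-group strategy concrete, here is a minimal sketch of a parameterized configuration test using the third-party pytest package. The device/protocol matrix, the DeviceDriver stub, and the print_page call are illustrative assumptions, not part of the original slides.

    # Minimal sketch of configuration testing with pytest (hypothetical devices/driver).
    import pytest

    # One representative device per group; expand to every device once the group is clean.
    DEVICE_MATRIX = [
        ("laserjet_4", "pcl"),       # hypothetical printer/driver pairs
        ("deskjet_890", "pcl3"),
        ("generic_ps", "postscript"),
    ]

    class DeviceDriver:
        """Stand-in for the real driver layer used by the application."""
        def __init__(self, device, protocol):
            self.device, self.protocol = device, protocol

        def print_page(self, text):
            # The real driver would talk to hardware; here we only simulate success.
            return f"{self.device}:{self.protocol}:{len(text)} bytes"

    @pytest.mark.parametrize("device,protocol", DEVICE_MATRIX)
    def test_print_on_each_configuration(device, protocol):
        driver = DeviceDriver(device, protocol)
        result = driver.print_page("configuration test page")
        assert result.startswith(device)   # the program must drive every supported device

Running the suite once per representative device, and later per device in each group, mirrors the "save time" step above.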
7. Recovery testing
   • Many computer-based systems must recover from faults and resume processing within a prespecified time
   • Recovery testing forces the software to fail in a variety of ways and verifies that recovery is properly performed
     – Recovery that requires human intervention
     – Automatic recovery
8. Recovery testing
   • Systems with automatic recovery must have
     – Methods for detecting failures and malfunctions
     – Removal of the failed component
     – Switchover and initialization of the standby component
     – Records of system states that must be preserved despite the failure
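A minimal sketch of an automatic-recovery test follows. The primary/standby pair, the handle_request call, and the 0.5-second recovery budget are assumptions made for illustration; a real test would inject the failure into the deployed system.

    # Sketch of a recovery test: fail the primary and check switchover time.
    import time

    class Component:
        def __init__(self, name):
            self.name, self.alive = name, True

    class RedundantService:
        """Toy primary/standby pair with automatic switchover."""
        def __init__(self):
            self.primary, self.standby = Component("primary"), Component("standby")

        def handle_request(self):
            if not self.primary.alive:                                 # failure detection
                self.primary, self.standby = self.standby, self.primary  # switchover
            return self.primary.name

    def test_recovery_within_time_budget():
        service = RedundantService()
        service.primary.alive = False            # inject the failure
        start = time.monotonic()
        responder = service.handle_request()     # must detect, switch over, and keep serving
        elapsed = time.monotonic() - start
        assert responder == "standby"
        assert elapsed < 0.5                     # prespecified recovery time, assumed here

    test_recovery_within_time_budget()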
9. Security testing
   • Security testing attempts to establish a sufficient degree of confidence that the system is secure
   • Associating integrity and availability with respect to authorized actions, together with confidentiality, leads to security
     – Availability: readiness for usage
     – Integrity: data and programs are modified or destroyed only in a specified and authorized manner
     – Confidentiality: sensitive information is not disclosed to unauthorized recipients
10. Security testing
   • Approaches, in increasing order of complexity and realism:
     – Whiteboard: interactive analysis of hypotheses
     – Automated: may simulate a human attacker or defender
     – Semi-automated: actual human attacker or defender (team)
     – Interactive cyberwar: dynamic interaction between human attacker and defender
11. Security testing - Penetration testing
   • Traditionally, security testing is performed using penetration testing
     – An attempt to break into an installed system by exploiting well-known vulnerabilities
   • The Red Team is a model adversary
   • It differs from a real adversary
     – Attempts to limit actual damage
       · Property destruction, information disclosure, etc.
     – Discloses all tools, techniques, and methods
     – Cooperates in the goals of the experiment
12. Penetration testing - Adversary models
   • Adversary goals range over denial of service/availability, confidentiality, integrity, and non-repudiation
     – They depend on what protection the IA system provides
     – They define the target goals of the adversary/red team
     – They have been very explicitly constrained in IA experiments
13. Penetration testing - Why Red Team?
   • Better identification and understanding of vulnerabilities
     – Understand adversary adaptation to defenses
     – Understand adversary response to the security response
   • Evaluate the system's information assurance
14. Penetration testing - Limitations
   • There is no simple procedure to identify the appropriate test cases
   • Error prediction depends on the tester's skills, experience, and familiarity with the system
   • There is no well-defined criterion for when to stop testing
15. Security testing - Fault injection
   • Deliberate insertion of faults into the system to determine its response
   • Well known in the testing of fault-tolerant systems
   • A secure program is one that tolerates injected faults without any security violation
   • Offers the capability of
     – automating testing
     – quantifying the quality of testing
16. Security testing - Fault injection
   • Understanding the nature of security faults provides a basis for the application of fault injection
   • Requires the selection of a fault model
   • Requires the selection of locations for fault injection
17. Security testing - Fault injection
   • Simulates security flaws by perturbing the internal states
     – Source code must be examined by hand for candidate locations for fault injection
     – Identifies portions of software code that can result in security violations
   • Simulates security flaws by perturbing the input that the application receives from the environment
     – Test adequacy criteria
       · fault coverage
       · interaction coverage
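The sketch below illustrates the second approach, environment perturbation: a toy parser is fed deliberately faulty inputs and the harness reports which injected faults it tolerates without a security violation. The parse_config function, the secret token, and the perturbation list are all assumptions for the example.

    # Sketch of fault injection by perturbing the application's input.
    SECRET = "s3cr3t-token"

    def parse_config(text):
        """Toy code under test: keep only whitelisted keys from 'key=value' lines."""
        entries = {}
        for line in text.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                if key == "user":
                    entries[key] = value[:64]
        return entries

    # Perturbed inputs that try to smuggle the secret past the parser.
    PERTURBATIONS = [
        "user=alice\nsecret=" + SECRET,        # secret on its own (non-whitelisted) line
        "user=alice\x00secret=" + SECRET,      # NUL byte hides the secret inside a value
        "A" * 10_000 + "\nsecret=" + SECRET,   # oversized field plus secret line
    ]

    def run_fault_injection():
        # A secure program tolerates every injected fault without a security violation.
        for faulty in PERTURBATIONS:
            leaked = SECRET in repr(parse_config(faulty))
            print(f"{faulty[:24]!r:30} -> {'SECURITY VIOLATION' if leaked else 'tolerated'}")

    run_fault_injection()

In this toy run the NUL-byte perturbation slips the secret through the whitelist, which is exactly the kind of flaw the technique is meant to expose.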
18. Stress testing
   • Stress testing is testing with a high workload, to the point where one or more, or all, resources are simultaneously saturated
   • The intention of a stress test is to "break" the system, i.e., to force a crash
19. Stress testing
   • Stress testing does the following
     – Distorts the normal order of processing, especially processing that occurs at different priority levels
     – Forces the exercise of all system limits, thresholds, or other controls designed to deal with overload conditions
     – Increases the number of simultaneous actions
     – Forces race conditions
     – Depletes resource pools in extraordinary and unanticipated sequences
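A minimal stress-driver sketch follows: many concurrent workers hammer a bounded resource so that its overload controls are exercised and any unexpected defect is separated from expected overload events. The bounded queue standing in for the system under test, and the thread and request counts, are assumptions for the example.

    # Sketch of a stress driver: many simultaneous actions against a bounded resource.
    import queue
    import threading

    pool = queue.Queue(maxsize=100)     # bounded resource pool standing in for the system
    errors = []

    def hammer(worker_id, requests=1_000):
        for i in range(requests):
            try:
                pool.put((worker_id, i), timeout=0.01)   # exercise the overload control
                pool.get_nowait()
            except (queue.Full, queue.Empty):
                errors.append(("overload", worker_id, i))  # limit hit or race on the shared pool
            except Exception as exc:                       # anything else is a real defect
                errors.append(("defect", worker_id, repr(exc)))

    threads = [threading.Thread(target=hammer, args=(w,)) for w in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    defects = [e for e in errors if e[0] == "defect"]
    print(f"{len(errors)} overload/race events, {len(defects)} unexpected defects")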
20. Stress testing
   • Benefits
     – Faults caught by stress testing tend to be subtle
     – Faults caught by stress testing are often design flaws that may have implications in many areas
   • When to stress test
     – Whenever possible, early and repeatedly
     – As a part of the system acceptance test
21. Performance testing
   • Objectives
     – Show that the system meets specified performance objectives
     – Determine the factors in hardware or software that limit system performance
     – Tune the system
     – Project the system's future load-handling capacity
22. Performance testing
   • Objectives can be met by
     – Analytical modeling
     – Simulation
     – Measurements on the real system with a simulated or real workload
23. Performance testing
   • Performance testing presumes a robust, working, and stable system
     – Faults that have an impact on the system's function have been removed
       · Extreme example: if a fault crashes the system, no rational performance testing can be done
   • Faults that affect performance can range from poor design to poor implementation
24. Examples of performance failures
   • NASA delayed the launch of a satellite for eight months because the Flight Operations Segment software had unacceptable response times for developing satellite schedules and poor performance in analyzing satellite status and telemetry data
   • The Web sites of several online brokerage houses could not scale to meet an unusually large number of hits following a stock market dip on October 27, 1997; customers experienced long delays in using the sites
25. Performance measures
   • Performance is an indicator of how well a software system or component meets its requirements for timeliness
   • Timeliness is measured in terms of response time or throughput
     – Response time is the time required to respond to a request; it may be the time for a single transaction or the end-to-end time for a user task
     – Throughput is the number of requests that the system can process in some specified time interval
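As a concrete illustration of the two measures, the sketch below times a stand-in request handler and reports mean and 95th-percentile response time plus throughput. The handle_request placeholder and the request count are assumptions for the example.

    # Sketch of measuring response time and throughput for a request handler.
    import statistics
    import time

    def handle_request():
        time.sleep(0.002)        # placeholder for the real work being measured

    N = 500
    latencies = []
    start = time.perf_counter()
    for _ in range(N):
        t0 = time.perf_counter()
        handle_request()
        latencies.append(time.perf_counter() - t0)   # response time of one request
    elapsed = time.perf_counter() - start

    print(f"mean response time : {statistics.mean(latencies) * 1000:.2f} ms")
    print(f"95th percentile    : {sorted(latencies)[int(0.95 * N)] * 1000:.2f} ms")
    print(f"throughput         : {N / elapsed:.1f} requests/second")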
26. Responsiveness and scalability
   • Responsiveness is the ability of a system to meet its objectives for response time or throughput
     – Responsiveness has both an objective and a subjective component
   • Scalability is the ability of a system to continue to meet its response time or throughput objectives as the demand for software functions increases
   [Figure: response time vs. number of requests per unit of time. The change from a linear to an exponential increase is usually due to some resource in the system nearing 100% utilization; eventually resource requirements exceed computer and network capacity.]
27. Performance testing
   • Prerequisites
     – A clear statement of performance objectives
     – A workload to drive the experiment
     – A controlled experimental process or testbed
     – Instrumentation to gather performance-related data
     – Analytical tools to process and interpret the data
28. Performance testing
   • Problems with performance objectives
     – There is no statement of performance objectives, or the statement is so vague that it cannot be reduced to a quantitative measure
     – There is a clear quantitative statement of objectives, but it cannot be measured in practice
       · Excessive resources and effort
       · Excessive experiment duration
     – There is a clear quantitative statement of objectives, but the objectives are unachievable at reasonable cost
29. Performance testing
   • Performance objectives depend on the domain; acceptable response time could be
     – A few milliseconds for an antiaircraft missile control
     – A few tens of milliseconds for a nuclear reactor control
     – A few seconds of delay in getting a telephone dial tone
     – Half a minute to answer a database query
30. Complications and variations
   • There is more than one type of workload
     – A probability distribution over the different workloads
     – A different objective for each type of workload
       · Example: the response time at 4 messages per second shall be less than 2 seconds, and the response time at 8 messages per second shall be less than 8 seconds
   • Performance may be intertwined with a quantitative reliability/availability specification
     – Different workload/response-time relations are allowed under different hardware/software failure conditions
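A small sketch of checking per-workload objectives like the example above follows. The objective table mirrors the slide; the measure_mean_response_time stub is an assumption standing in for a real measurement run.

    # Sketch of verifying a different performance objective for each workload type.
    import random

    # arrival rate (messages/second) -> response-time bound (seconds), as on the slide
    OBJECTIVES = {4: 2.0, 8: 8.0}

    def measure_mean_response_time(rate):
        # Stand-in for a real measurement run at the given workload intensity.
        return 0.2 * rate + random.uniform(0.0, 0.3)

    for rate, bound in OBJECTIVES.items():
        observed = measure_mean_response_time(rate)
        status = "OK" if observed < bound else "FAIL"
        print(f"{rate} msg/s: {observed:.2f} s (objective < {bound} s) -> {status}")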
31. Complications and variations
   • Analysis and measurement under a time-varying workload
     – Consider different situations: peak hour, average hour, peak day, etc.
32. Data collection tools
   • A data collection tool must be an observer of the system under study
   • Its activity should not significantly affect the operation of the system being measured, i.e., degrade its performance
     – Acceptable overhead for measurement activities is up to 5%
   • Implementation approaches
     – Hardware
     – Software
     – Hybrid (combination of hardware and software monitors)
33. Data collection tools: hardware monitors
   • Detect events within a computer system by sensing predefined signals
     – Electronic probes can sense the state of hardware components such as registers, memory locations, and I/O channels
     – Advantages
       · External to the system; do not consume its resources
       · Portable; do not depend on the operating system
     – Disadvantages
       · Have no access to software-related information
       · Cannot be used for performance measurements related to applications or workloads
34. Data collection tools: software monitors
   • A set of routines embedded in the software of the system with the aim of recording status and events
     – Advantages
       · Can record any information available to programs and the operating system
       · Easy to install and use
       · Great flexibility
     – Disadvantages
       · Introduce overhead: they use the same resources they are measuring and may interfere significantly with the system
       · Dependent on the operating system and/or programming language
35. Data collection tools: software monitors
   • System-level measurements: system-wide resource usage statistics
     – Usually provided by software monitors that run at the operating system level
     – Example measures: global CPU utilization, global disk utilization, total number of physical I/O operations, page fault rate, total traffic through a router
   • Program-level measurements: program-related statistics
     – Example measures: elapsed time, CPU time, number of I/O operations per execution, physical memory usage, application traffic (packets per second per application)
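As an illustration of the two measurement levels, the sketch below collects a few system-wide numbers and a few process-level numbers. It assumes the third-party psutil package is installed and a Unix-like OS for the resource module; it is not part of the original slides.

    # Sketch of a software monitor collecting system-level and program-level statistics.
    import resource   # Unix-only standard-library module
    import psutil     # third-party package (pip install psutil)

    # System-level measurements: system-wide resource usage
    print("global CPU utilization :", psutil.cpu_percent(interval=1.0), "%")
    print("global memory usage    :", psutil.virtual_memory().percent, "%")
    io = psutil.disk_io_counters()
    if io is not None:
        print("physical I/O operations:", io.read_count + io.write_count)

    # Program-level measurements: statistics for this process only
    usage = resource.getrusage(resource.RUSAGE_SELF)
    print("CPU time (user)        :", usage.ru_utime, "s")
    print("max resident set size  :", usage.ru_maxrss, "(KB on Linux, bytes on macOS)")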
36. Measurement mode
   • Event mode: information is collected at the occurrence of specific events
     – Upon detection of an event, special code calls an appropriate routine that generates a record containing information such as date, time, type of event, and event-related data
     – Information corresponding to the occurrence of events is recorded in buffers and later transferred to disk
     – Overhead depends on the events selected, the event rate, and the data collected
     – Large overhead due to very high event rates is the major shortcoming of event-mode measurement tools
37. Measurement mode
   • Event mode
     – Logging
       · Record all the information concerning an event (start and stop time, system state, register contents, etc.)
     – Event counts
       · Count the events over a specified period
     – Event durations
       · Accumulate event durations for specified events (dividing by the corresponding event counts gives the mean event duration)
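A minimal event-mode sketch, maintaining event counts and accumulated durations for an instrumented operation, is shown below. The disk_write placeholder and the event name are assumptions for the example.

    # Sketch of event-mode measurement: count events and accumulate their durations.
    import time
    from collections import defaultdict

    event_counts = defaultdict(int)
    event_durations = defaultdict(float)

    def record_event(name, start, stop):
        event_counts[name] += 1                  # event counts
        event_durations[name] += stop - start    # accumulated event durations

    def disk_write(data):                        # instrumented operation (placeholder)
        t0 = time.perf_counter()
        time.sleep(0.001)                        # stand-in for the real work
        record_event("disk_write", t0, time.perf_counter())

    for _ in range(100):
        disk_write(b"x")

    mean = event_durations["disk_write"] / event_counts["disk_write"]
    print(f"disk_write: {event_counts['disk_write']} events, mean duration {mean * 1000:.2f} ms")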
38. Measurement mode
   • Sampling mode: information is collected at predefined time instants specified at the start of the monitoring session
     – Sampling is driven by timer interrupts based on a hardware clock
     – Usually gives a less detailed observation than event mode
     – Overhead depends on the number of variables measured at each sampling point and on the size of the sampling interval
     – Since we can specify both factors, we can also control the overhead of a sampling monitor
     – There is a tradeoff between low overhead and high accuracy of the measurement results
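For contrast with the event-mode sketch above, here is a sampling-mode monitor that records CPU utilization at a fixed interval from a background thread. The interval, the duration, and the psutil dependency are assumptions; shrinking the interval increases accuracy at the cost of overhead, as the slide notes.

    # Sketch of a sampling-mode monitor: a timer-driven thread samples CPU utilization.
    import threading
    import time
    import psutil   # third-party package (pip install psutil)

    samples = []

    def sampler(interval=0.5, duration=5.0):
        # Overhead is controlled by the sampling interval and by how much is recorded.
        end = time.monotonic() + duration
        while time.monotonic() < end:
            samples.append((time.monotonic(), psutil.cpu_percent(interval=None)))
            time.sleep(interval)

    monitor = threading.Thread(target=sampler, daemon=True)
    monitor.start()
    # ... run the workload under test here ...
    monitor.join()
    print(f"{len(samples)} samples, mean CPU = {sum(s[1] for s in samples) / len(samples):.1f}%")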
39. Workload generation
   • The best way to stress and performance test a given system is to run the actual workload and measure the results
   • Alternative: performance benchmarks
     – Clear performance objectives and workloads that are measurable and repeatable
     – Enable comparative studies of products
     – Vendors, developers, and users run benchmarks to accurately test new systems, pinpoint performance problems, or assess the impact of a modification to a system
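When the actual workload cannot be run, a simple open-loop generator can at least make the offered load measurable and repeatable. The target rate, duration, and send_request stub below are assumptions for the example.

    # Sketch of a simple workload generator driving requests at a target rate.
    import time

    def send_request():
        time.sleep(0.001)           # stand-in for issuing one real request

    def generate_workload(rate=50, duration=5.0):
        """Issue roughly `rate` requests per second for `duration` seconds."""
        interval = 1.0 / rate
        sent, start = 0, time.monotonic()
        while time.monotonic() - start < duration:
            send_request()
            sent += 1
            # Sleep until the next scheduled send to keep the rate measurable and repeatable.
            next_send = start + sent * interval
            time.sleep(max(0.0, next_send - time.monotonic()))
        elapsed = time.monotonic() - start
        print(f"sent {sent} requests in {elapsed:.1f} s ({sent / elapsed:.1f} req/s)")

    generate_workload()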
40. Performance benchmarks
   • Before using benchmark results, one must understand the workload, the system under study, the tests, the measurements, and the results
   • To avoid pitfalls, ask the following questions
     – Did the system under test have a configuration similar to the actual configuration (hardware, network, operating system, software, and workload)?
     – How representative of my workload are the benchmark tests?
       · For example, if you are developing a new graphical application, transaction processing benchmark results are useless, because the two workloads are very different
     – Which benchmark version was used? What new features are included in the latest releases?
41. Performance benchmarks
   • Two consortia offer benchmarks that are commonly used for comparing different computer systems
     – Standard Performance Evaluation Corporation (SPEC), http://www.spec.org
       · SPEC is an organization of computer industry vendors that develops standardized performance tests, i.e., benchmarks, and publishes reviewed results
     – Transaction Processing Performance Council (TPC), http://www.tpc.org
       · TPC is a nonprofit organization that defines transaction processing and database benchmarks
42. Stress and performance testing - QA tasks
   • Include workload generation as a major budget item
   • Select workload generation methods; start workload generation development at the same time as software development
   • Plan software instrumentation in support of performance testing as a part of system design; develop, publish, and discuss embedded software instrumentation as early as possible
43. Stress and performance testing - QA tasks
   • Tie down workload statistics and parameters as early as possible, in written form
   • Start stress testing as early as possible; subject the system to stress whenever possible
   • Include a stress test as a part of the formal system acceptance test
   • Accept no performance criteria that cannot be measured and validated with the allocated personnel, schedule, and budget
44. Stress and performance testing - QA tasks
   • Plan performance testing as a specific, recognizable activity to be done on a testbed and, if necessary, in the field
   • Be sure to allow enough time for the system to stabilize before attempting field performance testing
   • Run performance tests intermixed with other system tests to detect faults that are performance related
