Introduction This is a work in progress and is in a draft state. However, it contains enough information to be helpful as a starting point. The prime focus is to note the items that need to be included in project plans and to assist in estimating.
Scope This slide pack assumes the creation of an application that is then integrated onto a hardware platform. Testing of the hardware platform itself, and the integration testing for it, are not included in this draft pack at present.
Contents
- Overview
- Budgeting for Tooling
- Design and Build
- Application Security and DDA Testing
- Test Application and Integration Phase
- Working with Widgets and GUI Interfaces
- Estimating Manual Test Run Time
- Estimating GUI Automation Testing
- Methods for Managing, Prioritising Regression Testing
- Techniques for Deciding Sufficient Testing Done
- Estimating Testing for Protective Monitoring Testing
- Mobile interface testing
- Acceptance Testing (FAT, SAT, UAT, OAT)
Overview This collection of notes was compiled to provide guidance for new Test Managers and Project Managers in the art of estimating test effort, with particular attention paid to the activities that are usually forgotten and which lead to stresses and pressures on a test team and potentially put successful delivery at risk. It needs to be understood that while there is an attempt to bring a science to the process, there will also be special factors to take into account which will require additional effort. The aim of these notes is to prevent underestimation and so reduce the risk of late delivery or poor test coverage due to a test team being under-resourced. 02/01/2013
Targeting within Test Strategy Within the test strategy, be clear as to what is being tested at each phase. Unit code tests will test the implementation of the understanding of the requirements at a unit code level. The logic is being tested as understood and implemented. Error conditions and error handling should also be tested. The creation of stubs and drivers plus the creation of test data will be necessary. Typically white box test analysis is used as per BS7925-2. At a functional level, the application integration is being tested. Again functionality is tested, but black box techniques are used as per BS7925-2. End to end functional stories are tested. However these scenarios need to be clearly documented with clear expectations. System level testing concerns the integration of the application with the main system, in addition to security and performance testing. Enterprise monitoring testing may be required. Performance monitoring testing may be required. PKI certificate testing may be required. Penetration testing is a security audit. User testing may include W3C compliance and testing against functional requirements. Site testing concerns itself with site specific targeted tests. Operational testing is concerned with maintenance and support of the system.
Good Enough? In testing we need to make a call as to what is good enough. This will depend upon commercial and product risk. The level of testing required will be defined within the project Test Policy. For the purposes of this presentation, we look at a typical large project (£3 million) and point out key things that are often forgotten in estimates. While for some projects this will be overkill, the bigger danger is to assume overkill, miss key points and end up with a project that is overspent, behind schedule and may even cause company failure.
Rewards and Approach A common problem for testers is under resourcing. Project managers typically estimate testing as a small or larger fraction of development time. However this is frequently incorrect. Other pressures are based on the delivery project manager getting the product out to meet a bonus payment based upon saving development costs. However when the project is handed over to the support phase this could incur far greater costs and risk to the company reputation. The key point is not to reward wrong behaviour. These slides focus on producing realistic time estimations with a responsible view to quality. However if dealing with safety or highly sensitive commercial targets then this will reflect the minimum time scales. On the other hand if one is producing a simple web project with no commercial impact if this fails and no risk to commercial reputation or life, then far less testing will be required.
Is it not simple? One approach that is often used as a first finger in the air is to say that the Test Effort is between 50% and 75% of the development effort. This however does not always work because:
- Development may assume use of existing code that is integrated. Assuming existing code does not need integration testing can be dangerous (e.g. this is why the Ariane rocket, delivered to project schedule, exploded shortly after lift off).
- Many test tasks are traditionally underestimated or forgotten, and the broad approach dates from when systems were less complex.
- Some systems may need more test effort than development effort. This may not become obvious until the team is under pressure and it is too late to bring additional staff onto the project.
- Testing versions of tools (e.g. Visual Studio) are more expensive than the cut-down development versions.
Hence the answer is: no, it is not that simple, and here are some notes to help.
You can view the Ariane lift off at: http://www.youtube.com/watch?v=gp_D8r-2hwk This is a good example of a project delivered to schedule, but not performing full test analysis.
Budgeting for Tooling This part examines cost implications for test tooling
Off the Shelf Test Tooling and Hardware Typical project tooling to be included in a bid:
- Test management tooling for each tester.
- Access to tooling to track requirements to tests to defects.
- If a .Net project: each tester will require a Visual Studio Ultimate licence, and developers will want access to FxCop and ReSharper within their development versions of Visual Studio.
- A large programme will want to consider using the HP Quality Centre tool set. Small to medium sized projects will want to use cheaper tools such as Atlassian.
- If running PKI certificate licensing tests, budget for additional licences that can be cancelled and revoked.
- Is video evidence required for testing, and if so do we need a capture tool?
- Do we need emulators and real devices such as mobile phones for testing? There may also be security checks to make before testing can start.
- Load and Performance test tooling. This may require additional licences for each platform, and the licence needs to allow an appropriate level of stress testing. The load and performance test platform needs to be comparable and scalable. If resources are too low, the results may not be scalable and it may be difficult to run tests with even a small number of users, so a realistic trend cannot be identified with any certainty over margins of error. This in turn can cause considerable problems at SAT and OAT and cause a project to be delivered late or with performance errors.
- Code coverage tools and other analysis tooling, such as Coverity, may be required for large programmes.
- Are checking tools required (e.g. for DDA / W3C, Usability, etc.)?
Customised Test Tooling There may be the need to develop internal test tooling for a project. This can typically include: Methods for generating large amounts of data. Methods for comparing or validating large amounts of data. Test Stubs and Test Drivers. Approaches for extracting data, including extracting data from test tooling. Budget for design, build, review, test and verification of the tooling and documentation, configuration control and support for the tooling.
Test Manager Budget for Test Manager:
- 10% of time per day per test team member. If there are more than 5 individuals, consider breaking the team down to include team leads.
- 1 day per week dealing with other teams, project manager, Development manager, security testing, system architect, etc.
- 0.5 day per week rising to 1 day per week in various meetings.
- 1 hour per day preparing short reports and extracting data. This task can be done by a new graduate or technician under Test Manager guidance.
- Reviewing Development and other test documentation: 1 hour per document per version. Assume 2 versions.
- Reviewing Requirements: 1 day per 100 requirements (assuming all are single logical statements).
- Dealing with customer and customer issues: 1 day per week.
- Reviewing Test Analysis: 1 day per 100 requirements.
- Reviewing Test Script coverage: 1 day per 100 test cases.
- Reviewing Test Coverage: 1 day per regression run.
- Dealing with Hosting Company: 1 day per week.
- Dealing with other Stakeholders: 1 day per week during and leading up to integration; during SAT and OAT this will increase to 1 day per stakeholder. If there are many Stakeholders, this may need to be covered by additional test support (Principal Consultant level).
- Writing documentation.
- Who is chairing the code reviews? If the Test Manager is responsible, then this needs to be budgeted: one hour per review, with one hour preparation and one hour for minutes and actions. Also budget for the effort of the other review team members at one hour per meeting, and potentially longer than one hour of preparation time.
Key Test Documents The following is general guidance for writing (not reviewing) by Principal / Senior level staff. Times can increase and are less likely to decrease:
- Test Strategy: 2 weeks
- Functional Test Plan: 2 weeks
- Functional Test Specification (when required): 4 weeks
- Non Functional Test Plan (if required): 2 weeks
- Security Test Plan: 2 weeks
- Load and Performance Test Plan: 2 weeks
- Integration Test Plan: 2 weeks
- Enterprise Monitoring (EM) Test Plan for SAT: 2 weeks
- Protective Monitoring (PM) Test Plan for SAT: 4 weeks
- Site / System Acceptance Plan (in addition to EM and PM): 2 weeks
- Operational Acceptance Test Plan: 2 weeks
- Test harness definitions (includes stubs, drivers, etc.): 2 weeks
- Documentation for other special test tooling: 2 weeks each
- Input to support testing related work with training material: 2 weeks
Covered separately (Technician / Recent Graduate):
- Test Analysis
- Test Cases
- Test Scripts (Manual and Automated)
- Test data created
Note: For large projects and programmes additional time may be required.
Design and Build Phase This part examines testing during the design and build phase.
Assumptions and Risks Before we even consider estimating, we need to consider the quality of the requirements. Defects will leak into the system at requirements level. Poor or poorly constructed requirements will place a considerable overhead on testing. Additional resources will then be required to help administer the test team, and specifically to help prepare reports and measure test effectiveness. For poorly constructed requirements, an additional person at consultant level will be required for the duration of the project. Alternatively, the requirements need to be checked. If requirements are poorly constructed then it may be necessary to break them down into sub-requirements and link these to user stories, which will need matching to Test Cases, with the Test Cases in turn matched to Test Scripts. This results in:
- The need for specialist requirements management tooling and the need to link this with specialist commercial test tooling.
- The need for additional resources to develop, review and manage requirement improvements.
Requirements Tests are delivered against Requirements. It is important to check that:
- There are no Requirement Gaps.
- There are no Requirement Conflicts.
- Derived engineering requirements are well understood and documented.
- There are no blatant Errors in Requirements.
- There is no Lack of Detail in terms of valid ranges and expectations of behaviour when error conditions arise.
- Details concerning security have been identified.
- Specifications have been checked as if they were subsets of requirements. Do not assume that these will all be complete and correct.
- Where Browsers are defined, it is clear whether all tests are to be repeated for each browser, or whether tests can be prioritised and distributed across different browsers, e.g. 100% of tests on Firefox, 90% on IE 6.0, 25% on other. This needs to be reflected in test run time effort calculations. Pay attention to pop-up behaviour, which often differs across browsers.
- Where interfaces to other systems are present and the interface requirement is not proven, or is poorly (or unreliably) documented, additional resource will be required.
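The browser split in the example above can be folded into a run time estimate. The following is a minimal sketch only: the function name, the 200 tests and the 20 minutes per test are illustrative assumptions, and "Other" is treated as a single browser group.

```python
def browser_run_minutes(test_count, minutes_per_test, coverage_by_browser):
    """Total manual run time when only a share of the tests is repeated per browser."""
    total = 0.0
    for share in coverage_by_browser.values():
        total += test_count * share * minutes_per_test
    return total


# Shares taken from the example split in the slide; the other figures are assumed.
coverage = {"Firefox": 1.00, "IE 6.0": 0.90, "Other": 0.25}
minutes = browser_run_minutes(test_count=200, minutes_per_test=20,
                              coverage_by_browser=coverage)
print(f"{minutes / 60:.1f} hours")
```

With this split the run effort is a little over twice that of a single browser, which is the kind of multiplier that is easily missed in an estimate.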
Early intervention Resource is required to check requirements for testing. This needs to be at Principal / Senior staff level. As a general rule, a 5 to 10 day investment in requirement review by an appropriate test architect can typically save 2 man months of effort later, if the advice is adopted.
Defect Cost Implications Defects slip in at the requirements phase and grow. The later the detection, the greater the cost to detect, fix and retest. It is not a choice of being able to afford early test intervention in checking requirements. It is a fact that early intervention:
- Saves money.
- Prevents project overrun.
- Reduces development and test effort.
- Improves Development delivery.
Around 10% of defects are seeded in a project at the requirements phase. Late detection means longer time to delivery of project and greater costs.
[Figure: Cost of Fault by Phase. Cost per Fault (£K), on a scale from £0 to £25.0, plotted against Project Phase.]
Test Management of Requirements It is not enough for requirements to have sufficient content to be testable. They also need to be manageable. Tests are mapped to individual requirements. Requirements need to be structured as individual single logical statements. Failure to do this will mean that many tests are required to sign off many attributes of a requirement. As a project grows, it becomes necessary to cut tests down to a manageable set of regression tests based upon risk assessment. With multiple embedded requirements this creates difficulties and introduces the risk of a requirement attribute not being delivered, containing defects that are not tested and can mean that critical defects go undetected. It is vital that requirements are structured to be a single logical statement with a separate reference number. Logical statements normally do not have the words OR / AND within the statement. If this structuring is not done, then there will be a requirement to have additional effort required to maintain the test reporting tool. If the solution is to create User Stories, then these will need to be managed, reviewed and there may be issues in extracting information from tools such as Visual Studio.
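As a first pass before a manual review, requirements containing AND / OR can be flagged automatically. This is a minimal sketch under stated assumptions: the requirement IDs and texts are invented for illustration, and a flagged requirement is only a candidate for splitting, not proof of a compound requirement.

```python
import re

# Whole-word match so that words such as "record" or "standard" are not flagged.
COMPOUND_WORDS = re.compile(r"\b(and|or)\b", re.IGNORECASE)


def flag_compound_requirements(requirements):
    """Return the IDs of requirements whose text contains AND / OR."""
    return [req_id for req_id, text in requirements.items()
            if COMPOUND_WORDS.search(text)]


# Hypothetical requirements, for illustration only.
requirements = {
    "REQ-001": "The system shall lock the account after 3 failed logins.",
    "REQ-002": "The system shall log the event and notify the administrator.",
    "REQ-003": "The user shall enter a date of birth or a passport number.",
}
flagged = flag_compound_requirements(requirements)
print(flagged)
```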
Early Testing Effort Testing applies equally to coding as well as System Testing. To cut defect leakage early on, it is vital that code is:
- Reviewed against best practice check lists.
- Checked early for security impacts.
- Checked early for performance issues.
- Checked against tools such as ReSharper and FxCop. This requires configuration and build control effort from the Development Team, with the necessary resources to run tests and analyse output early on. This will save time in System Testing effort and help to speed development. To avoid false reporting, ensure the tools are configured correctly (allow 5 man days for configuration and setup).
Resource to ensure adequate static testing, to include:
- Review of code.
- Running of static tooling.
Review of Code Effort It is vital that code reviews are adequately resourced. Reviews need to be effective and so the review rate needs to be considered. Too fast and defects will leak through, increasing the overall project cost. Too long and the review becomes ineffective and people become blinded by lines of code. Reviews need to be resourced, regular, guided and targeted. A review period of 1 hour to 2 hours maximum is most efficient, and reviews longer than 2 hours need to be broken down into targeted, focused chunks, or individuals given specific areas to review. Review rate may be around 1KLOC/hour. Time also needs to be allocated for static review of documents and diagrams. Static test tools can help add confidence to a code review and (if set up and used correctly) will add value to a review, but should not be used to replace a code review.
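The review rate and session length above can be turned into a rough meeting plan. A minimal sketch, assuming the 1KLOC/hour rate and the 2 hour session limit from the slide; the function name and the example figures are illustrative, and preparation and minute-taking time are not included.

```python
import math


def review_plan(lines_of_code, reviewers, kloc_per_hour=1.0, max_session_hours=2.0):
    """Return (meeting hours, number of sessions, total person hours in meetings)."""
    review_hours = lines_of_code / 1000 / kloc_per_hour
    sessions = math.ceil(review_hours / max_session_hours)
    person_hours = review_hours * reviewers
    return review_hours, sessions, person_hours


# Example: 5,000 lines of code reviewed by 3 people.
hours, sessions, person_hours = review_plan(5000, reviewers=3)
print(hours, sessions, person_hours)
```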
Code review activities If reviewing code in a closed meeting, comments by one reviewer will typically inspire comment from another reviewer. If reviewing code using tools to support remote reviews, then the first reviewers will miss comments from other reviewers. Hence it is important for parties to go back over comments when all comments are collected. Review tasks should be set for individuals. Typically these will be supported by project check lists and will include: Use of good coding practice; Code efficiency / Performance; Code security; Consistency with requirements; Consistency with interfaces and other code modules. Any module being interfaced with should have an assigned individual representing that module to check compliance.
System Architecture This has impact for testing of: Security Load and Performance Ensure that the security test team and performance tester have early input to the design. This review needs to be budgeted.
Application Security and DDA Testing This details points that often need testing and can get missed out
Security Scenarios While the system will be subjected to security testing, do not forget to test the application as soon as possible. This needs to be budgeted and resourced. Scenarios need to be put in place for:
- Ensuring that SQL injection cannot be used. One test per field.
- Ensuring that URL injection cannot be used on secure web pages.
- Checking the timeout of logins.
- Checking the success of logout, then trying the back button.
Disability Discrimination Act (DDA) Testing While the World Wide Web Consortium (W3C) has tooling to check web sites, this may not be usable on sites prior to go-live. Consequently, if developing on an air-gapped system, DDA testing can be more involved. The level of DDA adherence will vary under contractual agreement. However, one should ensure that the following is tested as minimum good practice, and this will need resourcing and budget:
- Check that Blue / Green colour contrast is not present.
- Check that Red / Brown colour contrast is not present.
- Check that Green / Brown colour contrast is not present.
- Check that images and logos have alternate text for web pages.
- Check that a web page reader will actually read within a column before moving to the next column, and does not just read the top line of each column in turn before moving to the next line of each column.
- Check that fonts in browsers can be resized, so that a page does not restrict access for those with poor eyesight.
Allow time for scripting and running these extra tests.
Test Application and Integration Phase This part examines test related activity during the early testing of an application and during integration.
Test Analysis Effort Once there has been time to read the documentation and understand the design, test analysis will be required to identify test cases. There is a range of techniques, such as those detailed in BS7925-2, plus methods like Classification Tree. The CT method comes with a tool, Classification Tree Editor (CTE), which can help to group tests and cut test effort. In practice for estimation, this will help to provide a margin of error to avoid underestimation of testing. This assumes however that the system under test is not safety critical. If it is safety critical, the free CTE tool in a different mode will help to ensure that test cases are less likely to be missed. For large projects there is a commercial version of the CTE tool, which is worth consideration. The CTE tool also interfaces with the HP tool set. Allow at least 15 minutes per single logical requirement for the analysis phase of testing.
Manual Test Scripting Effort To create a test script from a test case, allow for each logical requirement: 10 minutes to write test setup phase. 5 minutes per step, which will equate to values to be entered (taken from the test case). 10 minutes to write the end of the test and check the test sanity and ensure the test is under configuration control. As a general rule a manual test takes around 30 minutes to write per test case. NOTE: Test cases need to be reviewed. One way to check the sanity of a test is to run it the first time using another tester. HOWEVER the test case set needs to be reviewed for test coverage and effectiveness and this can take around 5 to 10 minutes per test.
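The scripting figures above can be combined into a simple calculation. A minimal sketch using the slide's timings; the function names, the choice of 10 minutes for review (the upper end of the 5 to 10 minute range) and the example counts are assumptions for illustration.

```python
def script_writing_minutes(steps, setup=10, per_step=5, wrap_up=10):
    """Minutes to write one manual test script from a test case."""
    return setup + per_step * steps + wrap_up


def scripting_effort_hours(test_cases, steps_per_test, review_minutes=10):
    """Hours to write and review a set of manual test scripts."""
    per_test = script_writing_minutes(steps_per_test) + review_minutes
    return test_cases * per_test / 60


# A 2 step script comes out at the 30 minute rule of thumb.
print(script_writing_minutes(2))
# 120 test cases of 2 steps each, including review time.
print(scripting_effort_hours(120, 2))
```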
Test Case and Script Traps There is a risk that test cases and scripts may miss key opportunities to test during intervening steps. So for each step assess what needs to be checked and referenced. Do not focus only on the final state. If using end to end scenarios for functional testing, then check that the requirements fully document the required actions. Failure to document the requirement flows fully can lead to inadequate testing. Check that the requirement authors are involved in reviewing test cases and scripts.
Estimation First Principles It is assumed that all requirements are single logical statements. If a statement refers to a standard or to other sets of requirements, then the relevant requirements need to be identified as single statements. There is a range of test analysis techniques (e.g. Classification Tree and the approaches in BS7925-2). For a simple approach one would need to consider the boundary analysis technique. This would run tests with values between boundaries A and B. The tests that one would use would therefore be:
- Far below boundary A (can include negative numbers)
- Just below boundary A
- On boundary A
- Just above boundary A
- Mid point between boundary A and B (not always tested, but recommended)
- Just below boundary B
- On boundary B
- Just above boundary B
- Far above boundary B
- Special case of value 0
- Illegal value (e.g. alpha, special characters, etc., when expecting a numeric value)
So for each single statement requirement there are a minimum of 11 tests. As a general rule this is a good starting point for estimating.
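The 11 boundary tests can be generated mechanically for a numeric range. A minimal sketch, assuming integer inputs; the function name, the step of 1, the "far" offset of 1000 and the illegal value "abc" are illustrative choices, not part of any standard.

```python
def boundary_test_values(lower, upper, step=1, far=1000):
    """Return the 11 boundary analysis inputs for a range from A (lower) to B (upper)."""
    return [
        ("far below A", lower - far),
        ("just below A", lower - step),
        ("on A", lower),
        ("just above A", lower + step),
        ("mid point", (lower + upper) // 2),
        ("just below B", upper - step),
        ("on B", upper),
        ("just above B", upper + step),
        ("far above B", upper + far),
        ("zero", 0),
        ("illegal value", "abc"),
    ]


# Example: a field that accepts values from 10 to 100.
cases = boundary_test_values(10, 100)
for label, value in cases:
    print(label, value)

# Starting estimate of test count for 50 single statement requirements.
estimated_tests = 50 * len(cases)
```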
Pair-wise and Orthogonal Array Pair-wise relies upon 2 variable combinations creating defects that a single change would not produce. So assume 3 inputs (factors), each having a state of 1 and 2 (i.e. 2 levels). We would test 4 cases (runs):

          I/P 1   I/P 2   I/P 3
Case 1      1       1       1
Case 2      1       2       2
Case 3      2       2       1
Case 4      2       1       2

Hence, while thorough, this can reduce the test cases from the 8 possible test cases. Orthogonal Arrays take the Pair-wise analysis further and are out of the scope of this slide set.
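A short check can confirm that the 4 cases in this example really do cover every pair of values for every pair of inputs. This is a sketch for verifying a chosen case set, not a pair-wise generator; the function name is illustrative.

```python
from itertools import combinations, product


def uncovered_pairs(cases, levels=(1, 2)):
    """List (input a, input b, value pair) combinations that no case exercises."""
    factors = range(len(cases[0]))
    missing = []
    for a, b in combinations(factors, 2):
        seen = {(case[a], case[b]) for case in cases}
        for pair in product(levels, repeat=2):
            if pair not in seen:
                missing.append((a, b, pair))
    return missing


# The four cases from the slide: (I/P 1, I/P 2, I/P 3).
cases = [(1, 1, 1), (1, 2, 2), (2, 2, 1), (2, 1, 2)]
exhaustive = len(list(product((1, 2), repeat=3)))

print(uncovered_pairs(cases))  # an empty list means every pair is covered
print(len(cases), "cases instead of", exhaustive)
```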
End to End Tests End to End Tests are used to check an application and system from a full user perspective. The end to end business rules will be defined within the requirements and as a general rule allow 30 minutes scripting per rule, which needs to include both positive successful end to end cases and cases where the process will lead to testing error handling. Both sets need to be identified in the count for estimation.
Working with Widgets and GUI Interfaces This part examines estimating test scripting effort for GUI interfaces, where requirements are structured in a User Experience Document.
Estimating scripting effort for a GUI interface As with normal requirements a User Experience Document needs to be reviewed for single logical features identified. Error conditions and legal ranges need to be identified. Business rules need to identify the end to end processes. As a rough estimate of the amount of scripting time: For each widget (GUI interface), the test scripting effort = Number of widget features x 11 x 30 minutes, where 11 represents the standard minimum number of boundary tests required.
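The widget formula above is simple to apply per widget and sum across a screen. A minimal sketch; the function name, the widget names and the feature counts are hypothetical.

```python
def widget_scripting_hours(feature_count, tests_per_feature=11, minutes_per_test=30):
    """Scripting effort = number of widget features x 11 x 30 minutes, in hours."""
    return feature_count * tests_per_feature * minutes_per_test / 60


# Hypothetical screen: feature counts per widget are assumed for illustration.
widgets = {"date picker": 6, "search box": 4}
total_hours = sum(widget_scripting_hours(n) for n in widgets.values())
print(total_hours)
```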
Estimating Manual Test Run Time This part examines estimating test run time for Manual Test Scripts.
Estimating Manual Test Run Time For First Pass To estimate manual test script run time, allow for each run:
- 5 minutes to set up each test script.
- 3 minutes per step in the script (not including the first set up and final end steps).
- 3 minutes for the end step, BUT add time for defect handling. Alternatively, count the last step as 5 minutes.
One can expect that around 10% of scripts will flag a problem and so will need a defect report raised. So allow 15 minutes per defect x 10% of the scripts to be run. Any additional time to set up (or re-set) the test environment will need to be added.
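These first pass figures combine into a single estimate. A minimal sketch using the slide's timings; the function name and the example of 100 scripts with 5 steps each are assumptions, and environment set up time is left out because it is project specific.

```python
def first_pass_run_minutes(script_count, steps_per_script, setup=5, per_step=3,
                           end_step=3, defect_rate=0.10, defect_minutes=15):
    """Minutes to run a set of manual scripts once, including defect reporting."""
    per_script = setup + per_step * steps_per_script + end_step
    defect_handling = script_count * defect_rate * defect_minutes
    return script_count * per_script + defect_handling


# 100 scripts, each with 5 steps between the set up and end steps.
run_minutes = first_pass_run_minutes(100, 5)
print(f"{run_minutes / 60:.1f} hours")
```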
Estimating Manual Regression Runs For each set of scripts run, 10% will need to be run again to verify defect fixes. Repeat runs will be required for regression runs. The approach will be one of:
- Run all scripts every time. Initially one would want to re-run at least 3 times.
- For a non-critical or low risk system: run all scripts once, then on each regression run where new functionality is being added, gradually reduce the module testing as end to end tests and automated tests are added, reducing the manual module tests by 10% for each pass. The choice of reduction is based upon risk and this is covered later.
- For a critical or high risk system: all module tests will need to be run. However these can either be gradually automated as the code stabilises, or all automated from the start. Automating from the start has a very high overhead on test effort, and minor changes in the code can mean a considerable need to re-write tests, depending upon the test framework in place. This needs to be resourced.
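For the low risk option, the shrinking manual pack can be projected pass by pass. A minimal sketch: it assumes the 10% reduction is applied to the scripts remaining after each pass (the slide does not say whether the reduction compounds), and the function name and the 200 script example are illustrative.

```python
def manual_regression_counts(module_scripts, passes, reduction=0.10, rerun_rate=0.10):
    """Return (scripts run, re-runs to verify fixes) for each regression pass."""
    counts = []
    remaining = module_scripts
    for _ in range(passes):
        reruns = round(remaining * rerun_rate)
        counts.append((remaining, reruns))
        # Assumed: each pass drops 10% of the manual scripts still in the pack.
        remaining = round(remaining * (1 - reduction))
    return counts


plan = manual_regression_counts(200, passes=3)
print(plan)
```

The script counts per pass can then be multiplied by the per-script run time to give the regression run effort.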
Estimating GUI Automation Testing This looks at the approach for estimating GUI Automation test effort
GUI Test Automation Estimation of GUI automation effort will depend upon the choice of tool, the presence of an automated test framework and the stability of the code. If aiming to automate then allow for:
- Familiarisation with the tooling.
- Setting up an automated test framework: this could be 2 weeks minimum for a developer.
- Scripting, running and proving the first tests, which will take longer: allow at least one week for the first tests.
If using a tool like Selenium within a framework, then allow for scripting 5 to 10 minutes for low complexity, based upon experience. For highly complex scripts, a single step can take 1 hour to write. Hence an estimation and banding of the Risk and Complexity of the test target needs to be done. Note that if using record and playback, scripting time is the same as a test run plus 15 minutes for administration. NOTE: If code is unstable, then the overhead on managing and updating scripts can be high. It may be decided to target automation at regression end to end scripts for stable code.
High Use of Automation Automation for unit code tests has the advantage of being able to measure simply the code coverage and should be encouraged. Usually automation is used gradually to replace manual functional scripts for code that is stabilised and has low risk of causing the need to re-write automated scripts. IF all functional scripts are to be automated early on, then there can be a high level of maintenance. In many instances, a manual test script that takes an hour to write, may require a day to write and prove an automated version (depending upon tool and framework). A manual test script may only take 5 minutes to change and may even be tolerant of change to code. However an automated script may require completely re-writing. So the maintenance level of scripts needs considerable thought. However there are ways around this.
Automation Ideally you need a low maintenance approach. Where possible, use common scripts in which the data and expected results are pulled from a table. This means that only the data needs manipulating and updating, which in turn can reduce test maintenance effort. Always look for the smart approach to tooling and do not rely upon record and playback, as this can be expensive.
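The table driven approach can be illustrated with one common script that reads its inputs and expected results from a table. This is a sketch only: the order_total function stands in for a hypothetical system under test, and the inline CSV text stands in for a table that would normally live in a file or spreadsheet.

```python
import csv
import io

# Test data and expected results held as a table, separate from the script.
TEST_TABLE = """quantity,unit_price,expected_total
1,10.00,10.00
3,2.50,7.50
0,9.99,0.00
"""


def order_total(quantity, unit_price):
    """Hypothetical function under test."""
    return round(quantity * unit_price, 2)


def run_table(table_text):
    """Run the same check for every row and return (row, passed) pairs."""
    results = []
    for row in csv.DictReader(io.StringIO(table_text)):
        actual = order_total(int(row["quantity"]), float(row["unit_price"]))
        results.append((row, actual == float(row["expected_total"])))
    return results


failures = [row for row, passed in run_table(TEST_TABLE) if not passed]
print(len(failures), "failures")
```

Adding a new test then means adding a row to the table, with no change to the script itself.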
Methods for Managing, Prioritising Regression Testing There are a number of methods for prioritising regression tests to target Risk. This section looks at these.
Managing Regression Pack A regression pack will grow as functionality is added. If manual scripts are being used for the core regression pack, then once the code becomes stable, it will be possible to automate scripts gradually. Set priorities for automation based upon:
- Targeting scripts that are more successful at finding defects.
- Targeting scripts that test critical or more risk related functionality.
When running a manual regression pack within a time limited period, choose the tests as a subset of the full test pack, choosing a customised subset for each run. The choice will be based upon:
- High risk functionality.
- Areas of code that have been changed or have interfaces that are impacted by change.
- Areas of code that have an existing record of being susceptible to defects.
For a final regression run, one will want to run a full set of tests.
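Choosing a time limited subset can be sketched as a sort on the three criteria above followed by filling the available time. The ordering (risk first, then changed code, then defect history), the script records and the 90 minute limit are all assumptions for illustration; a real project would set its own weighting.

```python
def select_regression_subset(scripts, minutes_available):
    """Pick scripts in priority order until the available run time is used up."""
    def priority(script):
        # Assumed ordering: high risk, then changed code, then defect history.
        return (script["high_risk"], script["code_changed"], script["past_defects"])

    chosen, used = [], 0
    for script in sorted(scripts, key=priority, reverse=True):
        if used + script["minutes"] <= minutes_available:
            chosen.append(script["name"])
            used += script["minutes"]
    return chosen


# Hypothetical regression pack.
pack = [
    {"name": "login", "high_risk": True, "code_changed": False, "past_defects": 4, "minutes": 30},
    {"name": "report", "high_risk": False, "code_changed": True, "past_defects": 1, "minutes": 20},
    {"name": "help", "high_risk": False, "code_changed": False, "past_defects": 0, "minutes": 15},
    {"name": "payment", "high_risk": True, "code_changed": True, "past_defects": 2, "minutes": 40},
]
subset = select_regression_subset(pack, minutes_available=90)
print(subset)
```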
End of Part 1 of 2 See slide pack part 2 of 2.