Agile Software Testing in a Large-Scale Project


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Agile Software Testing in a Large-Scale Project

  1. 1. focus software testing Agile Software Testing in a Large-Scale Project David Talby and Arie Keren, Air Force, Israel Defense Forces Orit Hazzan and Yael Dubinsky, Technion–Israel Institute of Technology gile software development in general, and Extreme Programming A Quantitative and (XP)1 in particular, promote radical changes in how software de- qualitative data velopment organizations traditionally work. This is particularly from a large, real- true in software testing. Agile development completely redefines world XP project quality assurance work—from formal roles to day-to-day activities—mak- validates agile ing some traditional QA responsibilities and outputs irrelevant.2–5 Agile de- testing practices velopment also eliminates the backbone of many organizations’ software and offers development cycle and its accompanying jar- cally improving development quality and pro- guidelines gon, including “delivering to QA,” “meeting ductivity in line with Kent Beck’s flat cost-of- in four key the QA criteria for delivery,” and “prioritizing change curve.1 We describe the organization’s bugs.” Agile practices require full integration successful practices and guidelines in four key development areas. of testing and development, which can alter areas: test design and activity execution, work- both organizational structures and high-level ing with professional testers, planning, and de- product-release policies. As a result, fully fect management. This article closes several adopting agile quality practices is slower and gaps in existing publications,1,2,7 most notably more difficult than adopting agile program- by describing how to effectively use profes- mer-oriented practices, such as pair program- sional testers and how to thoroughly accept- ming and test-first programming, which are as- ance-test a system that’s too large and complex similated only within the development team.6 for a single customer to specify. Currently, few real projects that implement full agile quality processes—and thus undergo Project overview these sometimes radical changes—have re- The IAF’s enterprise information system is ported substantial findings. Here, we present critical to daily operations and information se- and analyze new data from a real, large-scale curity. As a result, the target system is highly agile project to develop a business-critical en- complex and must be of the utmost quality. terprise information system for the Israeli Air The IAF considered the project risky both be- Force (IAF). Our results offer new evidence that cause of its size and because it was the first agile testing practices actually work, dramati- project of this scope to be developed in the or- 30 IEEE SOFTWARE Published by the IEEE Computer Society 0740-7459/06/$20.00 © 2006 IEEE
  2. 2. ganization using an XP-based method. Conse- cant customer collaboration and fruitful com- quently, many different stakeholders closely munication among teammates. The customer scrutinized the development process. To manage risks and provide full and timely participated in all planning games at both the release and iteration levels. In addition to re- Few real information about the project’s progress, the viewing the system during formal meetings, projects that team developed a set of metrics,8 consistently the customer regularly viewed in-development implement full collected and analyzed data, and held regular features and discussed them with analysts, de- reflection meetings and formal debriefings. velopers, and testers. Formally, however, the agile quality External consultants helped with and aug- customer belonged to a different organiza- practices have mented these efforts. tional unit and had other duties. reported We derived our data from the study proj- The project also employed XP’s whole-team ect’s first four releases (over eight months). and sit-together practices. From day one, the substantial Our quantitative data is the same that the project team met in the same room and was findings. project team used for ongoing project tracking managed as a single, coherent team. This team and decision making, so it’s of high quality included developers, architects, a tester, and and validity. Our qualitative data includes business analysts. The exception to the whole- recordings of formal reflection meetings and team practice was the customer, who had a debriefings, along with data from question- workstation in the team room but used it only naires, interviews, and an analysis of formal occasionally. These team-centered approaches organizational documents depicting and ex- resulted in two measurable test-related bene- plaining project decisions. To our knowledge, fits. First, the defect-management overhead this is the most comprehensive data set from was lower than in traditional projects, where an actual project using agile quality practices the QA and development teams are separate that has published its results to date. entities. Second, the project produced fewer false defects (“false alarms”) than traditional Conforming to XP projects do. The project team implemented many of the Team members promoted an informative standard XP practices,1,7 which we’ve itali- workplace using story cards on the wall, cized in this and the next section. which describe the details of the different de- XP’s basic practice is short releases. The velopment tasks and their status. They also team delivered working software at each iter- published up-to-date design diagrams and ation, which typically lasted two weeks. At the held a stand-up meeting every morning. Fi- end of each iteration, the team formally pre- nally, the team practiced continuous integra- sented the system to the customer for feed- tion, achieving one to three integration builds back. The team fully tested the system every on an average development day. two weeks. Team members fully specified, coded, and tested each new feature during the Diverging from classic XP iteration. Moreover, developers performed full The project differed from classic XP proj- regression testing at every iteration, fixing all ects in three respects. First, in addition to code known defects. The goal here was to achieve and tests, the team was required to produce several measurable testing benefits: few new detailed specifications as a development- and open defects, a short average time for de- process output. This was required due to or- fect fixes, and short average defect longevity. ganizational policies and obligations, as well The team used the planning game practice as because the project had to produce semifor- to plan project iterations and releases.1 In the mal specifications for a metadata repository9 planning game, the entire team meets and the to enable automatic code generation. So, com- customer describes and prioritizes the stories pleting a feature in this project meant com- that the system’s next release should imple- pleting its specifications, code, and tests. Team ment. The team then breaks the stories down members were thus encouraged to use pair into development tasks and estimates their programming, collective ownership, and cod- cost. Planning’s goal is to keep a sustainable ing standard practices for not only code but pace and define refactoring activities, thus also specifications and tests. achieving simple design. Second, unlike in classic XP, the project’s From the project’s start, there was signifi- acceptance testing (also known as user testing) July/August 2006 IEEE SOFTWARE 31
  3. 3. was far too complex for the customer to fully the required extra training, but the customer specify. According to the project’s leaders, ac- took part only in defining and reviewing some How an ceptance testing would require the equivalent test cases. of two full-time personnel. This finding led the Having everyone test offered several proj- organization organization to use professional testers—a ect advantages. First, it eliminated the team’s promotes common choice in large-scale projects. So, the reliance on the single, assigned project tester, project modified the customer tests practice to who would have otherwise constituted a ma- developer retain its essence while making it practical: Do jor bottleneck. This proved highly important testing is a thorough acceptance testing, but not only by because—for reasons unrelated to the proj- the customer. ect—the tester was replaced twice during the major issue in Third, the project modified test-driven de- first three releases. Second, because developers shifting to agile velopment to focus on the continuous creation were responsible for writing tests for each new development. of automated acceptance tests rather than unit tests. In classic XP, code is developed by first feature, their test-awareness increased and they prevented or quickly caught more edge writing a (failed) unit test, then writing the cases while they worked. code to make it pass, and so on. This project Finally, because people knew that they were required a solution for automating the massive required to test their specifications and code, amount of constantly generated acceptance they designed them for testability.11 That is, tests. The solution was a tool—well integrated where appropriate, developers added special in the project’s development environment— features or design considerations to the code that simulates an end user working on the to make hard-to-test features testable. Such complete multitier system.10 Unlike classic XP, coordination tasks are often difficult and ne- which automates acceptance tests as coded glected when developers and testers work as unit tests, in this critical project, the organiza- separate teams. tion considered only real simulation to be ad- equate acceptance testing. So, while the proj- Product size = test size ect still used unit testing, writing test scenarios Measuring progress is also crucial; the proj- directly into the automated acceptance-testing ect team tackled it by defining the project’s tool was preferred whenever possible. In this product size metrics as the number of regres- respect, test-first development meant planning sion test steps run at each iteration.8 The test scenarios before coding and never consid- “running and tested features” metric12 in- ering a feature “complete” before all its tests spired this approach. The project’s leaders were written and passed. chose this approach for two reasons. First, be- Each of the three exceptions to standard cause test size is more highly correlated with XP procedures is a typical issue for large-scale complexity than are lines of code or specifica- projects. Nonetheless, such differences must tions, it’s a better approximation of product be kept in mind when generalizing our agile size. Second, such a metric sends a strong mes- development results. sage to the team: only features that have full regression testing at each iteration are counted Test design and execution as delivered product size. Making sure that testing is everyone’s on- Reaching and increasing the product size going business is a big challenge. This project metric is a nontrivial team goal; in this case, tackled it by making testing part of each team the team performed it successfully in 12 out of member’s work, and a key measure of both the 13 iterations we studied. Figure 1 plots the team and personal productivity. product size metric over this period of time. Everyone tests Untested work = no work In a traditional project, “everyone is re- Another related factor is how the organiza- sponsible for quality,”2 but in agile projects, tion promotes developer testing. This is a ma- everyone actually writes tests. This includes all jor issue in shifting to agile development, par- developers, business analysts, and even the ticularly in organizations where testing is customer. For the IAF, this was a radical shift poorly perceived. The key to success is similar from existing practices. The organization im- to that with product size metrics. In this case, plemented the practice after developers received untested work equals no work—in all chan- 32 IEEE SOFTWARE w w w . c o m p u t e r. o r g / s o f t w a r e
  4. 4. nels and under all conditions. The case is sim- ilar for task completion. At each iteration 5,000 summary, the team reviews the iteration’s 4,500 tasks to investigate incompletion or time esti- Number of steps in test scenarios 4,000 mation errors. The team considers tasks lack- Entire team ing full written tests as incomplete—that is, 3,500 Professional testers in the team failures—even if they already “work” and 3,000 were presented to the customer. 2,500 In response to a written questionnaire ad- ministered after two releases, over 90 percent 2,000 of team members ranked acceptance and sys- 1,500 tem tests as “highly important”; the remaining 1,000 10 percent ranked them as “important.” Fur- thermore, 33 percent were either “interested” 500 or “very interested” in taking a leading role in 0 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4 3.1 3.2 3.3 3.4 4.1 acceptance testing, and almost 60 percent Iteration were “interested” in leading system testing. Working with professional testers Figure 1. Product size In agile projects, everyone on the team fully absence affected development speed but didn’t per iteration. The team tests their own work. So, professional testers compromise full, continuous testing. ran more passing test add value not through more testing, but by scenarios—reflecting a writing some of the developers’ tests, thus Encourage interaction over isolation bigger product—on all freeing them to code more new features. Pro- Initially, the project tester didn’t work with iterations except one. fessional testers can also create better test sce- other team developers but instead wrote Note that all tests must narios, although this ability varies greatly cross-feature tests and automated the cus- be rerun on each among individuals. Using professional testers tomer-defined use cases. The testing person- iteration and are not nonetheless raises two key challenges for or- nel manager—who wasn’t an XP team mem- “accumulated.” ganizations adopting agile development: test- ber—required this separation for two ing bottlenecks and coordinating the testing reasons. work among testers and programmers. First, common wisdom states that testers must be independent of programmers to pre- Easing bottlenecks: Code less, test more vent implementation details from affecting Dealing with testing bottlenecks is rela- their test designs. This contamination argu- tively straightforward: developers simply code ment maintains that testers can’t write ade- less and test more. Figure 1 illustrates this, quate tests for a feature if a programmer has comparing the product-size metric points at- already explained the testing requirements. tained by tests that the entire team ran to the Second, one of testing’s aims is to verify that points attained by the team’s testers alone. The detailed specifications are up-to-date. So, graph lines show the various phases the proj- testers traditionally write tests according to ect went through. At first, only the tester was the specifications, not to what someone has running tests. This became impractical within “unofficially” told them. An unspecified fea- a few weeks as the software grew in size; at ture is viewed as a kind of bug. that point, most tests were handed over to These two claims form many QA profes- other developers. The only iteration in which sionals’ core objection to agile testing.13 The the total size decreased was when the tester claims target basic XP principles—whole-team was unexpectedly removed from the team and developer testing, in particular—and in- (owing to an emergency in another project). sinuate that they’re unsuitable for producing However, the team recovered and took over quality software. The counterargument isn’t testing the following week. Later, for the third that these claims are false but simply that the release (3.3 in figure 1), a new tester joined the alternative is better overall. Published reports team and tester-generated points increased on professional testers’ experience in agile again. In this process, the team proved capable teams support this.4,14,15 They also emphasize of coping with any tester availability—tester that testers must change their mind-set if July/August 2006 IEEE SOFTWARE 33
  5. 5. 50 60 45 50 40 35 40 Net hours Net hours 30 25 30 20 20 15 10 10 5 0 0 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4 3.1 3.2 3.3 3.4 4.1 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4 3.1 3.2 3.3 3.4 4.1 (a) Iteration (b) Iteration Figure 2. Testing and defect repair. they’re to successfully integrate into an agile ing time are usually equal: if feature A is esti- The study project team; our findings support this as well. mated to require five hours of specifications considered both In our study, testers working independently and 10 hours of coding, for example, it would regression testing found many minor mismatches between writ- typically be allocated 10 hours for writing and and defect fixing as ten specifications and the system, and few ac- running tests. This estimation can change on a global overhead. tual bugs. The mismatches were tracked and per-task basis, such as when a team knows in (a) Net hours resolved as bugs; they were usually trivial, tak- advance that a given feature is more difficult to devoted to running ing only minutes to fix. However, because de- test than others or requires refactoring of exist- and maintaining velopers tested each task as they wrote it, few ing tests. As a rule, features are time-estimated regression tests. actual system bugs remained at this stage. So, to completion. Feature A, for example, would (b) Net hours working with testers in this way was highly in- be planned as a single, 25-hour task. This again devoted to fixing efficient: in one extreme case, the tester found stresses that no task is considered complete un- defects. only two actual bugs after three weeks of test- til its tests are written and running. ing a complex feature. These results reassured the project’s man- Regression testing: Divide and conquer agement that developers could effectively test In our study project, regression testing was the software themselves. In turn, the organiza- considered as global (rather than personal) tion decided to integrate the tester into the overhead and allocated as a global time period. project team. The tester began to work on This works because regression tests are usually tasks with pairs of developers, writing tests in run on each iteration’s last day, with the testing parallel with coding after they had all in- work divided among team members. Figure 2a spected the feature’s specifications and gener- plots the net number of hours for running and ated a set of test cases accordingly. Although maintaining regression tests in each iteration. we haven’t yet collected data on the method’s As the figure shows, during the first two itera- effectiveness, it does seem clear that, in this tions, the variance between iterations was small agile project, continuous interaction is the enough to make this global approach the cor- more productive choice. rect choice. The net time required of the entire team at each iteration was about half a day; this Activity planning changed over time, owing mainly to substantial Planning for quality activities involves growth in team size in the third release. time-allocation challenges in feature testing, regression testing, and repairing defects. Allocate bug-fix time globally The project also allocated a global time Integrate feature testing and coding pool for fixing defects; experience and avail- In agile development, feature testing is al- able data thus far indicates that this was a ways an integral part of each feature’s devel- good choice for two reasons. First, estimating opment and time estimation. Testing and cod- the time to fix a defect is extremely difficult 34 IEEE SOFTWARE w w w . c o m p u t e r. o r g / s o f t w a r e
  6. 6. because defect complexity is unpredictable. At the same time, as figure 2b shows, the law of 2.00 averages applies here, indicating that we can 1.80 reasonably predict the time required for fixing 1.60 all defects in a forthcoming iteration. We can also adjust this much-more-stable estimate at 1.40 each iteration’s planning game according to 1.20 Hours the number of open defects at the time (see, 1.00 for example, iterations 2.1 and 2.4). 0.80 Second, planning defect resolution as an in- dividual task results in high overestimates. In 0.60 the single iteration that allocated time on a 0.40 per-defect basis, each defect was estimated to 0.20 require at least four hours, with the highest single defect estimate being nine hours. Such 0.00 1.2 1.3 1.4 2.1 2.2 2.3 2.4 3.1 3.2 3.3 3.4 4.1 overestimation occurs because a single, unex- Iteration pectedly complex defect might well require that much time. But, as the data shows, it’s highly improbable that many unrelated defects Figure 3. Average net will simultaneously require that much time. assign defect fixes to each other. XP’s whole- time to fix a defect per team and sit-together practices require that iteration. Defect management everyone, including the tester, sit in the same Defect management involves two major room. The informative-workplace practice re- challenges: managing workflow and selecting quires that everyone participate in the planning and scheduling the defects to fix. game and daily stand-up meetings, so people know each other and everyone’s respective Use a team-centered knowledge areas. Given this, team members defect-management approach can usually identify the right person to fix a Defect management workflow is much sim- given bug. This doesn’t occur when the testing pler in agile projects than in traditional ones in team is separate from the development team. three ways: The same practices also reduce false defect reports. Such reports either duplicate existing ■ Anyone can open a defect (once its devel- defect reports or are created owing to devel- opers declare that the defective feature is oper misunderstanding. During our study pe- “done,” it’s considered a defect). riod, only 29 false defects were recorded, con- ■ Anyone can close a defect, after fixing it stituting 7.5 percent of all defects. Sixteen of and running the related tests. these false defects were recorded in a single it- ■ Anyone who finds a defect also assigns it eration (2.3), in which the tester worked sepa- to someone to be fixed. rately from the team. These numbers are low; during the same periods, our two reference Team members routinely view their own de- projects at the IAF reported four to eight times fects list and fix the defects. If they’re assigned more false defects. a defect by mistake—for example, a defect re- lated to a subject they know nothing about— Fix defects as soon as possible they can reassign it. This eliminates a sizable Letting team members assign bug-fixing re- overhead from team leaders,4 who tradition- sponsibilities requires that the average bug-fix ally are responsible for this task. For example, time is short—in this study, it was slightly over the team leaders on two comparable tradi- an hour (see figure 3). This speed is crucial to tional projects at the IAF estimated that they ensure that precise load balancing of bugs devote one to three hours per week to defect among team members is unimportant. When a management, compared to zero for the agile bug could require a day or more to fix, team project’s leaders. leaders must assign such work to prevent bot- Team-centered, XP-based development is tlenecks and ensure that deadlines are met. In key to enabling project members to successfully this case, the average bug-fix time remains July/August 2006 IEEE SOFTWARE 35
  7. 7. person to fix a defect wasn’t always immedi- 1.6 ately available. In addition, teams sometimes find defects during regression testing, at the 1.4 end of an iteration, that they can’t fix in time. 1.2 Given this, our project’s team devoted the be- ginning of each iteration to fixing the previous 1.0 iteration’s open defects. Weeks 0.8 Figure 4 shows a defect’s average longevity over time. During the first four iterations, the 0.6 team fixed all defects within a week (usually at the start of the next iteration after their dis- 0.4 covery). The team relaxed the pace slightly in 0.2 the next two releases. However, the average remained low and relatively constant relative 0 to traditional projects (in the organization and 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4 3.1 3.2 3.3 3.4 4.1 Iteration elsewhere). Moreover, these averages relate to all defects, not only critical or urgent ones. Fixing all defects as soon as possible has Figure 4. A defect’s several advantages. First, as we discussed ear- average longevity per relatively constant, even as the application lier, defects require far less time to fix with this iteration. The average grows significantly in size and complexity. This approach. Second, working on a clean and is for all defects, agrees with the flat cost-of-change curve,1 a highly stable code base makes new develop- regardless of severity. central benefit of agile development. ment faster. Third, besides avoiding the over- The average time is based on developer re- head of prioritizing and planning defect fixes, ports of the total net time they spent working this rule avoids unpleasant customer negotia- on defects (developers complete daily reports tions over which defects to solve. Such negoti- on time spent on each task). The team ation is often forced upon both sides when a achieved this very low average—an order of project’s delivery date approaches and—because magnitude or more lower than most projects, traditional development projects test late in including those in the same organization—for the process—it becomes clear that not all de- two reasons. First, there’s little overhead to re- fects will be fixed before the deadline. In the produce defects because testers and program- project studied here, the team fixed all defects mers work on the same baseline. Second, team routinely and, given its perceptions about members fix defects as soon as they’re found, product quality, planned no “stabilization pe- when the relevant code is still fresh in the de- riod” toward the official delivery date. velopers’ mind. Another significant difference from com- In traditional projects, a feature-complete mon practice that often surprises traditional version is sent to the QA group, which sends practitioners is that agile projects have no con- back a large stream of defects of varying im- cept of defect severity. This relieves developers portance. Because the effort to repair defects of both the pressure to stabilize the system just always greatly exceeds the available resources, before the delivery date and the need to coerce a lengthy process of prioritizing defects begins. the customer into dictating which defects to The team fixes the most urgent defects first, fix (and which to not fix). So, the only consid- during time slots preallocated for this activity eration regarding a given defect is whether to during project planning. fix it. If the answer is no, the defect isn’t Agile projects replace this process with a opened. If the answer is yes, it’s fixed when it’s simple rule: fix every defect as soon as you dis- cheapest to do so—right away. cover it. This is required because the team re- leases software to its customers at every itera- O tion, and releases shouldn’t, in principle, ur study’s data and analysis are useful contain known bugs. In our case study, the both to practitioners wishing to estab- rule was that defects should be fixed as soon lish new agile teams and to researchers as possible—this does not necessarily mean im- looking for high-quality, real-world data. In mediately, however, simply because the right addition to illustrating applicable tactics, our 36 IEEE SOFTWARE w w w . c o m p u t e r. o r g / s o f t w a r e
  8. 8. data shows that agile testing works. The proj- ect team cut by an order of magnitude the time About the Authors required to fix defects, defect longevity, and de- David Talby is a software team leader in the Israeli Air Force’s MAMDAS operational soft- fect-management overhead. Even on such a ware development unit, and a doctoral student in computer science at Hebrew University, large-scale project, the team achieved full re- Jerusalem. His research interests are in parallel computer scheduling and workload modeling, and software process improvement initiatives focusing on agile methods. He received his MSc gression testing at each iteration and developer in computer science and MBA in business administration from Hebrew University. Contact him testing. It also resolved all defects over a sig- at the School of Computer Science and Eng., Hebrew University, Givat Ram, Jerusalem 91904, nificant time period that included both person- Israel; nel changes and team growth. Implementing the presented process re- Orit Hazzan is an associate professor in the Department of Education in Technology and quired a new mind-set at personal and organi- Science at Technion–Israel Institute of Technology. Her research interests are in human aspects zational levels, and it remains a challenge to of software engineering, particularly relating to agile software development and XP develop- expand its adoption to other IAF projects. ment environments. She is coauthor (with Jim Tomayko) of Human Aspects of Software Engi- neering (Charles River Media, 2004). She received her PhD in mathematics education from the One key motivator for such expansion, in the Technion–Israel Institute of Technology. She’s a member of the ACM and the ACM Special Inter- IAF and industry-wide, is the continued publi- est Group on Computer Science Education. Contact her at the Dept. of Education in Technology cation of quality data on large-scale agile proj- and Science, Technion–Israel Inst. of Technology, Haifa 32000, Israel; ects. We strongly encourage and call for fur- ther investigation on these issues, and, in particular, on long-term agile projects, proj- Yael Dubinsky is a visiting member of the human-computer interaction research group ects spanning multiple coordinating teams, at the Department of Computer and Systems Science at La Sapienza, Rome, and an adjunct lec- turer in the Computer Science Department at the Technion–Israel Institute of Technology. Her and projects that are concurrently developing research examines the implementation of agile software development methods in software and maintaining production systems. teamwork, focusing on software process measurement and product quality. She received her PhD in science and technology education from the Technion–Israel Institute of Technology. Con- tact her at 7 Pinchas Rozen St., Haifa 34784, Israel; References 1. K. Beck and C. Andres, Extreme Programming Ex- Arie Keren leads the Magnet division’s systems-engineering group in Israel Aircraft Indus- plained, 2nd ed., Addison-Wesley, 2005. tries, and previously led the MAMDAS software development unit’s C4I engineering group in the 2. L. Crispin and T. House, Testing Extreme Programming, Israeli Air Force. His research interests are in systems analysis, architecture, and development Addison-Wesley, 2002. processes. He received his PhD in computer science from Hebrew University, Jerusalem. Contact 3. C. Kaner, J. Bach, and B. Pettichord, Lessons Learned him at 43 Lilach St., Modiin-Reut 71908, Israel; in Software Testing: A Context-Driven Approach, John Wiley & Sons, 2002. 4. R. Leavitt, “Let’s End the Defect Report-Fix-Check- Rework-Cycle,” 2005, presentation, 2005; www.rallydev. com/kportal.jsp?doc=wp6. 5. B. Pettichord, “Agile Testing Challenges,” Proc. Pacific Northwest Software Quality Conf., 2004; 14. L. Crispin, “Extreme Rules of the Road: How an XP ~wazmo/papers/agile_testing_challenges.pdf. Tester Can Steer the Project toward Success,” 6. V. Ramachandran and A. Shukla, “Circle of Life, Spiral STQE/Better Software Magazine, July/Aug. 2001, pp. of Death: Are XP Teams Following the Essential Prac- 24–29; tices?” Proc. XP/Agile Universe, LNCS 2418, Springer, 15. B. Pettichord, “Don’t Become the Quality Police,” 2002, pp. 166–173., 1 July 2002, www.stickyminds. 7. K. Beck, Test Driven Development: By Example, Addi- com/sitewide.asp?Function=edetail&ObjectType= son-Wesley, 2002. ART&ObjectId=3543. 8. Y. Dubinsky et al., “Agile Metrics at the Israeli Air Force,” Proc. Agile 2005 Conf., IEEE Press, 2005, pp. 12–19. 9. D. Talby et al., “The Design and Implementation of a Metadata Repository,” Proc. Int’l Council on Systems Eng. Israeli Chapter Conf., Incose-IL, 2002. 10. D. Talby et al., “A Process-Complete Automatic Accep- tance Testing Framework,” Proc. Int’l Conf. Software— Science, Technology and Eng. (SwSTE 05), IEEE CS Press, 2005, pp. 129–138. 11. B. Pettichord, “Design for Testability,” Proc. Pacific Northwest Software Quality Conf., 2002; ~wazmo/papers/design_for_testability_PNSQC.pdf. 12. R. Jeffries, “A Metric Leading to Agility,” XP Maga- zine, 14 June 2004; jatRtsMetric.htm. 13. M. Stephens and D. Rosenberg, Extreme Programming Refactored: The Case against XP, Apress Publishers, For more information on this or any other computing topic, please visit our 2003. Digital Library at July/August 2006 IEEE SOFTWARE 37