Lessons Learned in Software Development
QA Infrastructure – Maintaining Robustness in Commercial Software
Marcus Lagergren
Consulting Member of Technical Staff
Oracle Corporation
About the Speaker
Marcus Lagergren holds a master’s degree from KTH, with a major in Theoretical Computer Science. Marcus was one of the founders of Appeal Virtual Machines, which was acquired by BEA in 2002; BEA was in turn acquired by Oracle in 2008. Marcus has worked on almost all aspects of the JRockit Virtual Machine and is now working with virtualization technology. Marcus likes power tools and scuba diving.
Agenda
Robustness in commercial apps with tight release schedules.
Utopian vision: perpetually stable bits, so that we can spin off a release at any time.
- Build systems
- Source control and development
- Tests: functionality, performance, regression testing
- Result databases
- Automatic testing
- Complex and not-so-standard testing
- Development aspects
Why listen to me?
We’ve spent the last 10 years developing a JVM and the last 3 years developing a guest operating system for two commercial hypervisors. Hundreds of thousands of man-hours have been spent on robustness alone, and harder-to-debug software hardly exists. We’ve had to invent things from day one. OK, from day 365 or so, once we had learned some lessons. We’ve made many mistakes along the way. No one gets to Utopia, but at least we have a reasonably good idea of which direction to go in.
QA Infrastructure
QA infrastructure is harder to build, and probably even more important, than development infrastructure. The most valuable lesson we have learned is that it must be developed in parallel with the application and that significant effort must be spent on it. It is at least as important as the application itself. Sometimes the boundaries between the app and the test infrastructure aren’t even clear.
QA Infrastructure
QA and Dev, if they are separate departments or roles, should always work together, preferably as physically close to each other as possible. They should be able to fill in for each other and be encouraged to do each other’s work. It is very dangerous to have a separate QA department on another floor. It is very dangerous for QA to just do black-box testing without understanding what’s in the box. QA staff should be treated like any other developer.
Build System
The build system, test system and source control are parts of the same distributed system. Mobility: build anything anywhere, locally or globally (distributed), a “distributed cross compiler”. The build system should be self-contained and part of source control: do a sync on a fresh laptop and you have all the details. We chose to put binaries there as well, to produce deterministic bits and provide self-sufficiency. Not always a good idea, but mostly a good idea.
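A cheap way to keep the “deterministic bits” promise honest is to build the same revision twice, for instance on two different machines, and compare the outputs file by file. The following Python sketch assumes each build writes into its own directory tree; `build_manifest` and `diff_builds` are hypothetical names, not part of any existing tool.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                manifest[rel] = hashlib.sha256(f.read()).hexdigest()
    return manifest


def diff_builds(root_a, root_b):
    """Return the sorted paths that differ (or exist in only one tree) between two builds."""
    a, b = build_manifest(root_a), build_manifest(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list means the two builds are bit-identical; anything else points at the artifact that picked up a timestamp, an absolute path or some other environment-dependent input.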
Source Control and Development
We need good support for distributed development. Source control should be able to handle directories as separate source control entities. Gatekeepers guard the main branches; development is distributed and team-based. Source control, builds and development should only require vi and a command prompt, with more complex environments layered on top for ease of use. This also makes it easier to extend with different UIs.
Test System
Also under source control. A distributed system: this is very important. Virtualize if possible and maximize resource usage. Both local and remote test runs should be possible. Submit jobs: “crunch through these tests”, “check in if it passes the tests”. Test machines: performance test machines (dedicated) and functionality test machines (not necessarily dedicated). Any machine can volunteer CPU cycles for functional testing.
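The “submit jobs” model can be sketched with a worker pool standing in for the machines that volunteer CPU cycles. This is a minimal, single-process Python sketch under that assumption; `crunch` and `check_in_if_passes` are hypothetical, and a real system would dispatch jobs to remote machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def crunch(tests, workers=4):
    """Run every test callable on a pool of workers and collect name -> passed.

    The thread pool stands in for volunteer machines (an assumption of this sketch).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in tests.items()}
        results = {}
        for name, fut in futures.items():
            try:
                results[name] = bool(fut.result())
            except Exception:
                # A crashing test counts as a failure, not as a harness error.
                results[name] = False
        return results


def check_in_if_passes(change, tests, commit):
    """Gatekeeper: call commit(change) only if every test passes; return failing test names."""
    results = crunch(tests)
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        commit(change)
    return failed
```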
Building Blocks – Tests
Many tests for an app, especially regression tests, needn’t be more than a main class with a return value. Keep it simple! “I spent a few hours distilling this huge program down to a reproducer for BUG123456.” Claim: if it’s simple enough to write and submit a test, more than 50% of the bugs can get regression tests as part of the original bug fix. I will address the other 50% later.
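A harness for such tests only needs a process boundary and an exit code. The convention assumed below is that a reproducer (for example a Java main class that calls `System.exit` with a nonzero status on failure) exits with 0 when the bug is fixed; `run_reproducer` is a hypothetical helper.

```python
import subprocess


def run_reproducer(cmd, timeout_s=60.0):
    """Run one regression test as a separate process.

    Assumed convention: exit code 0 means pass; any other exit code,
    a crash, or a hang past the timeout means fail.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return False
    return proc.returncode == 0
```

A call would look like `run_reproducer(["java", "-cp", "tests", "BUG123456"])`, where the class path and class name are placeholders. Treating a hang as a failure matters, since many VM bugs show up as deadlocks rather than crashes.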
Building Blocks – Tests
Easy-to-write tests make it possible for the test suite to grow naturally. If 10 minutes of spare time can lead to a new test being written, checked in and enabled as part of the global test suite, you have succeeded. Encourage developers to check in unit tests for new functionality together with the functionality itself. This needs infrastructure support in the app. You might want to enforce this strictly, but that might hinder development too.
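One way to make “written, checked in and enabled” a single step is to discover tests by naming convention, so that adding a file to the test tree is all it takes. The file-name pattern and directory layout below are assumptions for illustration, and `discover_tests` is a hypothetical helper.

```python
import os
import re

# Hypothetical naming convention: any file in the test tree whose name
# matches this pattern is picked up automatically, e.g. "BUG123456.java".
TEST_PATTERN = re.compile(r"^(BUG\d+|Test\w+)\.java$")


def discover_tests(root):
    """Return the sorted '/'-separated relative paths of all test files under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if TEST_PATTERN.match(name):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```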
Building Blocks – Result Database
Store results in a cheap database with a sensible layout somewhere. Any freeware is fine: just get it up and running. It should be easy to maintain and back up. Query it from local machines about historical test results: “When exactly did this performance regression appear?” “List all benchmark scores on this machine for this benchmark since January 1.” “Has this functional test failed before? What were the bug fixes?”
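SQLite is one freeware option that needs no server and ships with Python’s standard library. The schema and queries below are a hypothetical minimal layout that answers two of the questions above; the table and column names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    run_date TEXT NOT NULL,     -- ISO date, e.g. '2009-01-15'
    build    INTEGER NOT NULL,
    machine  TEXT NOT NULL,
    test     TEXT NOT NULL,
    passed   INTEGER NOT NULL,  -- 1 = pass, 0 = fail
    score    REAL               -- benchmark score, NULL for functional tests
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, run_date, build, machine, test, passed, score=None):
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
        (run_date, build, machine, test, int(passed), score),
    )


def scores_since(conn, machine, test, since):
    """All benchmark scores on this machine for this benchmark since a given ISO date."""
    return conn.execute(
        "SELECT run_date, build, score FROM results "
        "WHERE machine = ? AND test = ? AND run_date >= ? AND score IS NOT NULL "
        "ORDER BY run_date, build",
        (machine, test, since),
    ).fetchall()


def failure_history(conn, test):
    """Builds in which this test has failed before (empty list if never)."""
    rows = conn.execute(
        "SELECT DISTINCT build FROM results WHERE test = ? AND passed = 0 ORDER BY build",
        (test,),
    ).fetchall()
    return [build for (build,) in rows]
```

Linking failures to the bug fixes that followed would need one more table mapping builds to bug IDs; it is left out here.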
Building Blocks – Tests
Use “terror harnesses” that attack the cross sections between modules, for example AllocAndRun, RedefineClasses and ExceptionInClinit.
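AllocAndRun, RedefineClasses and ExceptionInClinit are JRockit’s own harnesses and are not reproduced here. The general shape of the idea, randomly interleaving operations owned by different modules and checking a cross-module invariant after every step, can be sketched as follows; `terror_harness` and its parameters are hypothetical.

```python
import random


def terror_harness(operations, invariant, steps=10_000, seed=0):
    """Randomly interleave operations from different modules and check an
    invariant after every step.

    operations: zero-argument callables, each exercising one module
    (in a VM: allocate, redefine a class, throw from a class initializer, ...).
    Returns None on success, or (step, operation name) of the first violation.
    """
    rng = random.Random(seed)  # fixed seed: a failing interleaving can be replayed
    for step in range(steps):
        op = rng.choice(operations)
        op()
        if not invariant():
            return step, op.__name__
    return None
```

Seeding a private `random.Random` means a violation found overnight can be replayed exactly in the morning, as long as the operations themselves are deterministic.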
Building Blocks – Performance
Anything can affect performance. EVERYTHING affects performance. We need automatic regression warnings: anyone who submits a performance regression will get an e-mail from the test system. Continuously make it easy to add more benchmarks. Automation: deviations, baselines, invariants.
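A baseline plus a deviation threshold is one simple way to automate the warning. This sketch assumes higher scores are better and that the baseline is a list of recent scores for the same benchmark on the same machine; the function name and the defaults (`k=3.0`, `min_rel=0.01`) are illustrative choices, not tuned values.

```python
import statistics


def check_regression(baseline, new_score, k=3.0, min_rel=0.01):
    """Flag new_score if it is below the baseline mean by more than k sample
    standard deviations AND by more than min_rel relative (so that very
    stable benchmarks do not fire on noise). Higher scores are assumed better.
    Returns a warning string, or None.
    """
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline) if len(baseline) > 1 else 0.0
    threshold = min(mean - k * stdev, mean * (1.0 - min_rel))
    if new_score < threshold:
        drop = 100.0 * (mean - new_score) / mean
        return f"regression: {new_score:.1f} vs baseline {mean:.1f} ({drop:.1f}% worse)"
    return None
```

The returned string is what would go into the e-mail to the submitter; sending it is left out.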
Testing – The Need for Continuous Automatic Testing
We need continuous automatic testing. An example from real life: JRockit on Solaris has been made available off and on over the years, and bit rot sets in immediately when a platform is removed from automated testing. The release version may break the debug version and vice versa. The Linux version may break the Windows version and vice versa. Use extremely strict and picky compiler flags.
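Keeping every combination in automated testing is easier if the matrix is generated rather than maintained by hand. The platform names and flag set below are assumptions; the flags are GCC/Clang spellings (MSVC’s rough equivalent is `/W4 /WX`).

```python
from itertools import product

# Hypothetical platform and flavor names; substitute your own.
PLATFORMS = ["linux-x86_64", "windows-x86_64", "solaris-sparcv9"]
FLAVORS = ["release", "debug"]
# GCC/Clang warning flags; -Werror turns every warning into a build failure.
STRICT_FLAGS = ["-Wall", "-Wextra", "-Werror"]


def nightly_matrix(platforms=PLATFORMS, flavors=FLAVORS):
    """Every platform x flavor combination is built and tested every night.

    Silently dropping a row from this matrix is how bit rot starts.
    """
    return [
        {"platform": p, "flavor": f, "cflags": list(STRICT_FLAGS)}
        for p, f in product(platforms, flavors)
    ]
```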
Testing – So What About the Other 50%?
Simple Java programs with main functions may not be enough for all the bugs. How do we test for a specific optimization bug in the code generator? How do we test for a strange boundary case that crashes the GC after two weeks in production? Key observation: we need to export state.
Testing – So What About the Other 50%?
Examples: Create a very special heap with a few objects in nasty places. Load it and trigger a garbage collection. Save it and compare it to a reference. Serialize an IR from just before an offending optimization. Load it and trigger the optimization. Save the resulting IR and compare it to a reference. The comparison would be more of an “equals” than a “memcmp”. We need a level of modularization that’s good enough for this. The collection of tests should grow naturally, but the VM design should also allow the ways of testing the VM to grow naturally.
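The IR variant of this idea can be sketched with a toy IR serialized as JSON. The instruction format and the constant-folding pass are hypothetical stand-ins for a real compiler’s IR and optimization; the point is the shape of the test: load exported state, run one pass, compare structurally with a reference.

```python
import json


def constant_fold(ir):
    """Toy optimization pass: replace 'add' of two known constants with a 'const'.

    Assumes SSA-style IR, i.e. every destination is assigned exactly once.
    """
    consts = {}
    out = []
    for ins in ir:
        if ins["op"] == "const":
            consts[ins["dst"]] = ins["value"]
            out.append(ins)
        elif ins["op"] == "add" and ins["a"] in consts and ins["b"] in consts:
            value = consts[ins["a"]] + consts[ins["b"]]
            consts[ins["dst"]] = value
            out.append({"op": "const", "dst": ins["dst"], "value": value})
        else:
            out.append(ins)
    return out


def run_ir_test(serialized_input, serialized_reference):
    """Load exported IR, run the pass, and compare structurally ("equals", not "memcmp")."""
    result = constant_fold(json.loads(serialized_input))
    return result == json.loads(serialized_reference)
```

A reference file that differs from the output byte for byte (key order, whitespace) still passes, because the comparison is on parsed structures rather than raw bytes.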
Testing – So What About the Other 50%?
But of course it’s not as simple as that. What about multithreaded apps? Race conditions? Plenty of threads operate on the same memory, e.g. in a multithreaded GC. How can we make test cases? Synchronization points. Randomized input, randomized sleeps. Try to cover the malicious side effects of parallelism. Things like the RaceTrack algorithm can find some (not all) races in static code, but the world is dynamic. Testing needs to be, too.
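Randomized sleeps at a race-sensitive point are easy to demonstrate on a shared counter. In this Python sketch the read-modify-write is deliberately split by a random sleep; the class and function names are hypothetical, and only the sleep pattern is reproducible from the seed, not the thread schedule.

```python
import random
import threading
import time


class SharedCounter:
    def __init__(self, use_lock):
        self.value = 0
        self._lock = threading.Lock() if use_lock else None

    def increment(self, rng):
        if self._lock is not None:
            with self._lock:
                self._read_modify_write(rng)
        else:
            self._read_modify_write(rng)

    def _read_modify_write(self, rng):
        v = self.value
        # Randomized sleep between read and write widens the race window.
        time.sleep(rng.random() * 0.0002)
        self.value = v + 1


def stress(counter, threads=4, iterations=50, seed=0):
    """Hammer the counter from several threads; return (expected, actual)."""
    def worker(i):
        rng = random.Random(seed + i)  # per-thread RNG: reproducible sleep pattern
        for _ in range(iterations):
            counter.increment(rng)

    ts = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return threads * iterations, counter.value
```

With `use_lock=False` the actual count will usually come out short of the expected one; with the lock it must match exactly.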
Testing – So What About the Other 50%?
Disclaimer: sometimes we just need to crunch a lot of code for a long, long time. Nothing else suffices to reproduce a problem, or the framework that would make it possible doesn’t exist. So make sure the distributed system burns those free CPU cycles, and make the dumps full and comprehensible. Don’t lose them, dammit! No wiping them after 24 hours; disk is cheap. “Phone home” is surprisingly effective if you have enough beta testers.
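“Don’t lose them” can be enforced mechanically by archiving every dump under its content hash, with its metadata alongside. The layout below (a flat directory of `<sha256>.dump` and `<sha256>.json` files) and the function `archive_dump` are assumptions for illustration.

```python
import hashlib
import json
import os
import shutil
import time


def archive_dump(dump_path, archive_dir, metadata):
    """Copy a crash dump into a content-addressed archive; nothing is ever deleted.

    The dump is stored under its SHA-256 digest, so the same dump submitted
    twice is stored once; the JSON sidecar holds the latest submission's
    metadata. Returns the digest.
    """
    h = hashlib.sha256()
    with open(dump_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    digest = h.hexdigest()
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, digest + ".dump")
    if not os.path.exists(target):
        shutil.copyfile(dump_path, target)
    sidecar = dict(metadata, original_name=os.path.basename(dump_path), archived_at=time.time())
    with open(os.path.join(archive_dir, digest + ".json"), "w") as f:
        json.dump(sidecar, f, indent=2)
    return digest
```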
Testing – Retrofitting a Framework
You will probably have to do this, since people don’t understand the importance of fundamental QA from day one. The situation: we need the QA infrastructure but don’t have it, and our app has come a long way. Learn from history. For example, go over 500 Bug Parade entries for HotSpot. How many can be tested by small deterministic reproducers? What about the rest? Brainstorm what functionality the VM would need if we had to write a simple reproducer for each problem.
Development – The Platform Matrix
Try to keep the amount of common code as large as possible. It is always a choice between platform-specific features and test matrix growth. Initially, our performance-critical code was native. As our JIT got better, we wrote more and more of it in Java. Native is much worse. “Premethods”: augmented Java with intrinsics, “pd_addr” and preprocessed Java files.
Development – The Platform Matrix
Other seemingly platform-dependent things can be made platform independent. Example: native stubs. The bulk of the work is parameter marshalling, and the register allocator can do that already. Beware of “false abstraction”: that extra parameter that is NULL on all platforms except IA64. Implementation language: debugging is an issue. Powerful C/C++ debuggers exist; meta-debugging is usually harder.
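The idea of keeping the algorithm common and pushing platform differences into data can be illustrated with argument placement. The register lists are the integer argument registers of the System V AMD64 and Microsoft x64 calling conventions; everything else (floating-point arguments, Windows’ shadow space, stack alignment) is deliberately left out, and this is not meant to reflect JRockit’s actual stub generator.

```python
# Integer/pointer argument registers for two real x86-64 calling conventions.
# Simplified on purpose: floating-point args, Windows' 32-byte shadow space
# and stack alignment are ignored.
CONVENTIONS = {
    "sysv-x86_64": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
    "win-x64": ["rcx", "rdx", "r8", "r9"],
}


def arg_locations(n_args, convention):
    """Shared marshalling logic; only the per-platform table differs.

    Returns a register name, or 'stack[i]' for the i-th stack-passed argument.
    """
    regs = CONVENTIONS[convention]
    return [regs[i] if i < len(regs) else f"stack[{i - len(regs)}]" for i in range(n_args)]
```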
Development
Don’t lose focus. Modularity first. Example: “the fastest server-side JVM”, “startup time is an issue”, “client applications are an issue”, “we need zero-overhead runtime instrumentation”. Run, fool! Run! When optimizing for performance, it is important not to look only at benchmarks such as SPECjbb™ and SPECjvm98™. Real-world applications do a lot of other things. “There is no generic commutative plus operator.” At least nobody cares.
Development – Policy
Don’t be too much of a quality fascist when code is written. If you spend all your time preventing larger check-ins or demanding 100% testing on everything, nothing will ever get checked in. If you demand a strictly documented process with specifications for everything, all anyone will ever do is write specifications and hold meetings. Both of the above are good in smaller amounts; it’s more of an awareness thing. And the infrastructure should quickly and mercilessly raise the alarm as soon as something breaks, to prevent further damage.
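Raising the alarm is more useful when the alarm names the check-in that caused the breakage. A binary search over the check-in history does that in a logarithmic number of test runs; the sketch below assumes a linear history in which the breakage, once introduced, persists, and `first_bad_change` is a hypothetical helper.

```python
def first_bad_change(changes, is_broken):
    """Binary search for the first check-in at which is_broken() becomes true.

    Assumes changes are in check-in order, changes[0] is good, changes[-1]
    is bad, and that once broken the tree stays broken. Needs about
    log2(len(changes)) calls to is_broken.
    """
    lo, hi = 0, len(changes) - 1  # invariant: changes[lo] good, changes[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(changes[mid]):
            hi = mid
        else:
            lo = mid
    return changes[hi]
```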
Lessons Learned
Summary – the important stuff to bring with you:
Build the test infrastructure in parallel with the application. Start at the same time! Don’t put it off. It is part of the app development process and should be in the time budget. Ideally, the Dev and QA teams should be fused and be able to do each other’s jobs. No separate compartments. Don’t be afraid to couple it tightly in places if that is what is required to maintain stability. Use all available CPU cycles for testing.