Initial evaluation of DA4Java
  In my reengineering course (TU Delft)
    40-50 master's students analyzing a Java system (150 KLOC)
  Pros/cons
    + DA4Java reduces clutter/information overload
    + Good input for discussing dependencies
    - Performance, graph can still get very complex
  Todo
    Add information about changes
    User studies to evaluate the approach
    Use the approach in different domains
Evaluation with spreadsheet users
  Interviews with 27 users
  Case studies with 9 spreadsheets
Industrial spreadsheets
  Spreadsheet                                #Worksheets    #Cells  #Formulas
  Shares risk management                               9    29,671        221
  Top and bottom 5 stock performance                   5     2,781      1,601
  Weekly report                                       16     9,555      7,215
  Overview of portfolio data                          42    28,222     13,096
  Gain and loss of all trades for one week            10   503,050     38,188
  Constructing a stock portfolio                       6    16,054     16,659
Results
  Does the visualization help to understand large, complex spreadsheets?
  Answers
    “This really helps me to understand what [worksheet] is what.”
    “The global view reveals the idea (design) behind the spreadsheet.”
    “The different levels allow to show and filter details.”
  What's more ...?
Upload your spreadsheet at: http://app.infotron.nl
Lehman’s Laws of software evolution
  1. Continuing change
     A program that is used in a real-world environment must change
  2. Increasing complexity
     As a program evolves, it becomes more complex
Growth and changes to Mozilla source code
  [Figure not recoverable; surviving label: 1998]
Lehman’s Laws in practice
  Quick fixes
    Lack of time, resources, money, etc.
  Initial good design is not maintained
    Spaghetti code, copy/paste programming, dependencies are introduced, no tests, etc.
  Documentation is not updated (if there is any)
    Architecture and design documents
  Original developers leave and with them their knowledge
Implications of Lehman’s Laws
  [Chart: Maintenance 75%, Initial development 25%]
  Maintenance costs increase
  60% is spent on understanding
  Number of bugs increases
A solution: Business intelligence for SE
  [Diagram labels: Source Code, Bugs, Tasks, Emails; Knowledge Repository; Data Mining]
  What is the effect of the new developer on productivity?
  What are the effects of the source code changes on the design?
  Where will bugs occur?
  Where is this bug located?
Study with Microsoft
  Released in January 2007
  > 4 years of development
  Several thousand developers
  Several thousand binaries (*.exe, *.dll)
  Several million commits
  RQ: Is fragmentation of contributions related to the number of post-release failures?
Maxima over 4,000 binaries
  Measure        Maximum
  #Commits       48,112.000
  #Authors          466.000
  Power             562.093
  Closeness          43.299
  Reach               0.473
  Betweenness         1.182
Research hypotheses
  Binaries with fragmented contributions are failure-prone
  Larger fragmentation correlates with more post-release failures
Predicting failure-prone binaries
  Binary logistic regression of 50 random splits
  4 principal components from 7 centrality measures
  [Figure: three panels (Precision, Recall, AUC); axis ticks 0-40 and 0.50-1.00]
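The three panels report standard classification metrics. As a minimal sketch of how they are computed for a single split, the Python below uses made-up labels and scores (not study data); the 0.5 cut-off and the function names are assumptions for illustration only.

```python
def precision_recall(actual, predicted):
    """Precision and recall for binary labels (truthy = failure-prone)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(actual, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up test split: 1 = binary had post-release failures.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
# Made-up predicted probabilities from some classifier.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
# Assumed cut-off of 0.5 to turn probabilities into class labels.
predicted = [s >= 0.5 for s in scores]

precision, recall = precision_recall(actual, predicted)
area = auc(actual, scores)
print(precision, recall, area)
```

Repeating this over many random train/test splits and collecting the three values per split gives the kind of distribution a per-split plot would show.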
Larger fragmentation - more failures
  Linear regression of 50 random splits
  #Failures = b0 + b1*Closeness + b2*#Authors + b3*#Commits
  [Figure: three panels (R-Square, Pearson, Spearman); axis ticks 0-40 and 0.50-1.00]
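The regression model above has one intercept and three predictors. A minimal sketch of fitting such a model by ordinary least squares is shown below in plain Python; the data rows and the "true" coefficients are invented for illustration, and solving the normal equations directly is an assumption of this sketch, not necessarily how the study fitted its models.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Made-up binaries: (Closeness, #Authors, #Commits).
rows = [(1.0, 2, 10), (2.0, 5, 40), (3.0, 3, 25),
        (4.0, 8, 90), (5.0, 6, 30), (2.5, 9, 70)]
# Made-up coefficients b0..b3 used to generate noise-free targets.
true_b = [1.0, 2.0, 0.5, 0.1]
targets = [true_b[0] + true_b[1] * c + true_b[2] * a + true_b[3] * m
           for c, a, m in rows]

coeffs = fit_linear(rows, targets)
print([round(b, 6) for b in coeffs])
```

Because the targets here are generated without noise, the fit recovers the made-up coefficients; with real failure counts the residuals would be non-zero and R-Square, Pearson, and Spearman would quantify the fit on held-out binaries.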
Summary of results
  Centrality measures to predict 83% of failure-prone Vista binaries
  Closeness, #Authors, and #Commits to predict the number of post-release failures
What can we learn from that?
  Increase testing effort for central binaries? - yes
  Redesign central binaries? - maybe
  Restrict contributions? - maybe
  [Figure: network with nodes Alice, Bob, Dan, Eric, Fu, Go, Hin and a, b, c, with numeric edge labels]
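To make "central binaries" concrete, the sketch below builds a small developer-binary contribution network and computes closeness for each binary. Everything in it is an assumption for illustration: the edges and commit counts are invented (they do not reproduce the figure), and closeness is taken in its common textbook form, (reachable nodes - 1) / sum of shortest-path distances, which may differ from the definition used in the study.

```python
from collections import deque

# Hypothetical contributions: (developer, binary, number of commits).
contributions = [
    ("Alice", "a", 3), ("Bob", "a", 1), ("Dan", "a", 2),
    ("Dan", "b", 1), ("Eric", "b", 4), ("Fu", "b", 2),
    ("Fu", "c", 1), ("Go", "c", 3), ("Hin", "c", 2),
]


def build_graph(contributions):
    """Undirected graph linking each developer to the binaries they changed."""
    graph = {}
    for dev, binary, _commits in contributions:
        graph.setdefault(dev, set()).add(binary)
        graph.setdefault(binary, set()).add(dev)
    return graph


def closeness(graph, node):
    """Textbook closeness: (reachable nodes - 1) / sum of BFS distances."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


graph = build_graph(contributions)
for binary in ("a", "b", "c"):
    authors = len(graph[binary])
    commits = sum(n for _d, b, n in contributions if b == binary)
    print(binary, authors, commits, round(closeness(graph, binary), 3))
```

In this made-up network, binary b shares developers with both a and c, so it ends up closer to every other node and gets the highest closeness, which is the kind of binary the first bullet suggests testing more heavily.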
The knowledgeable software engineer
  [Diagram labels: Martin, Andreas, Knowledge Repository]
  martin.email@example.com
  http://serg.aau.at