The Knowledgeable Software Engineer

Published in: Technology, Education

  1. The Knowledgeable Software Engineer. Univ.-Prof. Dr. Martin Pinzger, Professor of Software Engineering, Software Engineering Research Group, University of Klagenfurt
  2. Classical software systems
  3. Mobile applications (apps)
  4. The oven in your kitchen
  5. In your car. Software is everywhere!
  6. Software systems are large. How many lines of code? 10 MLOC = 4 meters
  7. Software systems are complex
  8. Challenge 1: Understanding software systems. Martin? Andreas?
  9. Perspective of software developers; perspective of the user
  10. Perspective of software developers: difficult to spot and comprehend dependencies
  11. A solution: software visualization. [Diagram labels: NbBundleTest, testExistingR.(), testNonE.(), main(); NbBundle, getMessage()]
  12. Software visualization challenge
  13. DA4Java - Dependency Analysis for Java. Nested graph: nodes represent source code entities, edges represent dependencies between them. Incrementally add/filter info; add/filter dependent entities. Integrated with Eclipse IDE. Works for Java. Install from: http://serg.aau.at/bin/view/MartinPinzger/DA4Java
  14. Install from: http://serg.aau.at/bin/view/MartinPinzger/DA4Java
  15. Initial evaluation of DA4Java. In my reengineering course (TU Delft): 40-50 master students analyzing a Java system (150 KLOC). Pros: DA4Java reduces clutter/information overload; good input for discussing dependencies. Cons: performance; graph can still get very complex. To do: add information about changes; user studies to evaluate the approach; use the approach in different domains
  16. Applying the idea to Spreadsheets
  17. Spreadsheets are business critical: 50% form the basis for decisions. Errors often lead to financial loss; see: http://www.eusprig.org/horror-stories.htm
  18. Interviewed 27 professional spreadsheet users: What annoys you? What makes you happy?
  19. Support for understanding is missing: How are the different worksheets related? (44%) Where do formulas refer to? (38%) What cells are meant for input? (22%) What cells contain output? (22%)
  20. Adapting DA4Java to Breviz
  21. Step 1: Cell classification
  22. Step 2: Identifying data blocks
  23. Step 3: Data flow construction. [Diagram labels: C4, D4, E4, AVERAGE; C5, D5, E5, AVERAGE]
  24. Step 4: Name replacement. [Diagram labels: exam Richard Griffin, lab Richard Griffin, overall Richard Griffin, AVERAGE]
  25. Step 5: Grouping. End result. [Diagram labels: exam Richard Griffin, lab Richard Griffin, overall Richard Griffin, AVERAGE]
  26. Breviz - Global View
  27. Breviz - Formula View
  28. Evaluation with spreadsheet users: interviews with 27 users; case studies with 9 spreadsheets
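  29. Industrial spreadsheets (columns: Spreadsheet, #Worksheets, #Cells, #Formulas)
      • Shares risk management: 9 worksheets, 29,671 cells, 221 formulas
      • Top and bottom 5 stock performance: 5 worksheets, 2,781 cells, 1,601 formulas
      • Weekly report: 16 worksheets, 9,555 cells, 7,215 formulas
      • Overview of portfolio data: 42 worksheets, 28,222 cells, 13,096 formulas
      • Gain and loss of all trades for one week: 10 worksheets, 503,050 cells, 38,188 formulas
      • Constructing a stock portfolio: 6 worksheets, 16,054 cells, 16,659 formulas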
  30. Results. Does the visualization help to understand large, complex spreadsheets? Answers: “This really helps me to understand what [worksheet] is what.” “The global view reveals the idea (design) behind the spreadsheet.” “The different levels allow to show and filter details.” What's more ...?
  31. Upload your spreadsheet at: http://app.infotron.nl
  32. Software systems evolve
  33. Lehman’s Laws of software evolution. 1. Continuing change: a program that is used in a real-world environment must change. 2. Increasing complexity: as a program evolves, it becomes more complex
  34. Growth and changes to Mozilla source code. [Chart; label: 1998]
  35. Lehman’s Laws in practice. Quick fixes: lack of time, resources, money, etc. Initial good design is not maintained: spaghetti code, copy/paste programming, dependencies are introduced, no tests, etc. Documentation is not updated (if there is one): architecture and design documents. Original developers leave and with them their knowledge
  36. Implications of Lehman’s Laws. Maintenance 75%, initial development 25%. Maintenance costs increase; 60% is spent on understanding. Number of bugs increases
  37. Challenge 2: Evolving software systems. Martin? Andreas?
  38. A solution: Business intelligence for SE. [Diagram labels: Source Code, Bugs, Tasks, Emails; Knowledge Repository; Data Mining] What is the effect of the new developer on productivity? What are the effects of the source code changes on the design? Where will bugs occur? Where is this bug located?
  39. Study with Microsoft. Released in January, 2007; > 4 years of development; several thousand developers; several thousand binaries (*.exe, *.dll); several millions of commits. RQ: Is fragmentation of contributions related to the number of post-release failures?
  40. Approach. [Diagram labels: Change Logs, Bugs, Measuring Contributions, Count post-release failure reports, Regression Analysis]
  41. Developer contributions. [Diagram labels: Steve, Bill; Printer.dll, System.dll; Change Logs, Build System]
  42. Developer contribution network. [Diagram: developers Alice, Bob, Dan, Eric, Fu, Go, Hin; Windows binaries (*.dll) a, b, c] Which binary is failure-prone?
  43. Network centrality measures: Freeman degree, Bonacich’s power, closeness. [Network diagrams over developers Alice, Bob, Dan, Eric, Fu, Go, Hin and binaries a, b, c]
  44. Maxima over 4,000 binaries: #Commits 48,112.000; #Authors 466.000; Power 562.093; Closeness 43.299; Reach 0.473; Betweenness 1.182
  45. Research hypotheses: binaries with fragmented contributions are failure-prone; larger fragmentation correlates with more post-release failures
  46. Predicting failure-prone binaries. Binary logistic regression of 50 random splits; 4 principal components from 7 centrality measures. [Plots: Precision, Recall, AUC]
  47. Larger fragmentation - more failures. Linear regression of 50 random splits: #Failures = b0 + b1*Closeness + b2*#Authors + b3*#Commits. [Plots: R-Square, Pearson, Spearman]
  48. Summary of results: centrality measures can predict 83% of failure-prone Vista binaries; Closeness, #Authors, and #Commits can predict the number of post-release failures
  49. What can we learn from that? Increase testing effort for central binaries? - yes. Redesign central binaries? - maybe. Restrict contributions? - maybe. [Network diagram over developers Alice, Bob, Dan, Eric, Fu, Go, Hin and binaries a, b, c, with numeric node annotations]
  50. The knowledgeable software engineer. [Diagram labels: Martin, Andreas, Knowledge Repository] martin.pinzger@aau.at, http://serg.aau.at
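
The data flow construction step for spreadsheets (slide 23) can be illustrated with a minimal sketch. It is not the Breviz implementation: the formulas below are assumed from the diagram labels C4, D4, E4 and AVERAGE, and the regular expressions and the function name `data_flow_edges` are hypothetical. The sketch only handles plain cell references, not ranges or cross-worksheet references.

```python
import re

# Assumed formulas: the slide shows only the labels C4, D4, E4 and AVERAGE,
# so the exact formula text here is an illustration, not taken from the talk.
formulas = {
    "E4": "=AVERAGE(C4,D4)",
    "E5": "=AVERAGE(C5,D5)",
}

# A plain cell reference such as C4 or AB12 (ranges are not expanded).
CELL_REF = re.compile(r"\b([A-Z]{1,3}[0-9]+)\b")
# The first function name in the formula, e.g. AVERAGE.
FUNC_NAME = re.compile(r"([A-Z]+)\(")


def data_flow_edges(formulas):
    """Return (source cell, operation, target cell) triples."""
    edges = []
    for target, formula in sorted(formulas.items()):
        match = FUNC_NAME.search(formula)
        operation = match.group(1) if match else "FORMULA"
        for source in CELL_REF.findall(formula):
            edges.append((source, operation, target))
    return edges


for edge in data_flow_edges(formulas):
    print(edge)
```

Each triple corresponds to one edge in a data flow graph: a source cell feeds an operation that produces the target cell.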

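The developer contribution network and two of the centrality measures named on slides 42 and 43 can be sketched as follows. The edges are invented for illustration, since the transcript gives only the node names. Closeness is normalized here as (reachable nodes) / (sum of shortest-path distances), which is an assumption and may differ from the normalization used in the study.

```python
from collections import defaultdict, deque

# Hypothetical (developer, binary) contributions. Only the names come from
# the slides; who changed which binary is made up for this sketch.
contributions = [
    ("Alice", "a"), ("Bob", "a"), ("Dan", "a"),
    ("Dan", "b"), ("Eric", "b"), ("Fu", "b"), ("Go", "b"),
    ("Go", "c"), ("Hin", "c"),
]


def build_network(pairs):
    """Build an undirected developer-binary graph as adjacency sets."""
    graph = defaultdict(set)
    for developer, binary in pairs:
        graph[developer].add(binary)
        graph[binary].add(developer)
    return graph


def degree(graph, node):
    """Freeman degree: number of direct neighbours."""
    return len(graph[node])


def closeness(graph, node):
    """Reachable nodes divided by the sum of shortest-path distances."""
    distance = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search over the unweighted graph
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


network = build_network(contributions)
for binary in ("a", "b", "c"):
    print(binary, degree(network, binary), round(closeness(network, binary), 3))
```

With these invented edges, binary `b` has the most contributors and sits closest to every other node, so it would be the one flagged as most central.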
×
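
The linear model on slide 47, #Failures = b0 + b1*Closeness + b2*#Authors + b3*#Commits, can be fitted by ordinary least squares. The sketch below uses synthetic data generated from known coefficients, so it shows only the mechanics of the fit. It does not reproduce the study's data, its random splits, or its results, and the helper `fit_ols` is hypothetical.

```python
def fit_ols(rows, targets):
    """Least-squares fit via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Synthetic binaries: (closeness, #authors, #commits). All values are made up.
rows = [
    (0.2, 3, 4), (0.5, 10, 12), (0.3, 4, 9),
    (0.8, 25, 30), (0.6, 12, 50), (0.4, 7, 6),
]
# Synthetic failure counts generated from b0=1, b1=10, b2=0.5, b3=0.1.
targets = [1 + 10 * c + 0.5 * a + 0.1 * m for c, a, m in rows]

b0, b1, b2, b3 = fit_ols(rows, targets)
print(b0, b1, b2, b3)
```

Because the targets are noise-free, the fit recovers the generating coefficients up to rounding error; on real data one would instead evaluate the fit on held-out splits, as the slide indicates.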