Of Bugs and Men



  1. 1. Of Bugs and Men (and Plugins too) Michel Wermelinger, Yijun Yu The Open University, UK Markus Strohmaier Technical University Graz, Austria
  2. 2. Plugins Working Conf. on Mining Softw. Repositories 2008 Int’l Conf. on Softw. Maintenance 2008
  3. 3. Motivation & Method What is the validity, generality and usefulness of design principles? Study long-term evolution Study architectural evolution Study complex systems Case study: Eclipse modern CBS with reusable, extensible components
  4. 4. Eclipse Static dependency: X depends on Y Dynamic dependency: X uses extension points provided by Y Self-cycles possible We analysed whole Eclipse SDK (JDT, PDE, etc)
  5. 5. Eclipse releases Various types of releases Major (e.g. 3.1) and maintenance releases (e.g. 3.1.1) Milestones (3.2M1) and Release candidates (3.2RC1) Maintenance of current major release in parallel with milestones and release candidates of next one We analysed 20 major and maintenance releases over 6 years (1.0 to 27 milestone and release candidates over 2 years (3.1 to 3.3) grouped in 2 sequences: 1.0 – 3.1 and 3.1 –
  6. 6. Data processing (concrete) [Diagram: the product repository (releases from 1.0; plugin.xml, manifest.mf; plugins and dependencies) and the process repository (bug reports 1-100 to 219901-220000; bug.xml; status, priority, severity) feed Architecture Info. Extractors and Bug Info. Extractors (AWK, XSLT), which produce RSF; CrocoPat processes the RSF; results are shown over time as graphs, histograms and spectrum visualisations using graphviz, guess, ccvisu, OpenOffice and Trace2PNG]
  7. 7. Some Research Questions Is there continuous growth (Lehman’s 6th law)? Is there any pattern (e.g. superlinear growth)? Does complexity increase (Lehman’s 2nd law)? Is there any effort to reduce it? Does coupling decrease? Does cohesion increase?
  8. 8. Modules A simple structural model Module = directed graph Elements = internal or external Arcs = internal or external relations External elements and arcs show context For Eclipse SDK module elements = plugins or external components arcs = static and/or dynamic dependencies
  9. 9. Module measures Size = # internal elements NIP = number of internal plugins Complexity = # internal arcs NISD/NIDD = number of internal static/dynamic dependencies Cohesion = complexity / size Coupling = # external arcs NESD (NEDD is always zero)
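
The four measures on this slide can be sketched in a few lines. This is only an illustration, not the authors' tooling (they used AWK, XSLT and CrocoPat): the function name, the plugin names in the usage note, and the choice to count an arc as external when at least one endpoint lies outside the module are all assumptions.

```python
def module_measures(internal, arcs):
    """Compute size, complexity, cohesion and coupling of a module.

    internal: iterable of internal element names (e.g. plugins)
    arcs: iterable of (source, target) dependency pairs
    Assumption: an arc is external if either endpoint is not internal.
    """
    internal = set(internal)
    arcs = set(arcs)
    internal_arcs = {(s, t) for s, t in arcs if s in internal and t in internal}
    external_arcs = arcs - internal_arcs
    size = len(internal)                 # number of internal elements (NIP)
    complexity = len(internal_arcs)      # number of internal arcs (NISD or NIDD)
    return {
        "size": size,
        "complexity": complexity,
        "cohesion": complexity / size if size else 0.0,
        "coupling": len(external_arcs),  # number of external arcs (NESD)
    }
```

For a toy module with internal plugins `ui`, `core` and `swt`, and arcs `ui→core`, `ui→swt`, `core→core` (a self-cycle) and `core→xerces` (external), this gives size 3, complexity 3, cohesion 1.0 and coupling 1.
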
  10. 10. Size Evolution (1) Number of plugins kept, added, deleted w.r.t. previous release Number kept since initial release → stable architectural core Segmented growth Overall 4- to 5-fold growth, but not superlinear Many changes in 3.0; few deletions overall
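
The kept/added/deleted counts and the stable core are plain set operations over the plugin lists of successive releases. The sketch below assumes each release is given as a collection of plugin names; the function names are hypothetical.

```python
def release_diff(previous, current):
    """Plugins kept, added and deleted with respect to the previous release."""
    prev, curr = set(previous), set(current)
    return {"kept": prev & curr, "added": curr - prev, "deleted": prev - curr}


def stable_core(releases):
    """Plugins of the initial release that are present in every later release."""
    if not releases:
        return set()
    core = set(releases[0])
    for plugins in releases[1:]:
        core &= set(plugins)
    return core
```

Calling `release_diff` on each consecutive pair of releases gives the per-release counts, while `stable_core` over the whole sequence gives the plugins kept since the initial release.
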
  11. 11. Size Evolution (2) Long equilibrium and short punctuation periods Equilibrium: changes accommodated within current architecture Punctuation: changes require architectural revisions mostly in milestones some in release candidates hardly in maintenance
  12. 12. Architectural core [Figure: dependency graph of the core plugins jdt.ui, jdt.launching, jdt.doc.isv, jdt.doc.user, pde.doc.user, platform.doc.isv, platform.doc.user, help.ui, pde.runtime, ant.ui, search, compare, pde.core, debug.ui, jdt.debug, help, pde, ui, ant.core, jdt.core, debug.core, core.runtime, swt, core.resources] core with static and dynamic dependencies self-cycles point to reuse of extension points layered architecture core is >40% of release 1.0 and ca. 10% of
  13. 13. Complexity Evolution Charts show NISD (left) and NIDD (right) Release 3.1 is major restructuring Static dependencies decreased by 19% Plugins increased by 57% More deletions, i.e. effort to reduce complexity
  14. 14. Cohesion evolution (1) Size (left) and complexity (right) grow in step Two exceptions Release 3.0 maintains size Release 3.1 reduces complexity
  15. 15. Cohesion evolution (2) Result: cohesion slightly decreases over time Except for major increase during 3.0.* releases Independently of static, dynamic, or both dependencies Low cohesion: <3 (incoming or outgoing) dependencies per plugin explicit effort to keep architecture loosely cohesive?
  16. 16. Coupling Evolution Charts show NESD Refactoring in 3.0: All existing external dependencies removed via new internal proxies External component org.apache.xerces was removed Overall, coupling is small compared to size and complexity
  17. 17. Acyclic Dependency Principle Dependency graph should be acyclic [Martin 96 and others] decreases change propagation eases release management and work allocation Measured cycle length over joint dependency graph Graph shows segmented growth of harmless self-cycles (length 1) Single cycle with length > 1 was broken apart in release 3.0
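
One way to check a dependency graph against this principle is to separate self-cycles (length 1) from groups of plugins that depend on each other cyclically. The sketch below uses Tarjan's strongly connected components algorithm for the latter; that is a substitute technique chosen for illustration, not necessarily how the authors measured cycle length, and the function names are hypothetical.

```python
def self_cycles(arcs):
    """Nodes that depend on themselves (cycles of length 1)."""
    return {s for s, t in arcs if s == t}


def cyclic_groups(arcs):
    """Strongly connected components with more than one node (Tarjan).

    Each returned group contains at least one cycle of length > 1.
    Recursive, so intended for small graphs only.
    """
    graph = {}
    for s, t in arcs:
        graph.setdefault(s, set()).add(t)
        graph.setdefault(t, set())
    index, low, stack, on_stack, groups = {}, {}, [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            if len(component) > 1:
                groups.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return groups
```

An architecture satisfies the principle, apart from harmless self-cycles, when `cyclic_groups` returns an empty list.
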
  18. 18. Stable Dependency Principle dependencies should be in direction of stability [Martin 97] changes propagate opposite to dependencies if A depends on B, A can’t be harder to change than B instability of element = fanout / (fanin + fanout) irresponsible: fanin = 0, instability = 1, may change independent: fanout = 0, instability = 0, no reason to change
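
The instability formula and the notion of a violation can be made concrete as follows. This is a minimal sketch under two assumptions that the slide does not state: self-cycles are ignored when counting fanin and fanout, and a dependency from A to B counts as a violation when A is strictly less unstable than B. Function names are hypothetical.

```python
def instability(arcs):
    """Map each node to fanout / (fanin + fanout).

    Nodes with no incoming or outgoing arcs (other than self-cycles)
    are omitted, because instability is undefined for them.
    """
    fanin, fanout = {}, {}
    for s, t in set(arcs):
        if s == t:
            continue  # assumption: self-cycles do not count
        fanout[s] = fanout.get(s, 0) + 1
        fanin[t] = fanin.get(t, 0) + 1
    nodes = set(fanin) | set(fanout)
    return {
        n: fanout.get(n, 0) / (fanin.get(n, 0) + fanout.get(n, 0))
        for n in nodes
    }


def sdp_violations(arcs):
    """Arcs (a, b) where a depends on a more unstable element b."""
    inst = instability(arcs)
    return {(s, t) for s, t in set(arcs) if s != t and inst[s] < inst[t]}
```

In this sketch an irresponsible element (fanin = 0) gets instability 1 and an independent element (fanout = 0) gets instability 0, matching the two extreme cases named on the slide.
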
  19. 19. SDP Evolution Charts show number of SDP violations Absolute (left) and relative (right) static, dynamic and both dependencies Numbers kept low, with ratio tending to decrease 1-5% violations for static dependencies, 9-17% for dynamic
  20. 20. Changeability measures slight adaptation of [van Belle 04] likelihood of changing an element # of actual changes / max possible # impact of an element’s changes avg # of elements changed with it acuteness = impact / likelihood high for interfaces, low for method bodies
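
A possible reading of these three measures, given the set of elements changed in each release transition, is sketched below. The interpretation is an assumption: likelihood is taken as the fraction of transitions in which the element changed, and impact as the average number of other elements that changed in those same transitions. The slide's "slight adaptation" of van Belle's definitions may differ in detail.

```python
def changeability(change_sets):
    """Likelihood, impact and acuteness per element.

    change_sets: list of sets, one per release transition, each holding
    the elements that changed in that transition.
    Elements that never changed do not appear in the result.
    """
    total = len(change_sets)
    elements = set().union(*change_sets)
    result = {}
    for element in elements:
        # number of other elements changed together with this one
        co_changes = [len(cs) - 1 for cs in change_sets if element in cs]
        likelihood = len(co_changes) / total
        impact = sum(co_changes) / len(co_changes)
        result[element] = {
            "likelihood": likelihood,
            "impact": impact,
            "acuteness": impact / likelihood,
        }
    return result
```

With the hypothetical change sets `[{"api", "impl"}, {"impl"}, {"impl"}, {"api", "impl", "doc"}]`, `impl` changes in every transition but mostly alone (acuteness 0.75), whereas `api` changes half the time and always with company (acuteness 3.0), in line with the slide's remark that acuteness is high for interfaces and low for method bodies.
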
  21. 21. Changes and Stability (1) changes and stability are related responsible elements: high change impact independent elements: low change likelihood stable elements: high change acuteness van Belle: correlational linkage implicit, from co-change observation takes change propagation closure into account Martin: causal linkage must be given explicitly only looks at immediate neighbours
  22. 22. Changes and Stability (2) measured fanin/fanout of the 69 plugins in release 2.0 measured impact/likelihood of same plugins over next 45 releases normalised measures, ordered plugins by fanin and fanout lower fanin ⇒ less responsible ⇒ lower impact: not quite so lower fanout ⇒ less dependent ⇒ lower likelihood: somewhat
  23. 23. Changes and Stability (3) measured instability when defined (52 plugins in 2.0) All but one irresponsible and independent plugins remained so over time higher instability ⇒ lower acuteness: mixed some trend but many exceptions likelihood vs independence is better than impact vs responsibility static causal linkage can’t predict future correlational linkage former only accounts for internal drives, latter includes external drives
  24. 24. Conclusions (1) Successful evolution of Eclipse due to…? systematic architectural change process segmented growth of size and complexity cohesion kept low; cycles removed SDP violations and coupling reduced significant stable layered architectural core Some consistency between causal and correlational changeability measures
  25. 25. Conclusions (2) many design principles/guidelines proposed, but… no empirical evidence of usefulness for maintenance selected representative case study large, complex, successful, component-based system accurate architectural information + enough evolution history generic and lightweight approach no reverse engineering, no static code analysis modules and changeability measures flexible scripting tool manipulating text files with relational data potential practical implications of findings confirmed some laws and principles; observed some patterns investigated static and historic changeability measures
  26. 26. Bugs and Men New Ideas and Emerging Results track of Int’l Conf. on Software Eng. 2009
  27. 27. Motivation Software engineering is socio-technical activity Global and open source software development led to increased interest in and relevance of social aspects Need for representing socio-technical relations Bipartite graphs of software artefacts and people Ad-hoc arc semantics, depending on relation Ad-hoc flat layout, often hard to read Relevant relations lost among many nodes and arcs Sought improvements: More compact, intuitive, and explicit representation Distinguish ‘hierarchical’ importance of artefacts, people and their relations.
  28. 28. General Approach Obtain a bipartite socio-technical network Compute socio-technical concept lattice Apply formal concept analysis (FCA) theory Use free tool ConExp (Concept Explorer) Concept: clusters all artefacts associated to same people Hierarchy: partial ordering of clusters Study different and evolving socio-technical relations Repeat for various relations and system releases
  29. 29. Case study Requirements: Should have non-trivial social and technical structure Should not have fluid social structure Should provide different data sources (not just code) Eclipse Has IBM lead and Bugzilla repository
  30. 30. The socio-technical network (1) Build PBC network P nodes: 16,025 people B nodes: 101,966 Eclipse SDK bug reports C nodes: 16 Eclipse SDK components p-b arc: p reported/assigned to/discussed b b-c arc: b is reported for c Repeat for various releases and roles
  31. 31. The socio-technical network (2) Build the PC network Folding of PBC, i.e. p-c arc with weight b person p is associated to b reports for component c Number of paths from p to c Build the PC(k) network Remove all arcs with weight < k Remove all weight information
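
Folding the PBC network into PC and then thresholding it to PC(k) can be sketched as below. The input format (lists of person-bug and bug-component pairs) and the function names are assumptions made for illustration; the people and component names in the usage note are invented.

```python
from collections import Counter, defaultdict


def fold_pbc(person_bug, bug_component):
    """Fold the PBC network into weighted person-component arcs.

    The weight of (p, c) is the number of paths p -> b -> c, i.e. the
    number of bug reports for component c that person p is associated with.
    """
    components_of = defaultdict(set)
    for bug, component in bug_component:
        components_of[bug].add(component)
    weights = Counter()
    for person, bug in set(person_bug):   # ignore duplicate p-b arcs
        for component in components_of[bug]:
            weights[(person, component)] += 1
    return weights


def pc_k(weights, k):
    """PC(k): drop arcs with weight < k, then drop the weights."""
    return {arc for arc, weight in weights.items() if weight >= k}
```

For example, if `ann` is associated with two bug reports for `UI` and one for `Core`, and `bob` with one for `UI`, then PC(2) contains only the arc (`ann`, `UI`).
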
  32. 32. Formal Concept Analysis Given objects O and attributes A and relation O × A e.g. O = components, A = assignees Concept c = (o ⊆ O, a ⊆ A) each object in o has all attributes a o is the extent and a is the intent of the concept Hierarchy: (o, a) ≤ (o’, a’) if o ⊆ o’ (or a’ ⊆ a) From top to bottom: extent decreases, intent increases Socio-technical concept lattice Usually, people at level n (bottom=0) associated to n components ‘specialists’ at lower, ‘generalists’ at upper levels Each node includes all its ancestors’ people and all its descendants’ components
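
For small contexts, the concepts defined on this slide can be enumerated directly. The sketch below is a naive illustration and not what ConExp does internally: it builds every intent as an intersection of the objects' attribute sets, then derives each extent. Function names and the example data are hypothetical.

```python
def concepts(context):
    """Enumerate all formal concepts of a small context.

    context: dict mapping each object (e.g. component) to its set of
    attributes (e.g. people). Returns (extent, intent) pairs sorted
    from the top of the lattice (largest extent) downwards.
    """
    all_attributes = frozenset().union(*context.values())
    intents = {all_attributes}
    for attributes in context.values():
        # close the set of intents under intersection with this object
        intents |= {frozenset(attributes) & intent for intent in intents}
    result = []
    for intent in intents:
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        result.append((extent, intent))
    return sorted(result, key=lambda c: (-len(c[0]), sorted(c[1])))


def leq(lower, upper):
    """Hierarchy: (o, a) <= (o', a') if o is a subset of o'."""
    return lower[0] <= upper[0]
```

With the context `{"UI": {"ann", "bob"}, "Core": {"ann"}, "Debug": {"ann", "cy"}}`, the top concept pairs all three components with `ann` (a 'generalist'), while `bob` and `cy` first appear one level down, each with a single component ('specialists').
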
  33. 33. Release 1.0, assignees, k=10 USA coordinating 2 Canadian teams? only 4 ‘generalists’ (2 components each) the French team only 1 developer associated: what if they leave project? the Swiss team most developers associated: is this largest or most complex component?
  34. 34. Release 3.0, assignees, k=100 only 2 ‘generalists’ (3 components each) Common developers: highly dependent components? Used higher k because bug reports accumulate over time Geographical and workload distribution like release 1.0
  35. 35. Release 3.0, discussants, k=100 Developers discuss more components than they are assigned to: due to dependencies? Developers don’t discuss all reports they are assigned to
  36. 36. Conclusions Novel application of Formal Concept Analysis Clustering and ordering of socio-technical relations General tool-supported approach Some advantages over bi-partite graphs More scalable: not one node per person and artefact More explicit: related people & artefacts in same node More intuitive: uniform vertical layout & arc semantics Helps spot expertise and potential problems Generalist and specialist people Artefacts with too many or too few people associated Undesired or absent communication/coordination
  37. 37. Concluding conclusions Software engineering is an inherently socio-technical endeavour Availability of FLOSS projects makes it possible to study historical heterogeneous data Used process and artefact data to present different views on same case study Evolution of architecture Hierarchy of maintainers Impact of dependencies Opportunities for many studies, mining and visualisation techniques that can help academics, developers and managers