Evidence-based Software Process Recovery: A Postdoctoral View




                                                      Abram Hindle
                                       Department of Computing Science
                                             University of Alberta
                                              Edmonton, Alberta
                                                    Canada
                                          http://softwareprocess.es/
                                    abram.hindle@softwareprocess.es




                                                                         1
What are we going to do?

                    Theory   Practice




                             ?
Business Modeling
Requirements
Analysis & Design
Implementation
Test
Deployment
CM and SCS

Project Management
Environment
                                        2
3
Motivation: Stakeholders


Fixers or Star Programmers
Investors and Acquisitions
Managers
New Developers
Employees assigned to an ISO9000 conformance project
                                          4
Figure: two workflow panels, "Proposed Process" and "Recovered Process".

Is my proposed process actually being used?

                                              5
Figure: two workflow panels, "Proposed and Recovered Process Overlaid" and "Differences between Proposed and Recovered Workflows".

I can compare and contrast the observed process versus the expected process!
                                          6
How to get an overview: Interviews
time consuming
annoying
DANGER
access
                                     7
Can't we just summarize what is going on within this project?

Software Repositories: Revisions, Source Code, Build / Configuration, Tests, Documentation

Phases: Inception, Elaboration, Construction, Transition
Iterations: Initial, Elab, Elab, Const, Const, Const, Trans

Disciplines
       Business Modeling
       Requirements
       Analysis & Design
       Implementation
       Test
       Deployment
       CM and SCS
       Project Management
       Environment
                                                                                    8
Research Relationships
Aspects: Behaviour; Intent, Purpose and Tasks; Software Development Process

Research:
  Release Patterns: Source/Test/Build/Documentation
  Topic Analysis
  Large Changes
  Recovered Unified Process Views




                                                           9
Mining Software Repositories


Figure: Revisions; Source Code; Build / Configuration; Tests; Documentation.
                                                  10
Source Acquisition


                         Version
                         Control

                        Revisions
 discussions   bugs       source

Initial Repositories and artifacts
                                    11
SOFTWARE
PROCESS
RECOVERY
       12
Release Patterns: STBD
Figure: revisions per time unit, summed per time unit before and after a release, for Source, Test, Build, and Documentation. The Time axis runs from -n time units, through the Release Event at 0, to +n time units.

For each of Source Code, Test, Build, and Documentation Revisions:
  per Time Unit (day)
  Smoothed
  Summed Near Release
  Linear Regression

[Hindle ICSM07]
                                                                    13
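The release-pattern idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method from [Hindle ICSM07]: the function names (`daily_counts`, `slope`, `release_pattern`), the dictionary keys, and the choice of ordinary least squares are all assumptions made for the example, and the smoothing step is left out. It counts revisions per day in a window of n days around a release, sums them before and after, and fits a line to each side. The window needs n >= 2 so that each side has at least two points.

```python
from datetime import date


def daily_counts(revision_dates, release, n):
    """Count revisions on each day offset in [-n, +n] around a release."""
    counts = {offset: 0 for offset in range(-n, n + 1)}
    for d in revision_dates:
        offset = (d - release).days
        if -n <= offset <= n:
            counts[offset] += 1
    return counts


def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def release_pattern(revision_dates, release, n):
    """Summarize one revision class (e.g. Test) around one release.

    Hypothetical summary: totals and regression slopes before and after.
    """
    counts = daily_counts(revision_dates, release, n)
    before = [(o, c) for o, c in counts.items() if o < 0]
    after = [(o, c) for o, c in counts.items() if o > 0]
    return {
        "before_total": sum(c for _, c in before),
        "after_total": sum(c for _, c in after),
        "before_slope": slope(before),
        "after_slope": slope(after),
    }
```

Running `release_pattern` once per class (Source, Test, Build, Documentation) gives four small summaries that can be compared across releases, for example to see whether test revisions ramp up before a release and drop afterwards.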
STBD applied to SQLite




SQLITE                   14
Maintenance Classes of Large Changes

Figure: bar chart of Proportion of Commits (axis 0 to 0.45) for each maintenance class: Corrective, Adaptive, Perfective, Non-Source-Code, Implementation.

[Hindle ICPC09]
                                  15
What is this commit about?




 Added a test for bug
   #1326 on OSX




                             16
What is this commit about?




    Added a test for bug
      #1326 on OSX


Reliability   Maintainability   Portability

                                        18
But we have many commits..




Reliability   Maintainability   Portability
                                        19
Cross Project Relevance

Figure: several Version Control repositories surrounding their Shared Concepts: efficiency, usability, reliability and functionality (includes correctness), maintainability, portability.




                                                                               20
Quality Related
Non functional requirements



portability
reliability and functionality (includes correctness)
usability
efficiency
maintainability

[cleland-huang03] [ernst10]
                                             21
Word Bag Examples

  Portability             Reliability
  portability             reliability
  transferability         failure
  interoperability        error
  documentation           redundancy
  internationalization    fails
  i18n                    bug
  ...                     ...
                                       22
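A word-bag labeller of the kind this slide suggests can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the names `WORD_BAGS` and `label_message` are hypothetical, the tokenizer is a simple regular expression, and the bags contain only the words visible on the slide (the slide elides the rest with "..."). A commit message receives a label whenever it shares at least one token with that label's bag.

```python
import re

# Only the example words shown on the slide; the full bags are elided there.
WORD_BAGS = {
    "portability": {
        "portability", "transferability", "interoperability",
        "documentation", "internationalization", "i18n",
    },
    "reliability": {
        "reliability", "failure", "error", "redundancy", "fails", "bug",
    },
}


def label_message(message, word_bags=WORD_BAGS):
    """Return the sorted labels whose word bag shares a token with the message."""
    tokens = set(re.findall(r"[a-z0-9]+", message.lower()))
    return sorted(label for label, bag in word_bags.items() if tokens & bag)
```

With these truncated bags, `label_message("Added a test for bug #1326 on OSX")` yields only `["reliability"]`, because "bug" is the only listed word that appears. Matching the other labels shown for that commit earlier in the deck would need the fuller word lists that the slide does not show.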
Developer Topics


Figure: each Commit belongs to a Developer Topic (extracted with LDA). What is the topic's purpose? Examples: Maintainability, Reliability.
                                                23
Labelled Developer Topics

Figure: Unique Topics over Time (months).
                                        24
Labelled Developer Topics


Figure: Unique Topics over Time (months), with topic labels Linux, Kernel, Windows, AMD64.
                                        25
Labelled Developer Topics
Figure: Unique Topics over Time (months), with topics labelled efficiency, portability, maintainability, functionality, reliability.
                                                                        26
MaxDB 7.500 Timeline




Maintainability   Maintainability    Maintainability
Portability       Portability        Efficiency
                  Reliability
                  Efficiency



                                            27
                                    [MSR 2011]
SOFTWARE
PROCESS
RECOVERY:
        28
Recovered Unified Process Views
                    Theory   Practice
Business Modeling
Requirements
Analysis & Design
Implementation
Test
Deployment
CM and SCS

Project Management
Environment
                                 [ICSM10]
                                       29
UP Requirements Signal




      +             +
              =

                         30
UP Implementation Signal




             =

                           31
UP Testing Signal




              =

                    32
SQLite Case Study




                    33
SQLite Case Study: 2009

Business Modelling
Requirements
Analysis
Implementation
Testing
Deployment
Config/SCS
Project Management
Environment
                                          34
UP Observability



?            ?          ?
      Disciplines

    Business Modeling
    Requirements
    Analysis & Design

    Project Management
    Environment
                            35
Future Work

People and teams
Accuracy
? Unknown Project ?
MSR Benchmark
Validation
Industrial
Iteration Identification
                                              36
Common Threads
Idioms (file names and patterns)
  Documentation: *.doxygen, *.tex, *.txt, FILES, INSTALL, README, AUTHORS, TODO, doc/
  Makefile, Makefile.*, configure, configure.*, setup.hs, setup.py, build.xml
  Test: *test*, *.t (unit tests), *.test
  Source Code: *.hs, *.ml, *.php, *.java, *.scm, *.lisp, *.cpp, *.tcl, *.c, *.C, *.py, *.sql, *.pl, *.pm, *.rb

Shared Terms
  Usability, Maintainability, Portability, Reliability, Efficiency

Vocabulary
  External: Project, Language
  Internal: Project, Language
  shared vocabulary, Project vocabulary
                                                                                                       37
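The file-name idioms on this slide lend themselves to a small classifier that sorts changed files into the Source/Test/Build/Documentation classes used for release patterns. The sketch below is an assumption-laden illustration: the names `PATTERNS` and `classify` are hypothetical, the "build" label for Makefile-style files is inferred (the slide lists those files without a visible label), and the precedence order (test, then documentation, then build, then source) is a design choice made here so that files such as `setup.py` are not counted as source code.

```python
from fnmatch import fnmatchcase
import posixpath

# Ordered (class, patterns) pairs; the first matching class wins.
PATTERNS = [
    ("test", ["*test*", "*.t", "*.test"]),
    ("documentation", ["*.doxygen", "*.tex", "*.txt", "FILES", "INSTALL",
                       "README", "AUTHORS", "TODO"]),
    # "build" is an assumed label for these idioms.
    ("build", ["Makefile", "Makefile.*", "configure", "configure.*",
               "setup.hs", "setup.py", "build.xml"]),
    ("source", ["*.hs", "*.ml", "*.php", "*.java", "*.scm", "*.lisp",
                "*.cpp", "*.tcl", "*.c", "*.C", "*.py", "*.sql",
                "*.pl", "*.pm", "*.rb"]),
]


def classify(path):
    """Return the class of a repository path, or None if nothing matches."""
    name = posixpath.basename(path)
    for label, patterns in PATTERNS:
        # The doc/ directory idiom applies to the whole path, not the name.
        if label == "documentation" and (
            path.startswith("doc/") or "/doc/" in path
        ):
            return label
        # Try each pattern against both the file name and the full path.
        if any(fnmatchcase(name, p) or fnmatchcase(path, p) for p in patterns):
            return label
    return None
```

For example, `classify("src/vdbe.c")` gives `"source"` and `classify("Makefile.in")` gives `"build"`. Matching is case sensitive so that `*.c` and `*.C` stay distinct, as they are on the slide.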
Process Recovery Summary
Figure: composite summary of Software Process Recovery, combining the earlier slides.

Inputs: Version Control Revisions, Source, Discussions, Bugs
Release Patterns over Time: Source, Tests, Build, Documentation [Hindle ICSM07]
Maintenance Classes: Proportion of Commits (axis 0 to 0.45) per class [Hindle ICPC09]
Developer Topic Analysis and Labelling (LDA, LSI): efficiency, portability, reliability, maintainability, functionality
Shared terms: Usability, Maintainability, Reliability, Portability, Efficiency
Disciplines: Business Modeling, Requirements, Analysis & Design, Implementation, Test, Deployment, CM and SCS, Project Management, Environment
Stakeholders: Managers, Fixers or Star Programmers, New Developers, Investors and Acquisitions, ISO9000 Consultants
Other citations on the slide: [Hindle09ICSM] [Hindle and Ernst http://softwareprocess.es/name/] [hindl
                                                                                                                                                                                                                              38
Lessons Learned
Boring Slides
Conferences!
Collaborate!
Organization
A PhD is not enough
                                   39

Postdoc Symposium - Abram Hindle
