Software Systems as Cities:
A Controlled Experiment

Richard Wettel, Michele Lanza (REVEAL @ Faculty of Informatics, University of Lugano, Switzerland)
Romain Robbes (PLEIAD @ DCC, University of Chile, Chile)
Software Systems as Cities

City Metaphor

class -> building
package -> district (nesting level -> color)

class metrics mapped onto building properties:
  number of methods (NOM) -> height
  number of attributes (NOA) -> base size
  number of lines of code (LOC) -> color

VISSOFT 2007
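The mapping above can be sketched as a function from class metrics to building properties. This is an illustrative sketch only; the property names and data layout are assumptions, not CodeCity's actual implementation.

```python
# Hypothetical sketch of the metric-to-glyph mapping; keys and function
# names are illustrative assumptions, not CodeCity's actual code.

def building_for(cls):
    """Map a class's metrics onto its building's visual properties."""
    return {
        "height": cls["NOM"],     # number of methods -> height
        "base_size": cls["NOA"],  # number of attributes -> base size
        "color": cls["LOC"],      # lines of code -> color
    }

def district_for(nesting_level):
    """Map a package's nesting level onto its district's color."""
    return {"color": nesting_level}
```

For example, a class with 349 methods, 3 attributes, and 3,413 lines of code becomes a tall, narrow, dark building.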
Program Comprehension

ArgoUML (LOC 136,325)

ICPC 2007
Program Comprehension

Example classes from ArgoUML:
  skyscraper: FacadeMDRImpl (NOA 3, NOM 349, LOC 3,413)
  office building: CPPParser (NOA 85, NOM 204, LOC 9,111)
  parking lot: JavaTokenTypes (NOA 173, NOM 0, LOC 0)
  house: PropPanelEvent (NOA 2, NOM 3, LOC 37)
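The building archetypes above follow directly from the metric mapping. A rough classifier could look like this; the talk defines no exact cutoffs, so the thresholds below are illustrative assumptions chosen to fit the four examples.

```python
def archetype(noa, nom, loc):
    """Classify a class 'building' by its shape.
    Thresholds are illustrative assumptions, not taken from the talk."""
    if nom == 0 and loc == 0 and noa > 0:
        return "parking lot"      # wide and flat: attributes only
    if nom > 200 and noa <= 10:
        return "skyscraper"       # tall and narrow: many methods, few attributes
    if nom > 100 and noa > 50:
        return "office building"  # tall and wide
    return "house"                # small in every dimension
```

Applied to the ArgoUML examples: FacadeMDRImpl (3, 349) is a skyscraper, CPPParser (85, 204) an office building, JavaTokenTypes (173, 0) a parking lot, and PropPanelEvent (2, 3) a house.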
Design Quality Assessment

disharmony map

ArgoUML classes:
  brain class: 8
  god class: 30
  god + brain: 6
  data class: 17
  unaffected: 1,715

SoftVis 2008
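A disharmony map colors each building by the design problem detected for its class. A minimal sketch of the coloring step; the color choices and input format are assumptions, and the detection of the disharmonies themselves (god class, brain class, data class) is assumed to come from metrics-based detection strategies.

```python
# Illustrative color scheme; the actual palette is not specified here.
DISHARMONY_COLORS = {
    "god class": "red",
    "brain class": "blue",
    "god + brain": "purple",
    "data class": "green",
    None: "gray",  # unaffected classes stay neutral
}

def disharmony_map(classes):
    """classes: dict of class name -> disharmony label (or None)."""
    return {name: DISHARMONY_COLORS[label] for name, label in classes.items()}
```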
System Evolution Analysis

time traveling

ArgoUML: 8 major releases over 6 years

WCRE 2008
http://codecity.inf.usi.ch




                 implemented in Smalltalk


                                ICSE 2008 tool demo
Is it useful?
A Controlled Experiment
Design
technical report 2010


State of the art?
Design desiderata
 1 Avoid comparing using a technique against not using it.
 2 Involve participants from the industry.
 3 Provide a not-so-short tutorial of the experimental tool to the participants.
 4 Avoid, whenever possible, giving the tutorial right before the experiment.
 5 Use the tutorial to cover both the research behind the approach and the tool.
 6 Find a set of relevant tasks.
 7 Choose real object systems that are relevant for the tasks.
 8 Include more than one object system in the design.
 9 Provide the same data to all participants.
10 Limit the amount of time allowed for solving each task.
11 Provide all the details needed to make the experiment replicable.
12 Report results on individual tasks.
13 Include tasks on which the expected result is not always to the advantage of the
   tool being evaluated.
14 Take into account the possible wide range of experience level of the participants.
Finding a baseline




1. program comprehension

2. design quality assessment

3. system evolution analysis
Finding a baseline

1. program comprehension

2. design quality assessment

(system evolution analysis dropped)
Tasks

program comprehension (6 tasks):

A1      Identify the convention used in the system to organize unit tests.
A2.1 &
A2.2    What is the spread of term T in the names of the classes, their attributes, and methods?
A3      Evaluate the change impact of class C, in terms of intensity and dispersion.
A4.1    Find the three classes with the highest number of methods.
A4.2    Find the three classes with the highest average number of lines of code per method.

design quality assessment (4 tasks):

B1.1    Identify the package with the highest percentage of god classes.
B1.2    Identify the god class with the largest number of methods.
B2.1    Identify the dominant (affecting the highest number of classes) class-level design problem.
B2.2    Write an overview of the class-level design problems in the system.

In total: 9 quantitative tasks and 1 qualitative task (B2.2).
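Tasks A4.1 and A4.2 amount to simple metric queries. A sketch of how they could be answered from a class-metrics table; the data layout is an assumption, not the experiment's actual tooling.

```python
def top3_by_nom(classes):
    """A4.1: the three classes with the highest number of methods."""
    return sorted(classes, key=lambda c: c["NOM"], reverse=True)[:3]

def top3_by_loc_per_method(classes):
    """A4.2: the three classes with the highest average LOC per method.
    Classes with no methods are skipped to avoid division by zero."""
    with_methods = [c for c in classes if c["NOM"] > 0]
    return sorted(with_methods, key=lambda c: c["LOC"] / c["NOM"], reverse=True)[:3]
```

The division-by-zero guard matters in practice: interface-like classes (the "parking lots") have NOM = 0.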
Main research questions

1. Does the use of CodeCity increase the correctness of the solutions to program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size?

2. Does the use of CodeCity reduce the time needed to solve program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size?
Variables of the experiment

dependent:
  correctness
  completion time

independent:
  tool: CodeCity vs. Eclipse + Excel
  object system size: medium (FindBugs: 1,320 classes, 93,310 LOC)
                      vs. large (Azureus: 4,656 classes, 454,387 LOC)

controlled:
  experience level: beginner / advanced
  background: academia / industry
The experiment's design

between-subjects, randomized-block

blocks (background x experience):
  B1  academia, beginner
  B2  academia, advanced
  B3  industry, beginner
  B4  industry, advanced

treatments (Tool x Size):
  T1  CodeCity, large
  T2  CodeCity, medium
  T3  Ecl+Excl, large
  T4  Ecl+Excl, medium
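A randomized-block assignment can be sketched as follows: subjects are first grouped into blocks by (background, experience), then the four treatments are dealt out randomly within each block. This is an illustrative sketch, not the authors' actual assignment procedure; the field names are assumptions.

```python
import random

def assign_treatments(subjects, treatments=("T1", "T2", "T3", "T4"), seed=42):
    """Randomized-block, between-subjects assignment (illustrative sketch)."""
    rng = random.Random(seed)
    # 1. Block subjects by the controlled variables.
    blocks = {}
    for s in subjects:
        blocks.setdefault((s["background"], s["experience"]), []).append(s)
    # 2. Randomize within each block, then deal treatments round-robin.
    assignment = {}
    for block in blocks.values():
        rng.shuffle(block)
        for i, s in enumerate(block):
            assignment[s["name"]] = treatments[i % len(treatments)]
    return assignment
```

Blocking ensures each treatment receives a comparable mix of experience levels and backgrounds, so tool and size effects are not confounded with subject profile.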
Execution
Experimental runs

Each run consisted of a training session (1 hour) followed by an experiment session (2 hours):

  day 1: e1, c1
  day 2: e2, c2
  day 3: e3
  day 4: c4
Testing the waters

[timeline chart: pilot sessions in Lugano, November-December 2009, with 1-3 participants per session]
Timeline of the experiment

[timeline chart: experimental sessions in Lugano, Bologna (remote sessions), Antwerp, and Bern (one remote session), November 2009 - April 2010]
Treatments and subjects

                      academia            industry
                  beginner  advanced     advanced    total
CodeCity  large       2         2            6        10
          medium      3         2            7        12
Ecl+Excl  large       2         3            3         8
          medium      2         5            4        11
total                 9        12           20        41
Collecting raw data

  solution
  completion time
Controlling time

common time

info on subjects: Name (Task): Remaining time
Assessing correctness

blinding, oracles

Oracle for T2 (FindBugs, analyzed with CodeCity):

A1: Dispersed. [1pt]

A2.1: Localized [0.5pts] in package edu.umd.cs.findbugs.detect [0.5pts].

A2.2: Dispersed in the following (max. 5) packages [0.2pts for each]: edu.umd.cs.findbugs, edu.umd.cs.findbugs.anttask, edu.umd.cs.findbugs.ba, edu.umd.cs.findbugs.ba.deref, edu.umd.cs.findbugs.ba.jsr305, edu.umd.cs.findbugs.ba.npe, edu.umd.cs.findbugs.ba.vna, edu.umd.cs.findbugs.bcel, edu.umd.cs.findbugs.classfile, edu.umd.cs.findbugs.classfile.analysis, edu.umd.cs.findbugs.classfile.engine, edu.umd.cs.findbugs.classfile.impl, edu.umd.cs.findbugs.cloud, edu.umd.cs.findbugs.cloud.db, edu.umd.cs.findbugs.detect, edu.umd.cs.findbugs.gui, edu.umd.cs.findbugs.gui2, edu.umd.cs.findbugs.jaif, edu.umd.cs.findbugs.model, edu.umd.cs.findbugs.visitclass, edu.umd.cs.findbugs.workflow.

A3 (Impact Analysis): Multiple locations. There are 40/41 [0.5pts] classes defined in the following 3 packages [1/6pts for each]: edu.umd.cs.findbugs, edu.umd.cs.findbugs.bcel, edu.umd.cs.findbugs.detect.

A4.1: The 3 classes with the highest number of methods are [1/3pts each correctly placed, 1/6pts each misplaced]:
  1. class AbstractFrameModelingVisitor, defined in package edu.umd.cs.findbugs.ba, contains 195 methods;
  2. class MainFrame, defined in package edu.umd.cs.findbugs.gui2, contains 119 methods;
  3. class BugInstance, defined in package edu.umd.cs.findbugs, contains 118 methods, or class TypeFrameModelingVisitor, defined in package edu.umd.cs.findbugs.ba.type, contains 118 methods.

A4.2: The 3 classes with the highest average number of lines of code per method are [1/3pts each correctly placed, 1/6pts each misplaced]:
  1. class DefaultNullnessAnnotations, defined in package edu.umd.cs.findbugs.ba, with an average of 124 lines of code per method;
  2. class DBCloud.PopulateBugs, defined in package edu.umd.cs.findbugs.cloud.db, with an average of 114.5 lines of code per method;
  3. class BytecodeScanner, defined in package edu.umd.cs.findbugs.ba, with an average of 80.75 lines of code per method.

B1.1: The package with the highest percentage of god classes in the system is edu.umd.cs.findbugs.ba.deref [0.8pts], which contains 1 [0.1pts] god class out of a total of 3 [0.1pts] classes.

B1.2: The god class containing the largest number of methods in the system is MainFrame [0.8pts], defined in package edu.umd.cs.findbugs.gui2 [0.1pts], which contains 119 [0.1pts] methods.

B2.1: The dominant class-level design problem is DataClass [0.5pts], which affects 67 [0.5pts] classes.
Results
Statistical test

two-way analysis of variance (ANOVA), at a 95% confidence interval
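The effect sizes reported on the following slides are Cohen's d. A minimal sketch of the computation, in the conventional pooled-standard-deviation form (the sample data in the test is hypothetical, not the experiment's measurements):

```python
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation.
    By convention, d around 0.5 is a moderate effect, d around 0.8 a large one."""
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5
```

A positive d means the first group scored higher; the sign flips when the groups are swapped.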
Correctness

Does the use of CodeCity increase the correctness of the solutions to program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size?

+24.26% more correct with CodeCity
large effect size (d=0.89)

[bar chart: correctness scores (0-8 points) for Ecl+Excl vs. CodeCity, on the medium and large systems]
Completion time

Does the use of CodeCity reduce the time needed to solve program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size?

-12.01% less time with CodeCity
moderate effect size (d=0.63)

[bar chart: completion time (0-60 minutes) for Ecl+Excl vs. CodeCity, on the medium and large systems]
after the first round

CodeCity
vs

Ecl+Excl
+24% correctness
-12% completion time
Software Systems as Cities: a Controlled Experiment

Software Systems as Cities: a Controlled Experiment

  • 1. Software Systems as Cities: A Controlled Experiment Richard Wettel, Michele Lanza Romain Robbes REVEAL @ Faculty of Informatics PLEIAD @ DCC University of Lugano University of Chile Switzerland Chile
  • 3. City Metaphor VISSOFT 2007
  • 4. City Metaphor class building package district VISSOFT 2007
  • 5. City Metaphor class building package district VISSOFT 2007
  • 6. City Metaphor class building package district nesting level color VISSOFT 2007
  • 7. City Metaphor number of methods (NOM) height number of attributes (NOA) base size number of lines of code (LOC) color class building package district nesting level color VISSOFT 2007
  • 8. Program Comprehension ArgoUML LOC 136,325 ICPC 2007
  • 10. FacadeMDRImpl NOA 3 skyscraper NOM 349 LOC 3,413
  • 11. CPPParser NOA 85 office building NOM 204 LOC 9,111
  • 12. JavaTokenTypes NOA 173 parking lot NOM 0 LOC 0
  • 13. house PropPanelEvent NOA 2 NOM 3 LOC 37
  • 15. Design Quality Assessment disharmony map ArgoUML classes brain class 8 god class 30 god + brain 6 data class 17 unaffected 1,715 SoftVis 2008
  • 16. System Evolution Analysis time traveling WCRE 2008
  • 17. System Evolution Analysis time ArgoUML time traveling 8 major releases 6 years WCRE 2008
  • 18. System Evolution Analysis time ArgoUML time traveling 8 major releases 6 years WCRE 2008
  • 19. http://codecity.inf.usi.ch implemented in Smalltalk ICSE 2008 tool demo
  • 24. Design desiderata 1 Avoid comparing using a technique against not using it. 2 Involve participants from the industry. 3 Provide a not-so-short tutorial of the experimental tool to the participants. 4 Avoid, whenever possible, giving the tutorial right before the experiment. 5 Use the tutorial to cover both the research behind the approach and the tool. 6 Find a set of relevant tasks. 7 Choose real object systems that are relevant for the tasks. 8 Include more than one object system in the design. 9 Provide the same data to all participants. 10 Limit the amount of time allowed for solving each task. 11 Provide all the details needed to make the experiment replicable. 12 Report results on individual tasks. 13 Include tasks on which the expected result is not always to the advantage of the tool being evaluated. 14 Take into account the possible wide range of experience level of the participants.
  • 25. Design desiderata 1 Avoid comparing using a technique against not using it. 2 Involve participants from the industry. 3 Provide a not-so-short tutorial of the experimental tool to the participants. 4 Avoid, whenever possible, giving the tutorial right before the experiment. 5 Use the tutorial to cover both the research behind the approach and the tool. 6 Find a set of relevant tasks. 7 Choose real object systems that are relevant for the tasks. 8 Include more than one object system in the design. 9 Provide the same data to all participants. 10 Limit the amount of time allowed for solving each task. 11 Provide all the details needed to make the experiment replicable. 12 Report results on individual tasks. 13 Include tasks on which the expected result is not always to the advantage of the tool being evaluated. 14 Take into account the possible wide range of experience level of the participants.
  • 26. Finding a baseline 1. program comprehension 2. design quality assessment 3. system evolution analysis
  • 27. Finding a baseline 1. program comprehension 2. design quality assessment 3. system evolution analysis
  • 28. Finding a baseline 1. program comprehension 2. design quality assessment 3. system evolution analysis
  • 29. Finding a baseline 1. program comprehension 2. design quality assessment
  • 30. Tasks
  • 31. Tasks program comprehension 6 A1 Identify the convention used in the system to organize unit tests. A2.1 & A2.2 What is the spread of term T in the name of the classes, their attributes and methods? A3 Evaluate the change impact of class C, in terms of intensity and dispersion. A4.1 Find the three classes with the highest number of methods. A4.2 Find the three classes with the highest average number of lines of code per method.
  • 32. Tasks program comprehension 6 A1 Identify the convention used in the system to organize unit tests. A2.1 & A2.2 What is the spread of term T in the name of the classes, their attributes and methods? A3 Evaluate the change impact of class C, in terms of intensity and dispersion. A4.1 Find the three classes with the highest number of methods. A4.2 Find the three classes with the highest average number of lines of code per method. B1.1 Identify the package with the highest percentage of god classes. B1.2 Identify the god class with the largest number of methods. B2.1 Identify the dominant (affecting the highest number of classes) class-level design problem. B2.2 Write an overview of the class-level design problems in the system. design quality assessment 4
  • 33. Tasks program comprehension 6 5 A1 Identify the convention used in the system to organize unit tests. A2.1 & A2.2 What is the spread of term T in the name of the classes, their attributes and methods? A3 Evaluate the change impact of class C, in terms of intensity and dispersion. A4.1 Find the three classes with the highest number of methods. A4.2 Find the three classes with the highest average number of lines of code per method. B1.1 Identify the package with the highest percentage of god classes. B1.2 Identify the god class with the largest number of methods. B2.1 Identify the dominant (affecting the highest number of classes) class-level design problem. B2.2 Write an overview of the class-level design problems in the system. design quality assessment 4
  • 34. Tasks quantitative 9 8 A1 Identify the convention used in the system to organize unit tests. A2.1 & A2.2 What is the spread of term T in the name of the classes, their attributes and methods? A3 Evaluate the change impact of class C, in terms of intensity and dispersion. A4.1 Find the three classes with the highest number of methods. A4.2 Find the three classes with the highest average number of lines of code per method. B1.1 Identify the package with the highest percentage of god classes. B1.2 Identify the god class with the largest number of methods. B2.1 Identify the dominant (affecting the highest number of classes) class-level design problem. B2.2 Write an overview of the class-level design problems in the system. qualitative 1
  • 36. Main research questions 1 Does the use of CodeCity increase the correctness of the solutions to program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size?
  • 37. Main research questions 1 Does the use of CodeCity increase the correctness of the solutions to program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size? 2 Does the use of CodeCity reduce the time needed to solve program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size?
  • 38. Variables of the experiment
  • 39. Variables of the experiment correctness dependent completion time CodeCity tool Eclipse + Excel independent medium object system size large beginner experience level advanced controlled academia background industry
  • 40. Variables of the experiment correctness dependent completion time CodeCity tool Eclipse + Excel independent medium object system size large beginner experience level advanced controlled academia background industry
  • 41. Variables of the experiment correctness dependent completion time CodeCity tool Eclipse + Excel independent medium object system size large beginner experience level advanced controlled academia background industry
  • 42. Variables of the experiment correctness dependent FindBugs completion time 1,320 classes 93,310 LOC CodeCity tool Eclipse + Excel independent medium object system size large beginner experience level advanced Azureus controlled academia 4,656 classes background industry 454,387 LOC
  • 43. Variables of the experiment correctness dependent completion time CodeCity tool Eclipse + Excel independent medium object system size large beginner experience level advanced controlled academia background industry
  • 44. Variables of the experiment correctness dependent completion time CodeCity tool Eclipse + Excel independent medium object system size large beginner experience level advanced controlled academia background industry
  • 45. The experiment’s design between-subjects randomized-block
  • 46. The experiment’s design between-subjects randomized-block CodeCity T1 large Tool T2 medium Size Ecl+Excl T3 large T4 medium
  • 47. The experiment’s design background academia industry experience beginner advanced beginner advanced B1 B2 B3 B4 between-subjects randomized-block CodeCity T1 large Tool T2 medium Size Ecl+Excl T3 large T4 medium
  • 48. The experiment’s design background academia industry experience beginner advanced beginner advanced B1 B2 B3 B4 between-subjects randomized-block CodeCity T1 large Tool T2 medium Size Ecl+Excl T3 large T4 medium
  • 51. Experimental runs day 1 time training session (1 hour)
  • 52. Experimental runs day 1 time training session (1 hour) e1 experiment session (2 hours) c1
  • 53. Experimental runs day 1 day 2 time training session (1 hour) e1 e2 experiment session (2 hours) c1 c2
  • 54. Experimental runs day 1 day 2 day 3 time training session (1 hour) e1 e2 e3 experiment session (2 hours) c1 c2
  • 55. Experimental runs day 1 day 2 day 3 day 4 time training session (1 hour) e1 e2 e3 experiment session (2 hours) c1 c2 c4
  • 56. Testing the waters 2009 November December 18 24 25 2 9 Lugano 1 3 1 1 1 1 1
  • 57. Timeline of the experiment 2009 2010 November December January February ... April 18 24 25 2 9 21 28 5 8 14 28 18 22 24 25 14 Lugano 1 3 1 1 1 1 1 1 1 1 3 Bologna 2 1 6 1 1 1 1 Antwerp 5 6 Bern 4 1 6
  • 58. Timeline of the experiment 2009 2010 November December January February ... April 18 24 25 2 9 21 28 5 8 14 28 18 22 24 25 14 Lugano 1 3 1 1 1 1 1 1 1 1 3 remote sessions Bologna 2 1 6 1 1 1 1 Antwerp 5 6 remote session Bern 4 1 6
  • 59. Treatments and subjects academia industry beginner advanced advanced large 2 2 6 10 CodeCity medium 3 2 7 12 large 2 3 3 8 Ecl+Excl medium 2 5 4 11 9 12 20 41
  • 62. Collecting raw data completion time
  • 65. Controlling time common time info on subjects Name (Task): Remaining time
  • 66. Assessing correctness 1 T2: Findbugs, analyzed with CodeCity A3: Impact Analysis B1.2 A1 Multiple locations. The god class containing the largest number of methods in the system is There are 40/41 [0.5pts] classes class MainFrame [0.8pts] Dispersed. [1pt] defined in the following 3 packages [1/6pts for each]: defined in package edu.umd.cs.findbugs.gui2 [0.1pts] which contains 119 [0.1pts] methods. • edu.umd.cs.findbugs A2.1 • edu.umd.cs.findbugs.bcel B2.1 Localized [0.5pts] in package edu.umd.cs.findbugs.detect [0.5pts]. • edu.umd.cs.findbugs.detect The dominant class-level design problem is DataClass [0.5pts] A2.2 A4.1 which affects a number of 67 [0.5pts] classes. Dispersed The 3 classes with the highest number of methods are [ 1 pts each correctly placed and 1 pts each misplaced]: 3 6 in the following (max. 5) packages [0.2pts for each]: 1. class AbstractFrameModelingVisitor • edu.umd.cs.findbugs defined in package edu.umd.cs.findbugs.ba contains 195 methods; • edu.umd.cs.findbugs.anttask 2. class MainFrame • edu.umd.cs.findbugs.ba defined in package edu.umd.cs.findbugs.gui2 contains 119 methods; • edu.umd.cs.findbugs.ba.deref 3. class BugInstance • edu.umd.cs.findbugs.ba.jsr305 defined in package edu.umd.cs.findbugs • edu.umd.cs.findbugs.ba.npe contains 118 methods or • edu.umd.cs.findbugs.ba.vna class TypeFrameModelingVisitor defined in package edu.umd.cs.findbugs.ba.type • edu.umd.cs.findbugs.bcel contains 118 methods. • edu.umd.cs.findbugs.classfile A4.2 • edu.umd.cs.findbugs.classfile.analysis The 3 classes with the highest average number of lines of code per method are [ 1 pts each correctly placed and 1 pts each • edu.umd.cs.findbugs.classfile.engine 3 6 misplaced]: • edu.umd.cs.findbugs.classfile.impl 1. class DefaultNullnessAnnotations • edu.umd.cs.findbugs.cloud defined in package edu.umd.cs.findbugs.ba has an average of 124 lines of code per method; • edu.umd.cs.findbugs.cloud.db 2. class DBCloud.PopulateBugs • edu.umd.cs.findbugs.detect defined in package edu.umd.cs.findbugs.cloud.db has an average of 114.5 lines of code per method; • edu.umd.cs.findbugs.gui 3. class BytecodeScanner • edu.umd.cs.findbugs.gui2 defined in package edu.umd.cs.findbugs.ba • edu.umd.cs.findbugs.jaif has an average of 80.75 lines of code per method. • edu.umd.cs.findbugs.model B1.1 • edu.umd.cs.findbugs.visitclass oracles The package with the highest percentage of god classes in the system is • edu.umd.cs.findbugs.workflow edu.umd.cs.findbugs.ba.deref [0.8pts] which contains 1 [0.1pts] god classes out of a total of 3 [0.1pts] classes.
  • 67. Assessing correctness: blinding (same T2 oracle as on slide 66)
  • 69. Statistical test: two-way analysis of variance (ANOVA), 95% confidence interval
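To make the test named on this slide concrete, here is a minimal sketch of a balanced two-way ANOVA (tool x system size) in plain Python. The function, the synthetic correctness scores, and the omission of p-values are illustrative assumptions, not the experiment's actual data or analysis script:

```python
# Balanced two-way ANOVA: F statistics for factor A, factor B, and interaction.
from itertools import product

def two_way_anova(cells):
    """cells[(a, b)] -> list of replicate scores; design must be balanced."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))              # replicates per cell
    mean = lambda xs: sum(xs) / len(xs)
    grand = mean([y for ys in cells.values() for y in ys])
    a_mean = {a: mean([y for b in b_levels for y in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([y for a in a_levels for y in cells[(a, b)]]) for b in b_levels}
    # Sums of squares for main effects, interaction, and error
    ss_a = len(b_levels) * r * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = len(a_levels) * r * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = r * sum((mean(cells[(a, b)]) - a_mean[a] - b_mean[b] + grand) ** 2
                    for a, b in product(a_levels, b_levels))
    ss_e = sum((y - mean(ys)) ** 2 for ys in cells.values() for y in ys)
    ms_e = ss_e / (len(a_levels) * len(b_levels) * (r - 1))
    f_a = (ss_a / (len(a_levels) - 1)) / ms_e
    f_b = (ss_b / (len(b_levels) - 1)) / ms_e
    f_ab = (ss_ab / ((len(a_levels) - 1) * (len(b_levels) - 1))) / ms_e
    return f_a, f_b, f_ab

scores = {  # (tool, size) -> invented correctness scores per cell
    ("codecity", "medium"): [8, 10], ("codecity", "large"): [8, 10],
    ("baseline", "medium"): [4, 6],  ("baseline", "large"): [4, 6],
}
f_tool, f_size, f_inter = two_way_anova(scores)
```

A real analysis would turn each F statistic into a p-value via the F distribution (e.g. `scipy.stats.f.sf`) and check it against the 0.05 threshold implied by the 95% confidence level.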
  • 70. Correctness (Ecl+Excl vs CodeCity). Does the use of CodeCity increase the correctness of the solutions to program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size? [bar chart: correctness scores per tool, for the medium and the large system]
  • 71–72. Result: 24.26% more correct with CodeCity; large effect size (d=0.89).
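The effect sizes reported on these slides are Cohen's d values. As an illustration, d can be computed from two groups' scores using the pooled standard deviation; the numbers below are invented, not the experiment's data:

```python
# Cohen's d with pooled standard deviation (illustrative data only).
from math import sqrt

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m2 - m1) / pooled

baseline = [1.0, 2.0, 3.0]
codecity = [3.0, 4.0, 5.0]
d = cohens_d(baseline, codecity)
```

By the usual convention, d around 0.2 is a small effect, 0.5 moderate, and 0.8 or above large, which is why d=0.89 counts as a large effect and d=0.63 as a moderate one.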
  • 73. Completion time (Ecl+Excl vs CodeCity). Does the use of CodeCity reduce the time needed to solve program comprehension tasks, compared to non-visual exploration tools, regardless of the object system size? [bar chart: completion times per tool, for the medium and the large system]
  • 74–75. Result: 12.01% less time with CodeCity; moderate effect size (d=0.63).
  • 76. After the first round: CodeCity vs Ecl+Excl, +24% correctness, −12% completion time.

Editor's Notes

  1. Hi, I’m Richard Wettel, and I’m here to present Software Systems as Cities: A Controlled Experiment. This is work I did while I was a PhD student at the University of Lugano, in collaboration with my advisor, Michele Lanza, and with Romain Robbes, from the University of Chile.
  2. Before diving into the controlled experiment, I’d like to give you a brief description of the approach evaluated in this work. Our software visualization approach addresses mostly object-oriented software and is based on the following metaphor...
  3. The system is a city, its packages are the city’s districts, and its classes the city’s buildings. The visible properties of the city artifacts reflect a set of software metrics. One of the configurations we use often is the following: the nesting level of a package is mapped on the color of its district (from dark to light grays), while for the classes, the number of methods is mapped on the building’s height, the number of attributes on its base size, and the number of lines of code on the building’s color, from dark gray to intense blue. Since software has so many facets, it was important for us to have a versatile metaphor that can be employed in different contexts. We applied it in the following three:
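As a hypothetical sketch of the mapping this note describes (the function, the color ramp, and the normalization are illustrative assumptions, not CodeCity's actual implementation), using metric values from the ArgoUML slides:

```python
# Hypothetical metric-to-geometry mapping: NOM -> height, NOA -> base size,
# LOC -> color interpolated from dark gray to intense blue.
def building(nom, noa, loc, loc_max):
    dark_gray, blue = (0.25, 0.25, 0.25), (0.0, 0.0, 1.0)
    t = loc / loc_max if loc_max else 0.0        # normalized lines of code
    color = tuple((1 - t) * g + t * b for g, b in zip(dark_gray, blue))
    return {"height": nom, "base": noa, "color": color}

# FacadeMDRImpl (NOM 349, NOA 3, LOC 3413): tall and thin, a "skyscraper"
skyscraper = building(nom=349, noa=3, loc=3413, loc_max=9111)
# JavaTokenTypes (NOM 0, NOA 173, LOC 0): flat, wide, dark, a "parking lot"
parking_lot = building(nom=0, noa=173, loc=0, loc_max=9111)
```

The building archetypes on the following slides fall out of this mapping directly: a class with no code at all keeps the darkest color, while a large, method-heavy class rises and turns blue.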
  13. Program comprehension. This is a code city of ArgoUML, a Java system of about 140 thousand lines of code. The visualization gives us a structural overview of the system and reveals several patterns in the form of building archetypes.
  15. Antenna-like skyscrapers, representing classes with many methods and few attributes,
  21. Office buildings, for classes with many methods and many attributes,
  27. Parking lots, for classes with few methods and a lot of attributes, as in the case of this huge Java interface (the color of the parking lot shows that it does not contain any code, just a whole bunch of constants),
  33. Or houses, for small classes with few attributes and methods.
  39. This was the first context, program comprehension.
  40. The second context is assessing the quality of software design. For this we use code smells, which are violations of the rules of good object-oriented design. For example, brain class and god class are design problems related to excessive complexity and a lack of the collaboration that is essential in an object-oriented system. A data class is the opposite: a bare container of data, without any behavior. In our approach, we assign vivid colors to classes affected by such design problems, while the unaffected ones are gray and thus appear muted. This visualization, called a disharmony map, enables us to focus on the problems in the context of the entire system.
  41. The third application context of our approach is system evolution analysis. One of the techniques we developed in this context is time travel, which allows us to watch the evolution of the city and thus of the system it represents. I won’t go into details here because, for reasons I’ll explain later, we did not evaluate our approach in this context, in spite of its potential.
  52. The software systems as cities approach is implemented in a freely available tool called CodeCity.
  53. [PAUSE] And yet, what does this all mean? We needed to take a pragmatic stance and wonder whether this approach could be useful to practitioners, too.
  54. This is what our controlled experiment aimed to find out. From this experiment, I’d like to leave you with not only the results, but also the various decisions we took towards obtaining them.
  55. A controlled experiment will be at most as good as its design.
  56. We first performed an extensive study of the literature on empirical evaluation of information visualization in general, and of software visualization in particular.
  57. To synthesize the lessons learned from the existing body of knowledge, we built a list of design desiderata, which we used as guidelines for the design of our experiment.
  58. Here are the ones where we could improve over the existing experiments...
  59. The first thing we had to consider was the baseline. Ideally, we’d have a tool that supports all three contexts that our approach supports. Unfortunately, we could not find one; therefore, we started building the baseline from several tools.
  64. After we knew what we were evaluating, we started designing the set of tasks. We have 6 program comprehension tasks [PAUSE] and 4 design quality assessment tasks.
  72. Another classification of our tasks: 9 quantitative and 1 qualitative (the latter is not considered in our quantitative test).
  73. With this task set, we wanted to find out whether the use of our approach brings any benefits over the use of the baseline in terms of correctness of the solutions (that is, a grade for the solution, like the ones you give your students for an assignment). We were also interested in potential improvements in terms of completion time.
  77. Based on these questions, we have the following variables for our experiment. Dependent variables are the ones we measure: correctness and completion time. Independent variables are the ones whose effect we want to measure; here we find the tool used to solve the tasks (CodeCity vs the baseline). Moreover, we wanted to see whether the advantages of using our approach scale with the object system size, which is the second independent variable. This variable has two levels, medium and large, and for these we chose two Java systems of different magnitudes and different application domains (FindBugs is a bug detection tool, while Azureus is a peer-to-peer client). Finally, we wanted to eliminate the effect of background and experience level on the outcome of the experiment, and therefore we made them controlled variables.
  78. Based on the questions, we have the following variables for our experiment. Dependent variables are the ones we measure: correctness and completion time. Independent variables are the ones whose effect we want to measure. Here we find the tool used to solve the tasks (CodeCity vs baseline). Moreover, we wanted to see whether the advantages of using our approach scale with the object system size, which is the second independent variable. This variable has two levels: medium and large and for these we chose two Java systems of different magnitudes and also different application domain (FindBugs is a bug detection tool, while Azureus is a peer-to-peer client). Finally, we wanted to eliminate the effect of background and experience level on the outcome of the experiment, and therefore me made them controlled variables.\n
  79. Based on the questions, we have the following variables for our experiment. Dependent variables are the ones we measure: correctness and completion time. Independent variables are the ones whose effect we want to measure. Here we find the tool used to solve the tasks (CodeCity vs baseline). Moreover, we wanted to see whether the advantages of using our approach scale with the object system size, which is the second independent variable. This variable has two levels: medium and large and for these we chose two Java systems of different magnitudes and also different application domain (FindBugs is a bug detection tool, while Azureus is a peer-to-peer client). Finally, we wanted to eliminate the effect of background and experience level on the outcome of the experiment, and therefore me made them controlled variables.\n
  80. Based on the questions, we have the following variables for our experiment. Dependent variables are the ones we measure: correctness and completion time. Independent variables are the ones whose effect we want to measure. Here we find the tool used to solve the tasks (CodeCity vs baseline). Moreover, we wanted to see whether the advantages of using our approach scale with the object system size, which is the second independent variable. This variable has two levels: medium and large and for these we chose two Java systems of different magnitudes and also different application domain (FindBugs is a bug detection tool, while Azureus is a peer-to-peer client). Finally, we wanted to eliminate the effect of background and experience level on the outcome of the experiment, and therefore me made them controlled variables.\n
  81. Based on the questions, we have the following variables for our experiment. Dependent variables are the ones we measure: correctness and completion time. Independent variables are the ones whose effect we want to measure. Here we find the tool used to solve the tasks (CodeCity vs baseline). Moreover, we wanted to see whether the advantages of using our approach scale with the object system size, which is the second independent variable. This variable has two levels: medium and large and for these we chose two Java systems of different magnitudes and also different application domain (FindBugs is a bug detection tool, while Azureus is a peer-to-peer client). Finally, we wanted to eliminate the effect of background and experience level on the outcome of the experiment, and therefore me made them controlled variables.\n
  82. Based on the questions, we have the following variables for our experiment. Dependent variables are the ones we measure: correctness and completion time. Independent variables are the ones whose effect we want to measure. Here we find the tool used to solve the tasks (CodeCity vs baseline). Moreover, we wanted to see whether the advantages of using our approach scale with the object system size, which is the second independent variable. This variable has two levels: medium and large and for these we chose two Java systems of different magnitudes and also different application domain (FindBugs is a bug detection tool, while Azureus is a peer-to-peer client). Finally, we wanted to eliminate the effect of background and experience level on the outcome of the experiment, and therefore me made them controlled variables.\n
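The variable structure described above can be sketched as follows; the names are illustrative, not taken from the experiment's materials. The two independent variables (tool and object system size) combine into the four treatments mentioned on the next slide.

```python
# Sketch of the experiment's variables (names are illustrative).
from itertools import product

dependent = ["correctness", "completion_time"]       # measured outcomes
tools = ["CodeCity", "baseline"]                     # independent variable 1
sizes = {"medium": "FindBugs", "large": "Azureus"}   # independent variable 2
controlled = ["background", "experience"]            # held fixed via blocking

# The 2x2 combination of the independent variables yields four treatments:
treatments = list(product(tools, sizes))
print(treatments)
# -> [('CodeCity', 'medium'), ('CodeCity', 'large'),
#     ('baseline', 'medium'), ('baseline', 'large')]
```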
And here we have these variables at play. On the one hand, the combination of the two controlled variables results in four blocks. On the other hand, the combination of the two independent variables results in four treatments (two experimental and two control). Ours is a between-subjects design (every subject receives either a control treatment or an experimental one) with randomized blocks, in that we assign a random combination of treatments to each of the four blocks separately. All this preparation allowed us to conduct our experiment with confidence.
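A randomized-block, between-subjects assignment like the one described can be sketched as below. This is a minimal illustration under assumed details (round-robin dealing within each block, hypothetical subject and block names), not the experiment's actual assignment procedure.

```python
import random

def assign_treatments(subjects_by_block, treatments, seed=42):
    """Randomized-block, between-subjects assignment: within each block
    (a background x experience combination), the treatments are shuffled
    once and dealt round-robin, so each subject receives exactly one
    treatment and each block sees all treatments in a random order."""
    rng = random.Random(seed)
    assignment = {}
    for block, subjects in subjects_by_block.items():
        order = treatments[:]
        rng.shuffle(order)
        for i, subject in enumerate(subjects):
            assignment[subject] = order[i % len(order)]
    return assignment

# Hypothetical blocks and subjects for illustration.
blocks = {
    ("industry", "advanced"): ["s1", "s2", "s3", "s4"],
    ("academia", "beginner"): ["s5", "s6", "s7", "s8"],
}
treatments = [("CodeCity", "medium"), ("CodeCity", "large"),
              ("baseline", "medium"), ("baseline", "large")]
print(assign_treatments(blocks, treatments))
```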
We planned to conduct the experiment so that each group of subjects would start with a training session: a one-hour presentation of the approach, concluded with a demonstration of the CodeCity tool, which prepared the participants for a potential experimental treatment. After the training session, or in the following days, we held a number of experiment sessions. In this diagram, a blue square annotates the number of data points obtained with an experimental treatment, and a gray square the number of data points obtained with a control treatment. The arrows show which training session was used to train the subjects who received the experimental treatments.
The first step was to organize a pilot study with 7 master's and 2 PhD students, which allowed us to improve our questionnaire and to resolve the most common problems that appeared.
After this we started the experiment, which spanned four cities in Switzerland, Italy, and Belgium and took around six months. We managed to collect 41 valid data points, 20 of them from industry practitioners.
Here is the final distribution of treatments to subjects across the four groups.
To collect the data from our subjects, we used questionnaires containing the tasks, which gave us the raw data for measuring correctness, alternating with pages dedicated to recording the time, which allowed us to compute the completion time for each task.
148. We implemented a web application for controlling the time during the experiment. It provides a common clock and shows, for each subject, the current task and the time remaining for it. We allotted a finite maximum time for each task (10 minutes).
151. While measuring time was straightforward, measuring correctness was not. For this we built one oracle per treatment and used it to grade the participants' solutions. Moreover, we blinded the grading for two of us, so that while correcting they did not know whether a solution had been obtained with our approach or with the baseline.
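Such blinding amounts to stripping the treatment labels and shuffling the solutions before grading, while keeping a private key to de-blind the grades afterwards. A minimal sketch, assuming a hypothetical (subject_id, treatment, answer) tuple format:

```python
import random

def blind(solutions, seed=None):
    """Anonymize solutions for blinded grading.
    Returns (blinded, key): `blinded` pairs an opaque id with each
    answer, in shuffled order; `key` maps the opaque id back to the
    original (subject_id, treatment) for de-blinding."""
    rng = random.Random(seed)
    order = list(range(len(solutions)))
    rng.shuffle(order)
    blinded = [("S%02d" % i, solutions[j][2]) for i, j in enumerate(order)]
    key = {"S%02d" % i: solutions[j][:2] for i, j in enumerate(order)}
    return blinded, key

# Hypothetical solutions from two subjects
subs = [(1, "experimental", "answer-a"), (2, "control", "answer-b")]
blinded, key = blind(subs, seed=42)
# graders see only ('S00', 'answer-…'); `key` de-blinds afterwards
```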
152. What did we find from our experiment?
153. For the two dependent variables (correctness and completion time), we performed a two-way analysis of variance using the SPSS tool, at a 95% confidence level.
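The study used SPSS, but for a balanced design the two-way ANOVA F statistics can be computed by hand, which may clarify what the analysis tests: a main effect of the tool, a main effect of system size, and their interaction. The data below is made up for illustration, not the study's data:

```python
import numpy as np
from itertools import product

def two_way_anova(data):
    """Balanced two-way fixed-effects ANOVA. `data` maps a
    (level_A, level_B) pair to an equal-length list of observations.
    Returns the F statistics for factor A, factor B, and A x B."""
    a_levels = sorted({k[0] for k in data})
    b_levels = sorted({k[1] for k in data})
    n = len(next(iter(data.values())))  # replicates per cell
    gm = np.concatenate([np.asarray(v, float) for v in data.values()]).mean()

    mean_a = {a: np.mean([x for b in b_levels for x in data[(a, b)]]) for a in a_levels}
    mean_b = {b: np.mean([x for a in a_levels for x in data[(a, b)]]) for b in b_levels}
    cell = {k: np.mean(v) for k, v in data.items()}

    ss_a = n * len(b_levels) * sum((mean_a[a] - gm) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((mean_b[b] - gm) ** 2 for b in b_levels)
    ss_ab = n * sum((cell[(a, b)] - mean_a[a] - mean_b[b] + gm) ** 2
                    for a, b in product(a_levels, b_levels))
    ss_e = sum(((np.asarray(v, float) - cell[k]) ** 2).sum() for k, v in data.items())

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_e = len(a_levels) * len(b_levels) * (n - 1)
    ms_e = ss_e / df_e
    return {"A": (ss_a / df_a) / ms_e,
            "B": (ss_b / df_b) / ms_e,
            "AxB": (ss_ab / (df_a * df_b)) / ms_e}

# Hypothetical correctness scores per (tool, system size) cell
data = {("codecity", "large"):  [6, 7, 7],
        ("codecity", "medium"): [7, 8, 8],
        ("baseline", "large"):  [4, 5, 5],
        ("baseline", "medium"): [5, 6, 6]}
print({k: round(v, 2) for k, v in two_way_anova(data).items()})
# {'A': 36.0, 'B': 9.0, 'AxB': 0.0} — the tool factor dominates
```

The p-values SPSS reports come from comparing these F statistics against the F distribution with the corresponding degrees of freedom.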
154. Using our approach, the subjects obtained more correct results on average than when using Eclipse and Excel, regardless of the size of the object system: 24 percent more correct, a statistically significant improvement. According to Cohen's d, this is a large effect size.
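For reference, Cohen's d is the difference between the group means divided by the pooled standard deviation; conventionally, d >= 0.8 counts as a large effect. A small sketch with made-up scores, not the study's data:

```python
import math

def cohens_d(treatment, control):
    """Cohen's d with pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    m1 = sum(treatment) / n1
    m2 = sum(control) / n2
    v1 = sum((x - m1) ** 2 for x in treatment) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical correctness scores
d = cohens_d([7, 8, 8, 9], [5, 6, 6, 7])
print(round(d, 2))  # 2.45 — well above the 0.8 threshold for "large"
```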
155. In these boxplots you can see the distribution of the data points (the box covers the middle 50% of the data, around the median). The isolated point is an outlier, representing an exceptionally good result for the control group on the large object system.
156. What about completion time? Again, our approach performed better than the baseline, regardless of the object system size, enabling an average improvement of 12 percent in task completion time. This is a statistically significant result, and according to Cohen's d its effect size is moderate.
157. Here are the boxplots showing the distribution of the data points for the completion time dependent variable.
158. Although this may sound like a total eclipse to those of you who use Eclipse on a regular basis, the good news is that we know what to do to make it better...
159. Our experiment is easily replicable. Our technical report provides every detail needed and, if you are interested, please feel free to replicate it.
160. Although these are only a subset of the stories around the experiment, I need to wrap up here.
161. The main points are: we designed our experiment based on a list of desiderata extracted from the body of literature. We then conducted it over a period of 6 months, in 4 locations spanning three countries, and managed to engage 41 subjects, roughly half of whom were industry practitioners. The main results show that our approach improved the performance of the experimental participants over the control participants in both correctness and completion time.
166. The point I'd like you to take away from this presentation is that our approach is at least a viable alternative to the current state of the practice: non-visual approaches to software exploration.
167. An experiment of this scale is not possible without the support and participation of many people. We'd like to thank them all. And I'd like to thank you for your time and attention.