SlideShare a Scribd company logo
1 of 46
Download to read offline
An Empirical Study on Requirements Traceability
                     Using Eye-Tracking
                                               ICSM 2012, Riva Del Garda, Italy




Nasir Ali, Zohreh Sharafi, Yann-Gaël Guéhéneuc, and Giuliano Antoniol

                    … featuring Bram Adams J
Requirements Traceability

 Requirements traceability is defined as “the
 ability to describe and follow the life of a
 requirement, in both a forwards and backwards
 direction” [Gotel, 1994]


Program comprehension   Code location of new requirement

             Conformance to specification

                        ICSM 2012                     2
What’s Requirements Traceability Good For?

 — Ease program comprehension

 — Discover what code must change to handle a
   new requirement

 — Help in determining whether a specification is
   completely implemented


                      ICSM 2012                     3
Linking Requirements with Source Code

                                            Requirements

                                       y
                                 ilarit     RSS option for an
                            f sim
                   2 0% o                  instant messenger
           C ODE                                ………….
     RCE
 SOU




                      ICSM 2012                            4
IR-based Approaches

• Vector Space Model [Antoniol et al. 2002]

• Latent Semantic Indexing [Marcus and Maletic, 2003]

• Jensen Shannon Divergence [Abadi et al. 2008]

• Latent Dirichlet Allocation [Asuncion, 2010]

                           ICSM 2012                    5
Linking Requirements with Source Code

                                                  Requirements
                                             y
                                   sim ilarit
                               f
                   2 0% o                         RSS option for an
                                                 instant messenger
           C ODE                                      ………….
     RCE
 SOU




                   ICSM 2012                                     6
How IR Techniques see Source Code: TF/IDF

                                      This is all text.
                                          TF/IDF




                 Term Frequency: importance of            IR
                      term in a document
   X                    X
                 Inverse Document Frequency: rareness
   ||                of term across all the documents
weight of term                ICSM 2012                        7
What the Developer Really Saw

                Class, Method,
             Variables, Comments
                  ……………….




            ICSM 2012
                             Developer   8
Our Conjecture
Understanding how developers verify RT links
 can allow developing an improved IR-based
  technique to recover RT links with better
     accuracy than previous techniques




                   ICSM 2012                   9
Part I: Eye-tracking Study




           ICSM 2012         10
RQ1: What Source Code Entities (SCEs) do
      Developers Value the Most?


Physical                    Conceptual


    q Class Name
    q Method Name                 q Domain Related Terms
    q Variable Name               q Application Related Terms
    q Comments




                      ICSM 2012                             11
Hypotheses for RQ1


• H01 – There is no difference between the
  importance of SCEs for developers.


• H02 – There is no difference between the
  importance of domain- and implementation-
  related SCEs for developers.

                     ICSM 2012                12
Statistical Tests

• Wilcoxon Rank Sum
  – It is defined as how many times a Y precedes X
    in the samples. It is a nonparametric test and an
    alternative to the two-sample t-test.

• Kruskual-Wallis
  – Kruskal-Wallis rank sum test is a non-
    parametric method for testing the equality of
    the population medians among different
    groups.

                       ICSM 2012                    13
Approach: Eye-tracking Experiment




                          eye fixation
                              ==
                      developer importance
              ICSM 2012                  14
Such Information is Relatively Easy to Obtain!

 • Facelab by Seeing Machine
    – Built-in cameras
    – Infrared pad
    – Monitor Screen




                         ICSM 2012         15
Case Study Setup

                          Subject Statistics
Total Subjects                       26
Ph.D.                                22 (1 rejected: pilot study)
M.Sc.                                4 (1 rejected: glasses)
Java Experience                      On Average 3.39 Years
Traceability Experience              90% subjects


                      Source Code Statistics
    Programming Language             Java
    Total Source Code Snippet        6
    Total LOC                        19, 18, 19, 18, 24, 28

                                ICSM 2012                           16
Example Task




                                     ple
                                 S am
                             e
                         C od
                   rce
                Sou

    ICSM 2012                              17
Post-experiment Question

“Please rank the source code entities that help
  you more to comprehend source code and
           verify a traceability link.”

      Source Code Entities           Preference Ranks
      Class Name(s)
      Method Name(s)
      Comments
      Variable Name(s)


                             ICSM 2012                  18
Post-experiment Questions (1)

• Question1: Was the source code easy to read?
  – Yes – Easy to read
  – No – Source code font was too small and hard to
    read

• Question2: Were the source code identifiers easy
  to understand?
   – Yes
   – No

                       ICSM 2012                     19
Post-experiment Questions (1)

• Question1: Was the source code easy to read?
  – Yes – Easy to read [unanimously]



• Question2: Were the source code identifiers easy
  to understand?
   – Yes [unanimously]

                      ICSM 2012                  20
Post-experiment Questions (1)

• Question1: Was the source code easy to read?
  –Yes – Easy to read (All the subjects
   answered in Yes)


• Question2: Were the source code identifiers
  easy to understand?
  –Yes (All the subjects answered in Yes)
                     ICSM 2012                  21
Eye-tracking Experiment Results




             ICSM 2012            22
Eye-tracking Experiment Results

  Source Code Entities              Average Fixation Time (ms) of All Subjects

  Method Name(s)                                      5701.10
  Comments                                            4542.41
                                                                                       .01
  Variable Name(s)                                    3181.81
                                                                                     <0
  Class Name(s)                                       2317.25
                                                                             a lue
                                                                       p-V


                                         Average Fixation Time (ms) of All Subjects

Domain (48% of all terms)                                  4865.30
Implementation (52% of all terms)                          1729.80            < 0.01
                                                                         alue
                                                                      p-V
                                       ICSM 2012                                      23
Post-experiment Question

“Please rank the source code entities that help
  you more to comprehend source code and
           verify a traceability link.”

    Source Code Entities       Average Rank of All Subjects
    Method Name(s)                          1.44
    Comments                                2.00
    Variable Name(s)                        2.61
    Class Name(s)                           3.87



                           ICSM 2012                          24
RQ1: What Source Code Entities (SCEs) do
      Developers Value the Most?


Physical                    Conceptual


    1.     Method Name
    2.     Comments              1. Domain Related Terms
    3.     Variable Name         2. Application Related Terms
    4.     Class Name




                           ICSM 2012                            25
Part II: Improved Retrieval Technique




                ICSM 2012          26
RQ2: Can we Improve IR Techniques by making
  them Aware of the Developers’ Interests?
                         1. Method
                        2. Comment
                        3. Variables
                          4. Class




                                       IR

                   ICSM 2012                27
Alternative Weighting Scheme 1: SE/IDF

                 class name not in requirements
                        class name in requirements
                method name not in requirements
                        method name in requirements
                variable name not in requirements
                        variable name in requirements
                comment not in requirements
                       comment in requirements
             ti requirement not in code



                  ICSM 2012                        28
Alternative Weighting Scheme 2: DOI/IDF



                             domain term
                             implementation term




                 ICSM 2012                         29
Research Questions for RQ2

• RQ2.1 – Does using SE/IDF allow developing a
  RT links recovery technique with a better
  accuracy than a technique using TF/IDF?

• RQ2.2 – Does using DOI/IDF allow developing
  a RT links recovery technique with a better
  accuracy than a technique using TF/IDF?


                     ICSM 2012                  30
Hypotheses for RQ2

• H01 – There is no difference between the accuracy
  of an LSI-based technique using TF/IDF and using the
  novel weighting scheme, SE/IDF in terms of F-
  measure.

• H02 – There is no difference between the accuracy
  of an LSI-based technique using TF/IDF and using the
  novel weighting scheme, DOI/IDF in terms of F-
  measure.


                        ICSM 2012                    31
Case Study Systems

iTrust: Medical application
Pooka: An email Client


                       Pooka                iTrust
Programming Language   Java                 Java
Version                2.0                  10
Number of Classes      298                  526
Number of Methods      20,868               3,404
LOC                    244K                 19,604



                                ICSM 2012            32
IR-based Traceability Link Recovery

• LSI (Latent Semantic Indexing)
• Processed corpus is transformed into a term-by-
  document matrix
• The values of the matrix cells represent the
  weights of the terms in the documents

• Use TF/IDF, SE/IDF, and DOI/IDF
  weights to link requirements and source code

                      ICSM 2012                  33
IR Quality Measures




       ICSM 2012      34
Document Preprocessing

• Extract all the source code and requirements
  terms
• Store the mapping of each source code term


  • Filter (#43@$)
  • Stop words (the, is, an….)
  • Stemmer (attachment -> attach)

                      ICSM 2012                  35
Separate Domain and Implementation Terms

 • LDA (Latent Dirichlet Allocation)
      • LDA Parameter Settings 1
           • α = 25 (50/K)
           • β = 0.1
           • K=2
 • Mallet LDA Implementation
  1S. Abebe and P. Tonella, “Towards the extraction of domain concepts from the identifiers,” in 18th
  Working Conference on Reverse Engineering (WCRE), 2011, pp. 77 –86.


                                                  ICSM 2012                                             36
SE/IDF and DOI/IDF Parameter Settings

 Parameters   Values   SCE
 α            0.1      Class
 β            0.4      Method              Weighted
 γ            0.2      Variable           according to
 δ            0.3      Comments           developers’
 ϒ            0.74     Domain               interest
 Φ            0.26     Implementation
 Ψ            δ/2      Only in Requirements
 λ            2        Terms in Source Code ánd Requirements



                       ICSM 2012                               37
Pooka F1 Results: DOI/IDF > SE/IDF > TF/IDF




                   ICSM 2012             38
iTrust F1 Results: DOI/IDF > SE/IDF > TF/IDF




                   ICSM 2012              39
Average Results across All Thresholds

         Approaches Avg. Precision Avg. Recall       F1 Measures P-Values

         LSI (TF/IDF)    14.71               21.07      9.52

         LSI (SE/IDF)    18.30               24.86      12.47
Pooka                                                               < 0.01

         LSI (DOI/IDF)   25.73               26.42      14.56

         LSI (TF/IDF)    36.43               33.14      17.76

         LSI (SE/IDF)    38.66               33.97      18.21
iTrust                                                              < 0.01

         LSI (DOI/IDF)   39.55               35.07      20.64



                                 ICSM 2012                                   40
Research Questions for RQ2

• RQ2.1 – Does using SE/IDF allow developing a
  RT links recovery technique with a better
  accuracy than a technique using TF/IDF?

• RQ2.2 – Does using DOI/IDF allow developing
  a RT links recovery technique with a better
  accuracy than a technique using TF/IDF?


                     ICSM 2012                  41
Hypotheses for RQ2

• H01 – There is no difference between the accuracy
  of an LSI-based technique using TF/IDF and using the
  novel weighting scheme, SE/IDF in term of F-
  measure
                                    REJECT
• H02 – There is no difference between the accuracy
  of an LSI-based technique using TF/IDF and using the
  novel weighting scheme, DOI/IDF in term of F-
  measure
                                    REJECT
                        ICSM 2012                    42
Summary

• Integrating the developers’ knowledge, i.e.,
  SCEs preferences, in an IR technique
  statistically improves its accuracy


• Different terms should be weighted according to
  their positions in the source code




                      ICSM 2012                  43
What+ Developer+
    the+       Really+
                     Saw+

              Class,+
                    Method,+
           Variables,+Comments+
                ……………….+




          ICSM 2012
                           Developer   8




                                                       Hypotheses+ RQ2+
                                                                 for+

                                           • H01+ There+ no+
                                                –+     is+ difference+between+the+
                                             accuracy+ an+ based+
                                                     of+ LSIX      technique+using+
                                                                                  TF/
                                             IDF+ using+ novel+
                                                and+     the+     weigh: ng+
                                                                           scheme,+ SE/
                                             IDF+ term+ FX
                                                in+    of+ measure+      REJECT
                                           • H02+ There+ no+
                                                –+       is+ difference+between+the+
                                             accuracy+ an+ based+
                                                      of+ LSIX       technique+using+
                                                                                    TF/
                                             IDF+ using+ novel+
                                                and+       the+     weigh: ng+
                                                                             scheme,+
                                             DOI/IDF+ term+ FX
                                                     in+     of+ measure+REJECT

                                                               ICSM 2012                  42
                                   ICSM 2012                                         44
SCAM 201   45
What+ Developer+
    the+       Really+
                     Saw+

              Class,+
                    Method,+
           Variables,+Comments+
                ……………….+




          ICSM 2012
                           Developer   8




                                                       Hypotheses+ RQ2+
                                                                 for+

                                           • H01+ There+ no+
                                                –+     is+ difference+between+the+
                                             accuracy+ an+ based+
                                                     of+ LSIX      technique+using+
                                                                                  TF/
                                             IDF+ using+ novel+
                                                and+     the+     weigh: ng+
                                                                           scheme,+ SE/
                                             IDF+ term+ FX
                                                in+    of+ measure+      REJECT
                                           • H02+ There+ no+
                                                –+       is+ difference+between+the+
                                             accuracy+ an+ based+
                                                      of+ LSIX       technique+using+
                                                                                    TF/
                                             IDF+ using+ novel+
                                                and+       the+     weigh: ng+
                                                                             scheme,+
                                             DOI/IDF+ term+ FX
                                                     in+     of+ measure+REJECT

                                                               ICSM 2012                  42
                                   ICSM 2012                                         46

More Related Content

Similar to ICSM12.ppt

A preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localizationA preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localizationkrws
 
Our research lines on Model-Driven Engineering and Software Engineering
Our research lines on Model-Driven Engineering and Software EngineeringOur research lines on Model-Driven Engineering and Software Engineering
Our research lines on Model-Driven Engineering and Software EngineeringJordi Cabot
 
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...Benoit Combemale
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureMasud Rahman
 
IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptxKtonNguyn2
 
OpenSees: Future Directions
OpenSees: Future DirectionsOpenSees: Future Directions
OpenSees: Future Directionsopenseesdays
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsGábor Szárnyas
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Activity report on Deep-learning based compression
Activity report on Deep-learning based compressionActivity report on Deep-learning based compression
Activity report on Deep-learning based compressionMarco Cagnazzo
 
The secret life of rules in Software Engineering
The secret life of rules in Software EngineeringThe secret life of rules in Software Engineering
The secret life of rules in Software EngineeringJordi Cabot
 
Automating Software Development Using Artificial Intelligence (AI)
Automating Software Development Using Artificial Intelligence (AI)Automating Software Development Using Artificial Intelligence (AI)
Automating Software Development Using Artificial Intelligence (AI)Jeremy Bradbury
 
What Impact Will Entity Framework Have On Architecture
What Impact Will Entity Framework Have On ArchitectureWhat Impact Will Entity Framework Have On Architecture
What Impact Will Entity Framework Have On ArchitectureEric Nelson
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...CSCJournals
 
Slides SRS 2012
Slides SRS 2012Slides SRS 2012
Slides SRS 2012gsansone
 

Similar to ICSM12.ppt (20)

Icsm12.ppt
Icsm12.pptIcsm12.ppt
Icsm12.ppt
 
A preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localizationA preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localization
 
Our research lines on Model-Driven Engineering and Software Engineering
Our research lines on Model-Driven Engineering and Software EngineeringOur research lines on Model-Driven Engineering and Software Engineering
Our research lines on Model-Driven Engineering and Software Engineering
 
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptx
 
STRICT-SANER2015
STRICT-SANER2015STRICT-SANER2015
STRICT-SANER2015
 
OpenSees: Future Directions
OpenSees: Future DirectionsOpenSees: Future Directions
OpenSees: Future Directions
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Activity report on Deep-learning based compression
Activity report on Deep-learning based compressionActivity report on Deep-learning based compression
Activity report on Deep-learning based compression
 
The secret life of rules in Software Engineering
The secret life of rules in Software EngineeringThe secret life of rules in Software Engineering
The secret life of rules in Software Engineering
 
Automating Software Development Using Artificial Intelligence (AI)
Automating Software Development Using Artificial Intelligence (AI)Automating Software Development Using Artificial Intelligence (AI)
Automating Software Development Using Artificial Intelligence (AI)
 
What Impact Will Entity Framework Have On Architecture
What Impact Will Entity Framework Have On ArchitectureWhat Impact Will Entity Framework Have On Architecture
What Impact Will Entity Framework Have On Architecture
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
 
CCDE Experience
CCDE ExperienceCCDE Experience
CCDE Experience
 
Paderborn
PaderbornPaderborn
Paderborn
 
Slides SRS 2012
Slides SRS 2012Slides SRS 2012
Slides SRS 2012
 

More from Ptidej Team

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software MiniaturisationPtidej Team
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel BriandPtidej Team
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel AbdellatifPtidej Team
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh KermansaraviPtidej Team
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel GrichiPtidej Team
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano PolitowskiPtidej Team
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisisPtidej Team
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptPtidej Team
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptPtidej Team
 
Thesis+of+étienne+duclos.ppt
Thesis+of+étienne+duclos.pptThesis+of+étienne+duclos.ppt
Thesis+of+étienne+duclos.pptPtidej Team
 

More from Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 
Thesis+of+étienne+duclos.ppt
Thesis+of+étienne+duclos.pptThesis+of+étienne+duclos.ppt
Thesis+of+étienne+duclos.ppt
 

Recently uploaded

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

ICSM12.ppt

  • 1. An Empirical Study on Requirements Traceability Using Eye-Tracking ICSM 2012, Riva Del Garda, Italy Nasir Ali, Zohreh Sharafi, Yann-Gaël Guéhéneuc, and Giuliano Antoniol … featuring Bram Adams J
  • 2. Requirements Traceability Requirements traceability is defined as “the ability to describe and follow the life of a requirement, in both a forwards and backwards direction” [Gotel, 1994] Program comprehension Code location of new requirement Conformance to specification ICSM 2012 2
  • 3. What’s Requirements Traceability Good For? — Ease program comprehension — Discover what code must change to handle a new requirement — Help in determining whether a specification is completely implemented ICSM 2012 3
  • 4. Linking Requirements with Source Code Requirements y ilarit RSS option for an f sim 2 0% o instant messenger C ODE …………. RCE SOU ICSM 2012 4
  • 5. IR-based Approaches • Vector Space Model [Antoniol et al. 2002] • Latent Semantic Indexing [Marcus and Maletic, 2003] • Jensen Shannon Divergence [Abadi et al. 2008] • Latent Dirichlet Allocation [Asuncion, 2010] ICSM 2012 5
  • 6. Linking Requirements with Source Code Requirements y sim ilarit f 2 0% o RSS option for an instant messenger C ODE …………. RCE SOU ICSM 2012 6
  • 7. How IR Techniques see Source Code: TF/IDF This is all text. TF/IDF Term Frequency: importance of IR term in a document X X Inverse Document Frequency: rareness || of term across all the documents weight of term ICSM 2012 7
  • 8. What the Developer Really Saw Class, Method, Variables, Comments ………………. ICSM 2012 Developer 8
  • 9. Our Conjecture Understanding how developers verify RT links can allow developing an improved IR-based technique to recover RT links with better accuracy than previous techniques ICSM 2012 9
  • 10. Part I: Eye-tracking Study ICSM 2012 10
  • 11. RQ1: What Source Code Entities (SCEs) do Developers Value the Most? Physical Conceptual q Class Name q Method Name q Domain Related Terms q Variable Name q Application Related Terms q Comments ICSM 2012 11
  • 12. Hypotheses for RQ1 • H01 – There is no difference between the importance of SCEs for developers. • H02 – There is no difference between the importance of domain- and implementation- related SCEs for developers. ICSM 2012 12
  • 13. Statistical Tests • Wilcoxon Rank Sum – It is defined as how many times a Y precedes X in the samples. It is a nonparametric test and an alternative to the two-sample t-test. • Kruskual-Wallis – Kruskal-Wallis rank sum test is a non- parametric method for testing the equality of the population medians among different groups. ICSM 2012 13
  • 14. Approach: Eye-tracking Experiment eye fixation == developer importance ICSM 2012 14
  • 15. Such Information is Relatively Easy to Obtain! • Facelab by Seeing Machine – Built-in cameras – Infrared pad – Monitor Screen ICSM 2012 15
  • 16. Case Study Setup Subject Statistics Total Subjects 26 Ph.D. 22 (1 rejected: pilot study) M.Sc. 4 (1 rejected: glasses) Java Experience On Average 3.39 Years Traceability Experience 90% subjects Source Code Statistics Programming Language Java Total Source Code Snippet 6 Total LOC 19, 18, 19, 18, 24, 28 ICSM 2012 16
  • 17. Example Task ple S am e C od rce Sou ICSM 2012 17
  • 18. Post-experiment Question “Please rank the source code entities that help you more to comprehend source code and verify a traceability link.” Source Code Entities Preference Ranks Class Name(s) Method Name(s) Comments Variable Name(s) ICSM 2012 18
  • 19. Post-experiment Questions (1) • Question1: Was the source code easy to read? – Yes – Easy to read – No – Source code font was too small and hard to read • Question2: Were the source code identifiers easy to understand? – Yes – No ICSM 2012 19
  • 20. Post-experiment Questions (1) • Question1: Was the source code easy to read? – Yes – Easy to read [unanimously] • Question2: Were the source code identifiers easy to understand? – Yes [unanimously] ICSM 2012 20
  • 21. Post-experiment Questions (1) • Question1: Was the source code easy to read? –Yes – Easy to read (All the subjects answered in Yes) • Question2: Were the source code identifiers easy to understand? –Yes (All the subjects answered in Yes) ICSM 2012 21
  • 23. Eye-tracking Experiment Results Source Code Entities Average Fixation Time (ms) of All Subjects Method Name(s) 5701.10 Comments 4542.41 .01 Variable Name(s) 3181.81 <0 Class Name(s) 2317.25 a lue p-V Average Fixation Time (ms) of All Subjects Domain (48% of all terms) 4865.30 Implementation (52% of all terms) 1729.80 < 0.01 alue p-V ICSM 2012 23
  • 24. Post-experiment Question “Please rank the source code entities that help you more to comprehend source code and verify a traceability link.” Source Code Entities Average Rank of All Subjects Method Name(s) 1.44 Comments 2.00 Variable Name(s) 2.61 Class Name(s) 3.87 ICSM 2012 24
  • 25. RQ1: What Source Code Entities (SCEs) do Developers Value the Most? Physical Conceptual 1. Method Name 2. Comments 1. Domain Related Terms 3. Variable Name 2. Application Related Terms 4. Class Name ICSM 2012 25
  • 26. Part II: Improved Retrieval Technique ICSM 2012 26
  • 27. RQ2: Can we Improve IR Techniques by making them Aware of the Developers’ Interests? 1. Method 2. Comment 3. Variables 4. Class IR ICSM 2012 27
  • 28. Alternative Weighting Scheme 1: SE/IDF class name not in requirements class name in requirements method name not in requirements method name in requirements variable name not in requirements variable name in requirements comment not in requirements comment in requirements ti requirement not in code ICSM 2012 28
  • 29. Alternative Weighting Scheme 2: DOI/IDF domain term implementation term ICSM 2012 29
  • 30. Research Questions for RQ2 • RQ2.1 – Does using SE/IDF allow developing a RT links recovery technique with a better accuracy than a technique using TF/IDF? • RQ2.2 – Does using DOI/IDF allow developing a RT links recovery technique with a better accuracy than a technique using TF/IDF? ICSM 2012 30
  • 31. Hypotheses for RQ2 • H01 – There is no difference between the accuracy of an LSI-based technique using TF/IDF and using the novel weighting scheme, SE/IDF in terms of F- measure. • H02 – There is no difference between the accuracy of an LSI-based technique using TF/IDF and using the novel weighting scheme, DOI/IDF in terms of F- measure. ICSM 2012 31
  • 32. Case Study Systems iTrust: Medical application Pooka: An email Client Pooka iTrust Programming Language Java Java Version 2.0 10 Number of Classes 298 526 Number of Methods 20,868 3,404 LOC 244K 19,604 ICSM 2012 32
  • 33. IR-based Traceability Link Recovery • LSI (Latent Semantic Indexing) • Processed corpus is transformed into a term-by- document matrix • The values of the matrix cells represent the weights of the terms in the documents • Use TF/IDF, SE/IDF, and DOI/IDF weights to link requirements and source code ICSM 2012 33
  • 34. IR Quality Measures ICSM 2012 34
  • 35. Document Preprocessing • Extract all the source code and requirements terms • Store the mapping of each source code term • Filter (#43@$) • Stop words (the, is, an….) • Stemmer (attachment -> attach) ICSM 2012 35
  • 36. Separate Domain and Implementation Terms • LDA (Latent Dirichlet Allocation) • LDA Parameter Settings 1 • α = 25 (50/K) • β = 0.1 • K=2 • Mallet LDA Implementation 1S. Abebe and P. Tonella, “Towards the extraction of domain concepts from the identifiers,” in 18th Working Conference on Reverse Engineering (WCRE), 2011, pp. 77 –86. ICSM 2012 36
  • 37. SE/IDF and DOI/IDF Parameter Settings Parameters Values SCE α 0.1 Class β 0.4 Method Weighted γ 0.2 Variable according to δ 0.3 Comments developers’ ϒ 0.74 Domain interest Φ 0.26 Implementation Ψ δ/2 Only in Requirements λ 2 Terms in Source Code ánd Requirements ICSM 2012 37
  • 38. Pooka F1 Results: DOI/IDF > SE/IDF > TF/IDF ICSM 2012 38
  • 39. iTrust F1 Results: DOI/IDF > SE/IDF > TF/IDF ICSM 2012 39
  • 40. Average Results across All Thresholds Approaches Avg. Precision Avg. Recall F1 Measures P-Values LSI (TF/IDF) 14.71 21.07 9.52 LSI (SE/IDF) 18.30 24.86 12.47 Pooka < 0.01 LSI (DOI/IDF) 25.73 26.42 14.56 LSI (TF/IDF) 36.43 33.14 17.76 LSI (SE/IDF) 38.66 33.97 18.21 iTrust < 0.01 LSI (DOI/IDF) 39.55 35.07 20.64 ICSM 2012 40
  • 41. Research Questions for RQ2 • RQ2.1 – Does using SE/IDF allow developing a RT links recovery technique with a better accuracy than a technique using TF/IDF? • RQ2.2 – Does using DOI/IDF allow developing a RT links recovery technique with a better accuracy than a technique using TF/IDF? ICSM 2012 41
  • 42. Hypotheses for RQ2 • H01 – There is no difference between the accuracy of an LSI-based technique using TF/IDF and using the novel weighting scheme, SE/IDF in term of F- measure REJECT • H02 – There is no difference between the accuracy of an LSI-based technique using TF/IDF and using the novel weighting scheme, DOI/IDF in term of F- measure REJECT ICSM 2012 42
  • 43. Summary • Integrating the developers’ knowledge, i.e., SCEs preferences, in an IR technique statistically improves its accuracy • Different terms should be weighted according to their positions in the source code ICSM 2012 43
  • 44. What+ Developer+ the+ Really+ Saw+ Class,+ Method,+ Variables,+Comments+ ……………….+ ICSM 2012 Developer 8 Hypotheses+ RQ2+ for+ • H01+ There+ no+ –+ is+ difference+between+the+ accuracy+ an+ based+ of+ LSIX technique+using+ TF/ IDF+ using+ novel+ and+ the+ weigh: ng+ scheme,+ SE/ IDF+ term+ FX in+ of+ measure+ REJECT • H02+ There+ no+ –+ is+ difference+between+the+ accuracy+ an+ based+ of+ LSIX technique+using+ TF/ IDF+ using+ novel+ and+ the+ weigh: ng+ scheme,+ DOI/IDF+ term+ FX in+ of+ measure+REJECT ICSM 2012 42 ICSM 2012 44
  • 45. SCAM 201 45
  • 46. What+ Developer+ the+ Really+ Saw+ Class,+ Method,+ Variables,+Comments+ ……………….+ ICSM 2012 Developer 8 Hypotheses+ RQ2+ for+ • H01+ There+ no+ –+ is+ difference+between+the+ accuracy+ an+ based+ of+ LSIX technique+using+ TF/ IDF+ using+ novel+ and+ the+ weigh: ng+ scheme,+ SE/ IDF+ term+ FX in+ of+ measure+ REJECT • H02+ There+ no+ –+ is+ difference+between+the+ accuracy+ an+ based+ of+ LSIX technique+using+ TF/ IDF+ using+ novel+ and+ the+ weigh: ng+ scheme,+ DOI/IDF+ term+ FX in+ of+ measure+REJECT ICSM 2012 42 ICSM 2012 46