SlideShare a Scribd company logo
1 of 29
Download to read offline
Reuse or Rewrite: Combining Textual,
                             Static, and Dynamic
                       Analyses to Assess the Cost of
                        Keeping a System Up-to-date

                      Giuliano (Giulio) Antoniol, Jane Huffman Hayes,
                      Yann-Gaël Guéhéneuc, and Massimiliano Di Penta




ICSM 2008, Beijing
       1/28
Content

                     Problem statement

                     ReORe Idea

                     ReORe Process

                     Technologies

                     The Lynx Browser

                     Case Study Results

                     Conclusions


ICSM 2008, Beijing
       2/28
The Challenge

                     I own a system Sprop

                        it is becoming obsolete, technology, market shifts, etc.;


                     My competitors have introduced new
                     technologies, new pieces of functionality that my
                     Sprop lacks;

                     I can:
                        maintain Sprop to bring it up-to-date;
                        rewrite a new system from scratch;
                        get out of the market;
                              Retire and go fishing…
ICSM 2008, Beijing
       3/28
The “Reuse” Or “Rewrite” – ReORe idea

                     Leverage knowledge, existing code, and
                     technologies:
                       Use competitors systems and your customer base to
                       define a requirement baseline;
                       Assess if it is worth reusing;
                       Apply state of the art technologies to reduce your
                       assessment costs.
                     Resources have different costs: a junior
                     programmer is not as expensive as the senior
                     architect.
                     Carefully plan the use of resources: minimize
                     the usage of the most valuable resources.

ICSM 2008, Beijing
       4/28
The ReORe Process


                                                                        the browser shall support
                                                                        to zoom unzoom page
                                                                        details


                       Entities ej                                 ej
                                                                                                                    Matches                        Validated matches
                                                                                                                 between each ri                    between each ri
                                                                        hthost name get host                     and a set of ej                      and a set of ej
                                                                        details hostname
                                        Stop-word Removal                                                                            Manual
                                            Stemming                                                  TF-IDF
                                                                                                                                   Evaluation by
                     Requirements ri       Tokenization                                                                             Developers
                                                                   ri


                                                1                                                       2                              3




                                       Validated matches between
                                                 each ri
                                              and a set of ej



                                                                                                                    Manual
                                                                          Scenario                  Trace Data    Evaluation by
                                                                          Execution                                Architects
                                              Scenarios


                                                                               4
                                                                                                                       5




ICSM 2008, Beijing
       5/28
The Company Resources


                     A young developer applying
                     state of the art technology.


                         Two (still pretty young ☺) experts
                         programmers filtering the
                         “young” developer’s findings.


                     The very expensive “guru,” the
                     system architect.


ICSM 2008, Beijing
       6/28
ReORe Technology

                     Standard information retrieval vector space
                     model.
                     Indexing process:
                       Robust source code parsing and fact extractor;
                       Camel case splitting;
                       Stopper;
                       Stemmer;
                       Thesaurus (not vital but helps);
                       TF-IDF indexing.


                     Scenario-based dynamic trace collection and
                     analysis.

ICSM 2008, Beijing
       7/28
PrereqIR (see TR or WCRE’08 paper)

                     We need a pre-requirement document:
                        what the competitor systems do;
                        what our customer base needs.


                     Obtain and vet a list of requirements from diverse
                     stakeholders.

                     Structure the requirements by mapping them into
                     a representation suitable for grouping via pattern-
                     recognition and similarity-based clustering.

                     Analyze the clustered requirements to divide
                     them into a set of essential and a set of optional
                     requirements.
ICSM 2008, Beijing
       8/28
Step 1 – Inexpensive resources

                     Process both:
                       the set of requirements R;
                       the set of entities E extracted from the source code.


                     Any requirement or entity is mapped into a
                     textual representation.

                     Map textual representation into vector space.




ICSM 2008, Beijing
       9/28
Step 2 – Inexpensive resources

                     Trace requirements into code entities.

                     Apply a threshold:
                        to remove low similarity matches;
                        reduce manual verification.


                     Compute the threshold via outlier identification;
                        much likely it is better to get a very tight threshold.


                     Overall, try to reduce the search space of orders
                     of magnitude.

ICSM 2008, Beijing
      10/28
Step 3 – Experts/Developers

                     Manually evaluate the rankings from Step 1 – 2.

                     For each requirement, only retain entities
                     believed to concretely participate in the
                     implementation.

                     Tag selected entities as reusable.

                     This phase is necessarily manual:
                        only the developers can assess, using their knowledge
                        of the system and contextual information, whether an
                        entity really participates in the implementation of
                        requirement.

ICSM 2008, Beijing
      11/28
Step 4 – Inexpensive and patient…

                     Build and execute scenarios:
                        exercise the requirements for which a consensus
                        concerning their implementing entities exist.
                     Develop two kinds of scenarios:
                        Core scenarios: scenarios exercising core
                        functionalities and requirements;
                        Specific scenarios: scenarios specifically devoted to
                        some particular requirement.
                     Use core scenarios to identify the core
                     implementation entities:
                        the infrastructure of the system.


                     Validate if entities mapped in Step 1 – 3 really
                     show up.
ICSM 2008, Beijing
      12/28
Step 5 – Most valuable resource


                     Identifies and validates the entities implementing
                     a requirement.

                     Works on the reduced information produced in
                     Step 1 – 4.

                     While validating data, can also recover missing
                     links and entities.




ICSM 2008, Beijing
      13/28
Case Study

                                       Research questions:
                     RQ1: is ReORe able to effectively reduce the information
                     processed in the different steps while ensuring the
                     traceability between requirements and the existing code?
                     RQ2: what is the effort required for the application of
                     ReORe?

                                               Context:
                     Reuse components from a text-oriented Web browser
                     (Lynx) to develop a modern browser (e.g., Firefox)
                     Lynx is a text oriented (curses) browser developed in C
                         91 source files about 147 KLOC and 2074 functions;
                         156 header and configuration files about 27 KLOC;
                         some nice macros…
                     Requirements:
                        128 essential;
                        181 optional.

ICSM 2008, Beijing
      14/28
Threshold Determination

                     Out of 128+181 = 309 × 2,074 = 640,866 links,
                     138,896 have similarity > 0.
                     Similarity:
                       median 0.012;
                       percentiles 25%: 0.006 – 75%: 0.026 - IQR: 0.021.
                     Standard outlier threshold 6% (0.026 + 3 × 0.021)
                     still too high – gives about 12,000 matches…




                     Not really happy…
                         Take 8/9 times IQR which gives 20%.
ICSM 2008, Beijing
      15/28
Requirement Example

                     “the system shall allow internet cookies to be
                     downloaded in the local file system”

                         Functions                              Ranks
                         src/LYCookie.c::newCookie              0.3489
                         src/LYCookie.c::LYProcessSetCookies    0.3088
                         src/LYCookie.c::LYSetCookie            0.2942
                         src/LYCookie.c::ARGS1                  0.2797
                         src/LYDownload.c::LYdownload_options   0.2559




                         Not all requirements have at least
                         one candidate link with similarity
                         greater than 20%
ICSM 2008, Beijing
      16/28
Search Space – First Reduction

                       Input Step 1: 309 × 2,074 = 640,866 Links




                                            138,896 Links with
                                            Similarity > 0

                                                                      Nbr of links

                     Output Step 3: 738 Links with   700000
                                                     600000


                     Similarity > 20%                500000
                                                     400000
                                                     300000
                                                     200000
                                                     100000
                                                         0
                                                              Links          Sim>0   Sim>0.2




ICSM 2008, Beijing
      17/28
Matching Details

                     Set A: at least one match > 0.2
                     Set B: no match above 0.2
                     200
                     180
                     160
                                                                       A further problem:
                     140
                     120
                                                                       Average list length
                                                            Set A
                     100
                                                            Set B
                      80
                      60
                                                                       464 (A)              425 (B)
                      40
                      20
                       0
                           Essential   Optional   Overall                    Essential REQ Mapped




                                                                    Mapped
                                       40         83                 68%



ICSM 2008, Beijing
      18/28
The Junior Developer Dilemma

                     Reduce to 738 possible links.

                     Know that only 88 + 98 = 186 requirements have
                     at least one link >0.2, but:
                        each link has an average 450 matches;
                        unsure if the “experts” will vet by hand an average of
                        450 links…

                     Trick the “experts”
                        just take the top 5 matches;
                        add another 5 matches extracted from the list with
                        uniform distribution.
                     Perhaps discover missed links, at least make
                     sure experts do not cheat…
ICSM 2008, Beijing
      19/28
Step 3 – Expert Manual Decision


                               Set   Ranking   Yes   No    Only
                                     Range                 One
                                                           Yes
                               A     1-5       128   226   324


                                     6-10      2     790   20
                               B     1-5       25    407   69
                                     6-10      0     570   10




                     Experts agree more often on top five in set A!
                     Also agree most of the time for a NO on set B

ICSM 2008, Beijing
      20/28
Step 4 – Core functionalities

                     Any browser will handle HTTP, SSL, and FTP

                              Scenario   Number of Functions
                              HTTP       382
                              SSL        137
                              FTP        28




ICSM 2008, Beijing
      21/28
Step 4 – Dynamic Analysis Filter




                     Set   Ranking   Retained   Discarded
                           Range
                     A     1-5       77         51
                           6-10      1          1
                     B     1-5       10         15
                           6-10      -          -




ICSM 2008, Beijing
      22/28
Step 5 – Architect Decision

                     Set   Ranking   Retained   Discarded      Added
                           Range
                     A     1-5       73         4              12
                           6-10      1          -              -
                     B     1-5       9          1              1               3 are
                           6-10      -          -              -               duplicate
                                                                               reqt.
                     Overall         83                        13
                                                           .
                                                    Requirement        Set A     Set B
                                                    Essential (128)    23        5 (2)
                                                    Optional (181)     23        4
                                                    Overall            46        9

                     Overall: 25 essential requirements
ICSM 2008, Beijing   and 27 optional requirements.
      23/28
Accuracy

                                                        Accuracy
                                        Dynamic Analysis         Final Inspection
                     Set A       1-5    60%                      95%
                                 6-10   50%                      -
                     Set B       1-5    40%                      90%
                                 6-10   -                        -
                     Overall            56%                      94%


                     Each step plays the role of the Oracle for previous steps.

                     Accuracy: the percentage of matches for which an
                     agreement between any two subsequent steps was found.

                     It is representative of the effort saved.
ICSM 2008, Beijing
      24/28
Effort (RQ2)

                     Young Developer            10
                     Still Young                25
                     Young 4 Scenarios          40
                     Architect                  10




                                                 Architect   Young Developer
                                                   12%            12%



                                                                                    Young Developer
                                                                                    Still Young
                                                                    Still Young
                                                                        29%         Young 4 Scenarios
                                                                                    Architect
                                   Young 4 Scenarios
                                         47%




ICSM 2008, Beijing
      25/28
Discussion

                     Clearly there is manual activity involved.

                     Company knowledge should often be sufficient.

                     Some of the functions contribute to core
                     requirements as well as to specific requirements
                     (the switch course).

                     Only one function in Set B was not in Set A:
                        don’t even try to collect if similarity is low.
                     Overall, we spent 85 – 90 hours of work.

ICSM 2008, Beijing
      26/28
Threats to validity

                     External validity (generalization of findings):
                         Particular case study adopted → results may or may not be
                         similar for other systems;
                         Participants were from academia;
                     Construct validity (relationship theory-observation):
                         Thresholds play a big role in ranking requirements;
                             Experimented different thresholds;
                         Dynamic analysis depends upon test cases;
                         Final validation aims at reducing construct validity threats;
                     Internal validity (external factors might influence results):
                         Multiple steps with different techniques reduce the degree of
                         subjectivity;
                         Architect involved in the last step was not aware of how the
                         previous steps were performed;
                     Reliability validity (repeatability of results):
                         Process fully detailed;
                         System available;
                         Other data available upon request.
ICSM 2008, Beijing
      27/28
Conclusions

                     ReORe uses standard state of the art
                     technologies:
                       Eases the task of deciding “Reuse” Or “Rewrite”.


                     ReORe effectively filters information reducing the
                     manual intervention of most expensive resources.

                     For our case study (Lynx):
                       it doesn’t pay-off investigating low similarity matches;
                       only 25 % of code can be reused and only 25 essential
                       requirements out of 128 have some initial
                       implementation.

ICSM 2008, Beijing
      28/28
The longer you wait, the worst it gets




                                Questions ?




ICSM 2008, Beijing
      29/28

More Related Content

Viewers also liked

Cs audizione isfol
Cs audizione isfolCs audizione isfol
Cs audizione isfolFabio Bolo
 
What Matters Now
What Matters NowWhat Matters Now
What Matters NowJosh Amer
 
C.003 task based risk assessment
C.003 task based risk assessmentC.003 task based risk assessment
C.003 task based risk assessmentboris sukljan
 
Seduta commissione lavoro 27 marzo2104
Seduta commissione lavoro 27 marzo2104Seduta commissione lavoro 27 marzo2104
Seduta commissione lavoro 27 marzo2104Fabio Bolo
 
Ayman Mohamed Reda Cv
Ayman Mohamed Reda CvAyman Mohamed Reda Cv
Ayman Mohamed Reda Cvayman mohamed
 
Kiska india limited led bid
Kiska india limited led bidKiska india limited led bid
Kiska india limited led bidSayak Biswas
 
Аренда склада 1640 м2, Москва, Дмитровское ш., 2 км до МКАД
Аренда склада 1640 м2, Москва, Дмитровское ш., 2 км до МКАДАренда склада 1640 м2, Москва, Дмитровское ш., 2 км до МКАД
Аренда склада 1640 м2, Москва, Дмитровское ш., 2 км до МКАДСклад Менеджмент
 
柏瑞週報20160301
柏瑞週報20160301柏瑞週報20160301
柏瑞週報20160301Pinebridge
 
Overcoming The Impedance Mismatch Between Source Code And Architecture
Overcoming The Impedance Mismatch Between Source Code And ArchitectureOvercoming The Impedance Mismatch Between Source Code And Architecture
Overcoming The Impedance Mismatch Between Source Code And ArchitecturePeter Friese
 
CM10.2 Power Builder Source Control
CM10.2 Power Builder Source ControlCM10.2 Power Builder Source Control
CM10.2 Power Builder Source ControlEric Saperstein
 
Unit Testing
Unit TestingUnit Testing
Unit TestingDotitude
 
Revision control system (RCS)
Revision control system (RCS)Revision control system (RCS)
Revision control system (RCS)Abu Sufian
 
Source code management
Source code managementSource code management
Source code managementWidoyo PH
 
Kewirausahaan
KewirausahaanKewirausahaan
KewirausahaanWidoyo PH
 
Code Generation 2014 - ALF, the Standard Programming Language for UML
Code Generation 2014  - ALF, the Standard Programming Language for UMLCode Generation 2014  - ALF, the Standard Programming Language for UML
Code Generation 2014 - ALF, the Standard Programming Language for UMLJürgen Mutschall
 
Source Code Control System (SCCS)
Source Code Control System (SCCS)Source Code Control System (SCCS)
Source Code Control System (SCCS)Saurabh Kumar
 

Viewers also liked (20)

Cs audizione isfol
Cs audizione isfolCs audizione isfol
Cs audizione isfol
 
Alessandra ponte
Alessandra ponteAlessandra ponte
Alessandra ponte
 
Anunci I i Y
Anunci I i YAnunci I i Y
Anunci I i Y
 
What Matters Now
What Matters NowWhat Matters Now
What Matters Now
 
C.003 task based risk assessment
C.003 task based risk assessmentC.003 task based risk assessment
C.003 task based risk assessment
 
Seduta commissione lavoro 27 marzo2104
Seduta commissione lavoro 27 marzo2104Seduta commissione lavoro 27 marzo2104
Seduta commissione lavoro 27 marzo2104
 
Ayman Mohamed Reda Cv
Ayman Mohamed Reda CvAyman Mohamed Reda Cv
Ayman Mohamed Reda Cv
 
Dl1428
Dl1428Dl1428
Dl1428
 
Kiska india limited led bid
Kiska india limited led bidKiska india limited led bid
Kiska india limited led bid
 
Аренда склада 1640 м2, Москва, Дмитровское ш., 2 км до МКАД
Аренда склада 1640 м2, Москва, Дмитровское ш., 2 км до МКАДАренда склада 1640 м2, Москва, Дмитровское ш., 2 км до МКАД
Аренда склада 1640 м2, Москва, Дмитровское ш., 2 км до МКАД
 
柏瑞週報20160301
柏瑞週報20160301柏瑞週報20160301
柏瑞週報20160301
 
Overcoming The Impedance Mismatch Between Source Code And Architecture
Overcoming The Impedance Mismatch Between Source Code And ArchitectureOvercoming The Impedance Mismatch Between Source Code And Architecture
Overcoming The Impedance Mismatch Between Source Code And Architecture
 
CM10.2 Power Builder Source Control
CM10.2 Power Builder Source ControlCM10.2 Power Builder Source Control
CM10.2 Power Builder Source Control
 
Unit Testing
Unit TestingUnit Testing
Unit Testing
 
Revision control system (RCS)
Revision control system (RCS)Revision control system (RCS)
Revision control system (RCS)
 
Source code management
Source code managementSource code management
Source code management
 
Kewirausahaan
KewirausahaanKewirausahaan
Kewirausahaan
 
Ws002 use cases
Ws002 use casesWs002 use cases
Ws002 use cases
 
Code Generation 2014 - ALF, the Standard Programming Language for UML
Code Generation 2014  - ALF, the Standard Programming Language for UMLCode Generation 2014  - ALF, the Standard Programming Language for UML
Code Generation 2014 - ALF, the Standard Programming Language for UML
 
Source Code Control System (SCCS)
Source Code Control System (SCCS)Source Code Control System (SCCS)
Source Code Control System (SCCS)
 

Similar to ICSM08a.ppt

Application scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system designApplication scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system designMr. Chanuwan
 
NTT DATA Vertex Open2test Overview
NTT DATA Vertex Open2test OverviewNTT DATA Vertex Open2test Overview
NTT DATA Vertex Open2test OverviewJorrit26
 
Ntt data vertex open2 test overview presentation
Ntt data vertex open2 test overview presentationNtt data vertex open2 test overview presentation
Ntt data vertex open2 test overview presentationJorrit26
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsEdward Curry
 
Scientific and Grid Workflow Management (SGS09)
Scientific and Grid Workflow Management (SGS09)Scientific and Grid Workflow Management (SGS09)
Scientific and Grid Workflow Management (SGS09)Cesare Pautasso
 
Process Project Mgt Seminar 8 Apr 2009(2)
Process Project Mgt Seminar 8 Apr 2009(2)Process Project Mgt Seminar 8 Apr 2009(2)
Process Project Mgt Seminar 8 Apr 2009(2)avitale1998
 
Debs Presentation 2009 July62009
Debs Presentation 2009 July62009Debs Presentation 2009 July62009
Debs Presentation 2009 July62009Opher Etzion
 
Deploying Functional Qualification at STMicroelectronics
Deploying Functional Qualification at STMicroelectronicsDeploying Functional Qualification at STMicroelectronics
Deploying Functional Qualification at STMicroelectronicsDVClub
 
Ow2 Open World Forum09 Trustie Project
Ow2 Open World Forum09 Trustie ProjectOw2 Open World Forum09 Trustie Project
Ow2 Open World Forum09 Trustie ProjectOW2
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsSouleiman Hasan
 
Sc08 Talk Final
Sc08 Talk FinalSc08 Talk Final
Sc08 Talk Finalmrmeswani
 
Implicit Middleware
Implicit MiddlewareImplicit Middleware
Implicit MiddlewareTill Riedel
 
BA conf presentation 2010
BA conf presentation 2010BA conf presentation 2010
BA conf presentation 2010Julen Mohanty
 
6.Live Framework 和Mesh Services
6.Live Framework 和Mesh Services6.Live Framework 和Mesh Services
6.Live Framework 和Mesh ServicesGaryYoung
 
20111104 s4 overview
20111104 s4 overview20111104 s4 overview
20111104 s4 overviewLeo Neumeyer
 
V Code And V Data Illustrating A New Framework For Supporting The Video Annot...
V Code And V Data Illustrating A New Framework For Supporting The Video Annot...V Code And V Data Illustrating A New Framework For Supporting The Video Annot...
V Code And V Data Illustrating A New Framework For Supporting The Video Annot...GoogleTecTalks
 

Similar to ICSM08a.ppt (20)

Icsm08a.ppt
Icsm08a.pptIcsm08a.ppt
Icsm08a.ppt
 
Application scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system designApplication scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system design
 
DAC 2012
DAC 2012DAC 2012
DAC 2012
 
NTT DATA Vertex Open2test Overview
NTT DATA Vertex Open2test OverviewNTT DATA Vertex Open2test Overview
NTT DATA Vertex Open2test Overview
 
Ntt data vertex open2 test overview presentation
Ntt data vertex open2 test overview presentationNtt data vertex open2 test overview presentation
Ntt data vertex open2 test overview presentation
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous Events
 
Scientific and Grid Workflow Management (SGS09)
Scientific and Grid Workflow Management (SGS09)Scientific and Grid Workflow Management (SGS09)
Scientific and Grid Workflow Management (SGS09)
 
Process Project Mgt Seminar 8 Apr 2009(2)
Process Project Mgt Seminar 8 Apr 2009(2)Process Project Mgt Seminar 8 Apr 2009(2)
Process Project Mgt Seminar 8 Apr 2009(2)
 
Debs Presentation 2009 July62009
Debs Presentation 2009 July62009Debs Presentation 2009 July62009
Debs Presentation 2009 July62009
 
Benjamin q4 2008_bristol
Benjamin q4 2008_bristolBenjamin q4 2008_bristol
Benjamin q4 2008_bristol
 
Deploying Functional Qualification at STMicroelectronics
Deploying Functional Qualification at STMicroelectronicsDeploying Functional Qualification at STMicroelectronics
Deploying Functional Qualification at STMicroelectronics
 
Ow2 Open World Forum09 Trustie Project
Ow2 Open World Forum09 Trustie ProjectOw2 Open World Forum09 Trustie Project
Ow2 Open World Forum09 Trustie Project
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous Events
 
Sc08 Talk Final
Sc08 Talk FinalSc08 Talk Final
Sc08 Talk Final
 
Data-Intensive Research
Data-Intensive ResearchData-Intensive Research
Data-Intensive Research
 
Implicit Middleware
Implicit MiddlewareImplicit Middleware
Implicit Middleware
 
BA conf presentation 2010
BA conf presentation 2010BA conf presentation 2010
BA conf presentation 2010
 
6.Live Framework 和Mesh Services
6.Live Framework 和Mesh Services6.Live Framework 和Mesh Services
6.Live Framework 和Mesh Services
 
20111104 s4 overview
20111104 s4 overview20111104 s4 overview
20111104 s4 overview
 
V Code And V Data Illustrating A New Framework For Supporting The Video Annot...
V Code And V Data Illustrating A New Framework For Supporting The Video Annot...V Code And V Data Illustrating A New Framework For Supporting The Video Annot...
V Code And V Data Illustrating A New Framework For Supporting The Video Annot...
 

More from Ptidej Team

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software MiniaturisationPtidej Team
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel BriandPtidej Team
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel AbdellatifPtidej Team
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh KermansaraviPtidej Team
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel GrichiPtidej Team
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano PolitowskiPtidej Team
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisisPtidej Team
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptPtidej Team
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptPtidej Team
 

More from Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

ICSM08a.ppt

  • 1. Reuse or Rewrite: Combining Textual, Static, and Dynamic Analyses to Assess the Cost of Keeping a System Up-to-date Giuliano (Giulio) Antoniol, Jane Huffman Hayes, Yann-Gaël Guéhéneuc, and Massimiliano Di Penta ICSM 2008, Beijing 1/28
  • 2. Content Problem statement ReORe Idea ReORe Process Technologies The Lynx Browser Case Study Results Conclusions ICSM 2008, Beijing 2/28
  • 3. The Challenge I own a system Sprop it is becoming obsolete, technology, market shifts, etc.; My competitors have introduced new technologies, new pieces of functionality that my Sprop lacks; I can: maintain Sprop to bring it up-to-date; rewrite a new system from scratch; get out of the market; Retire and go fishing… ICSM 2008, Beijing 3/28
  • 4. The “Reuse” Or “Rewrite” – ReORe idea Leverage knowledge, existing code, and technologies: Use competitors systems and your customer base to define a requirement baseline; Assess if it is worth reusing; Apply state of the art technologies to reduce your assessment costs. Resources have different costs: a junior programmer is not as expensive as the senior architect. Carefully plan the use of resources: minimize the usage of the most valuable resources. ICSM 2008, Beijing 4/28
  • 5. The ReORe Process the browser shall support to zoom unzoom page details Entities ej ej Matches Validated matches between each ri between each ri hthost name get host and a set of ej and a set of ej details hostname Stop-word Removal Manual Stemming TF-IDF Evaluation by Requirements ri Tokenization Developers ri 1 2 3 Validated matches between each ri and a set of ej Manual Scenario Trace Data Evaluation by Execution Architects Scenarios 4 5 ICSM 2008, Beijing 5/28
  • 6. The Company Resources A young developer applying state of the art technology. Two (still pretty young ☺) experts programmers filtering the “young” developer’s findings. The very expensive “guru,” the system architect. ICSM 2008, Beijing 6/28
  • 7. ReORe Technology Standard information retrieval vector space model. Indexing process: Robust source code parsing and fact extractor; Camel case splitting; Stopper; Stemmer; Thesaurus (not vital but helps); TF-IDF indexing. Scenario-based dynamic trace collection and analysis. ICSM 2008, Beijing 7/28
  • 8. PrereqIR (see TR or WCRE’08 paper) We need a pre-requirement document: what the competitor systems do; what our customer base needs. Obtain and vet a list of requirements from diverse stakeholders. Structure the requirements by mapping them into a representation suitable for grouping via pattern- recognition and similarity-based clustering. Analyze the clustered requirements to divide them into a set of essential and a set of optional requirements. ICSM 2008, Beijing 8/28
  • 9. Step 1 – Inexpensive resources Process both: the set of requirements R; the set of entities E extracted from the source code. Any requirement or entity is mapped into a textual representation. Map textual representation into vector space. ICSM 2008, Beijing 9/28
  • 10. Step 2 – Inexpensive resources Trace requirements into code entities. Apply a threshold: to remove low similarity matches; reduce manual verification. Compute the threshold via outlier identification; much likely it is better to get a very tight threshold. Overall, try to reduce the search space of orders of magnitude. ICSM 2008, Beijing 10/28
  • 11. Step 3 – Experts/Developers Manually evaluate the rankings from Step 1 – 2. For each requirement, only retain entities believed to concretely participate in the implementation. Tag selected entities as reusable. This phase is necessarily manual: only the developers can assess, using their knowledge of the system and contextual information, whether an entity really participates in the implementation of requirement. ICSM 2008, Beijing 11/28
  • 12. Step 4 – Inexpensive and patient… Build and execute scenarios: exercise the requirements for which a consensus concerning their implementing entities exist. Develop two kinds of scenarios: Core scenarios: scenarios exercising core functionalities and requirements; Specific scenarios: scenarios specifically devoted to some particular requirement. Use core scenarios to identify the core implementation entities: the infrastructure of the system. Validate if entities mapped in Step 1 – 3 really show up. ICSM 2008, Beijing 12/28
  • 13. Step 5 – Most valuable resource Identifies and validates the entities implementing a requirement. Works on the reduced information produced in Step 1 – 4. While validating data, can also recover missing links and entities. ICSM 2008, Beijing 13/28
  • 14. Case Study Research questions: RQ1: is ReORe able to effectively reduce the information processed in the different steps while ensuring the traceability between requirements and the existing code? RQ2: what is the effort required for the application of ReORe? Context: Reuse components from a text-oriented Web browser (Lynx) to develop a modern browser (e.g., Firefox) Lynx is a text oriented (curses) browser developed in C 91 source files about 147 KLOC and 2074 functions; 156 header and configuration files about 27 KLOC; some nice macros… Requirements: 128 essential; 181 optional. ICSM 2008, Beijing 14/28
  • 15. Threshold Determination Out of 128+181 = 309 × 2,074 = 640,866 links, 138,896 have similarity > 0. Similarity: median 0.012; percentiles 25%: 0.006 – 75%: 0.026 - IQR: 0.021. Standard outlier threshold 6% (0.026 + 3 × 0.021) still too high – gives about 12,000 matches… Not really happy… Take 8/9 times IQR which gives 20%. ICSM 2008, Beijing 15/28
  • 16. Requirement Example “the system shall allow internet cookies to be downloaded in the local file system” Functions Ranks src/LYCookie.c::newCookie 0.3489 src/LYCookie.c::LYProcessSetCookies 0.3088 src/LYCookie.c::LYSetCookie 0.2942 src/LYCookie.c::ARGS1 0.2797 src/LYDownload.c::LYdownload_options 0.2559 Not all requirements have at least one candidate link with similarity greater than 20% ICSM 2008, Beijing 16/28
  • 17. Search Space – First Reduction Input Step 1: 309 × 2,074 = 640,866 Links 138,896 Links with Similarity > 0 Nbr of links Output Step 3: 738 Links with 700000 600000 Similarity > 20% 500000 400000 300000 200000 100000 0 Links Sim>0 Sim>0.2 ICSM 2008, Beijing 17/28
  • 18. Matching Details Set A: at least one match > 0.2 Set B: no match above 0.2 200 180 160 A further problem: 140 120 Average list length Set A 100 Set B 80 60 464 (A) 425 (B) 40 20 0 Essential Optional Overall Essential REQ Mapped Mapped 40 83 68% ICSM 2008, Beijing 18/28
  • 19. The Junior Developer Dilemma Reduce to 738 possible links. Know that only 88 + 98 = 186 requirements have at least one link >0.2, but: each link has an average 450 matches; unsure if the “experts” will vet by hand an average of 450 links… Trick the “experts” just take the top 5 matches; add another 5 matches extracted from the list with uniform distribution. Perhaps discover missed links, at least make sure experts do not cheat… ICSM 2008, Beijing 19/28
  • 20. Step 3 – Expert Manual Decision Set Ranking Yes No Only Range One Yes A 1-5 128 226 324 6-10 2 790 20 B 1-5 25 407 69 6-10 0 570 10 Experts agree more often on top five in set A! Also agree most of the time for a NO on set B ICSM 2008, Beijing 20/28
  • 21. Step 4 – Core functionalities Any browser will handle HTTP, SSL, and FTP Scenario Number of Functions HTTP 382 SSL 137 FTP 28 ICSM 2008, Beijing 21/28
  • 22. Step 4 – Dynamic Analysis Filter Set Ranking Retained Discarded Range A 1-5 77 51 6-10 1 1 B 1-5 10 15 6-10 - - ICSM 2008, Beijing 22/28
  • 23. Step 5 – Architect Decision Set Ranking Retained Discarded Added Range A 1-5 73 4 12 6-10 1 - - B 1-5 9 1 1 3 are 6-10 - - - duplicate reqt. Overall 83 13 . Requirement Set A Set B Essential (128) 23 5 (2) Optional (181) 23 4 Overall 46 9 Overall: 25 essential requirements ICSM 2008, Beijing and 27 optional requirements. 23/28
  • 24. Accuracy Accuracy Dynamic Analysis Final Inspection Set A 1-5 60% 95% 6-10 50% - Set B 1-5 40% 90% 6-10 - - Overall 56% 94% Each step plays the role of the Oracle for previous steps. Accuracy: the percentage of matches for which an agreement between any two subsequent steps was found. It is representative of the effort saved. ICSM 2008, Beijing 24/28
  • 25. Effort (RQ2) Young Developer 10 Still Young 25 Young 4 Scenarios 40 Architect 10 Architect Young Developer 12% 12% Young Developer Still Young Still Young 29% Young 4 Scenarios Architect Young 4 Scenarios 47% ICSM 2008, Beijing 25/28
  • 26. Discussion Clearly there is manual activity involved. Company knowledge should often be sufficient. Some of the functions contribute to core requirements as well as to specific requirements (the switch course). Only one function in Set B was not in Set A: don’t even try to collect if similarity is low. Overall, we spent 85 – 90 hours of work. ICSM 2008, Beijing 26/28
  • 27. Threats to validity External validity (generalization of findings): Particular case study adopted → results may or may not be similar for other systems; Participants were from academia; Construct validity (relationship theory-observation): Thresholds play a big role in ranking requirements; Experimented different thresholds; Dynamic analysis depends upon test cases; Final validation aims at reducing construct validity threats; Internal validity (external factors might influence results): Multiple steps with different techniques reduce the degree of subjectivity; Architect involved in the last step was not aware of how the previous steps were performed; Reliability validity (repeatability of results): Process fully detailed; System available; Other data available upon request. ICSM 2008, Beijing 27/28
  • 28. Conclusions ReORe uses standard state of the art technologies: Eases the task of deciding “Reuse” Or “Rewrite”. ReORe effectively filters information reducing the manual intervention of most expensive resources. For our case study (Lynx): it doesn’t pay-off investigating low similarity matches; only 25 % of code can be reused and only 25 essential requirements out of 128 have some initial implementation. ICSM 2008, Beijing 28/28
  • 29. The longer you wait, the worst it gets Questions ? ICSM 2008, Beijing 29/28