Toward a Recommendation System for focusing Testing


Presentation by Segla Kpodjedo at the RSSE 2008 Workshop.

  • Toward a Recommendation System for focusing Testing

    1. Toward a Recommendation System for focusing Testing
        Sègla Kpodjedo, Filippo Ricca, Philippe Galinier and Giuliano (Giulio) Antoniol
        RSSE 2008, Atlanta
    2. Outline
        • The Challenge
        • Related work
        • Our Approach
            • ClassRank
            • ECGM and Evolution Cost
        • The Case Study: Mozilla
        • Results
        • Discussion
        • Conclusion
    3. The Challenge
        • Context: an OO software project
        • Software testing:
            • Efforts aimed at providing information about the quality of a software system
            • Necessary compromise between information accuracy and available resources (computation power, time …)
        • Recommend key classes to focus on
    4. Related Work
        • Ostrand et al.
            • Method: negative binomial regression model
            • Metrics:
                • LOC (Lines Of Code)
                • File size, number of faults in earlier releases, number of changes etc.
            • Result: identify the top 20% most fault-prone files
        • Nagappan
            • Method: logistic regression technique
            • Metrics:
                • Dependency graphs between software components
                • Code churn (i.e., changes) of the components between subsequent versions (delta LOCs, changed files and number of changes)
        • Limitations
            • Strong prior information and heavy infrastructure
            • Large subset of returned files (20% may be too much)
    5. Our Approach
        • Assumptions
            • (i) “the most important classes” must be tested more deeply
            • (ii) frequently changed classes are the most complex and therefore the most fault-prone
        • Metrics
            • PageRank score: relative importance of each class in a system, accounting for the overall structure of relations among classes
            • « Evolution Cost »: quantifies the amount of change for a class and its relations in a time frame
        • Grid of criticality
            • X axis: Evolution Cost
            • Y axis: ClassRank
    6. Class Diagrams are labeled graphs
        • Classes: nodes labeled with properties (class name, attributes, methods …)
        • Relations: labeled edges (i.e., association, aggregation or inheritance)
    7. Random Walks and “ClassRank”
        • Given a directed graph, what is the probability of being in a given node at a time t?
        • One possible answer: Random Walks
            • Proceeding from node to node using the arcs
        • A very efficient RW algorithm: PageRank
            • Measures the relative importance of each element of a hyperlinked set and assigns it a numerical weighting
        • ClassRank
            • PageRank applied to class diagrams
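
The slide does not show how ClassRank is computed, so the following is only a minimal sketch of PageRank by power iteration on a directed class graph. The function name `class_rank`, the damping factor 0.85, the handling of classes without outgoing relations, and the example classes are all assumptions made for illustration, not details from the presentation.

```python
def class_rank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration over a directed class graph.

    edges: iterable of (source_class, target_class) pairs.
    Returns a dict mapping each class name to its score (scores sum to 1).
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by classes with no outgoing relation is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                # Each class shares its rank equally among its targets.
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical class diagram: three classes depend on Logger.
relations = [
    ("Parser", "Logger"),
    ("Renderer", "Logger"),
    ("Cache", "Logger"),
    ("Renderer", "Parser"),
]
scores = class_rank(relations)
top_class = max(scores, key=scores.get)
```

In this toy graph the class that most others point to receives the highest score, which is the intuition behind using the score as a measure of importance.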
    8. Evolution Cost
        [Figure: Snapshot 1 … Snapshot N-1, Snapshot N]
    9. Error Correcting Graph Matching
        • An ECGM indicates the edit operations transforming the first graph (G1) into the second (G2)
            • Deletion of D_G1: the nodes and all their adjacent edges
            • Insertion of I_G2: the nodes and all their adjacent edges
            • Matching M_G1 to M_G2 => « errors »
        [Figure: Visualisation of an ECGM between G1 and G2, showing M_G1, M_G2, D_G1 and I_G2]
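
To make the three node sets concrete, here is a small sketch that derives them from a given matching. It assumes the matching is supplied as a dictionary from G1 nodes to G2 nodes; the function name `ecgm_operations` and the example class names are hypothetical, and finding a good matching (the hard part of ECGM) is not attempted here.

```python
def ecgm_operations(nodes_g1, nodes_g2, matching):
    """Split the nodes of two graphs into deleted, inserted and matched sets.

    matching: dict mapping a node of G1 to the node of G2 it is matched with.
    """
    if len(set(matching.values())) != len(matching):
        raise ValueError("a node of G2 can be matched at most once")
    matched_g1 = set(matching)                # M_G1
    matched_g2 = set(matching.values())       # M_G2
    deleted = set(nodes_g1) - matched_g1      # D_G1: removed with their edges
    inserted = set(nodes_g2) - matched_g2     # I_G2: added with their edges
    return deleted, inserted, matched_g1, matched_g2


# Hypothetical snapshots: C disappears, D appears, B is matched to B2.
g1_nodes = {"A", "B", "C"}
g2_nodes = {"A", "B2", "D"}
deleted, inserted, m_g1, m_g2 = ecgm_operations(
    g1_nodes, g2_nodes, {"A": "A", "B": "B2"}
)
```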
    10. ECGM Costs
        • « Errors » and related costs
        • Node matching errors (M_G1 to M_G2)
            • Dissimilarity of related information (labels)
        • Edge matching errors (M_G1 to M_G2)
            • Structural error: a relation present in only one graph
            • or
            • Label error: different relations from one graph to the other
        [Figure: example labels xyz, yxw, a, b; costs C_nm, C_es, C_el]
        * C_nd, C_ni, C_ed, C_ed
    11. Node Cost
        • How we assign a cost to a matched node
        • Internal changes
            • Cost of the dissimilarity between information from matched nodes
        • +
        • Structural changes
            • Cost of changes for outgoing edges
                • Deleted from the first graph
                • Inserted in the second graph
                • Matched with another edge
        • Incoming edges are not counted as changes within the node
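
The node cost described above can be sketched as follows. The slides give no formulas or cost values, so everything specific here is an assumption: the internal dissimilarity is taken to be a Jaccard distance between member-name sets, the parameter names `c_nm`, `c_ed`, `c_ei`, `c_el` and their default values are hypothetical, and edges are represented as a dictionary from `(source, target)` to a relation label.

```python
def node_cost(u, matching, members_g1, members_g2, edges_g1, edges_g2,
              c_nm=1.0, c_ed=1.0, c_ei=1.0, c_el=0.5):
    """Cost of a matched node u of G1: internal plus structural changes."""
    v = matching[u]

    # Internal changes: dissimilarity between the matched nodes' members
    # (Jaccard distance, an assumption made for this sketch).
    a, b = members_g1[u], members_g2[v]
    union = a | b
    internal = c_nm * (1 - len(a & b) / len(union)) if union else 0.0

    # Structural changes: only outgoing edges are charged to the node.
    structural = 0.0
    seen = set()
    for (src, dst), label in edges_g1.items():
        if src != u:
            continue
        counterpart = (v, matching.get(dst))
        if counterpart not in edges_g2:
            structural += c_ed          # edge deleted from the first graph
        else:
            seen.add(counterpart)
            if edges_g2[counterpart] != label:
                structural += c_el      # matched with a different relation
    for (src, dst) in edges_g2:
        if src == v and (src, dst) not in seen:
            structural += c_ei          # edge inserted in the second graph
    return internal + structural


# Hypothetical data for two snapshots of a tiny class diagram.
members_g1 = {"A": {"x", "y"}, "B": {"f"}, "C": {"g"}}
members_g2 = {"A": {"x", "z"}, "B": {"f"}, "D": {"h"}}
edges_g1 = {("A", "B"): "association", ("A", "C"): "inheritance"}
edges_g2 = {("A", "B"): "aggregation", ("A", "D"): "association"}
matching = {"A": "A", "B": "B"}

cost_a = node_cost("A", matching, members_g1, members_g2, edges_g1, edges_g2)
cost_b = node_cost("B", matching, members_g1, members_g2, edges_g1, edges_g2)
```

Class B has unchanged members and no outgoing edges, so its cost is zero even though an incoming edge changed its label; that change is charged to A, in line with the slide's remark about incoming edges.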
    12. Evolution Cost
        [Figure: Snapshot 1 … Snapshot N-1, Snapshot N]
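
The transcript does not preserve how per-snapshot costs are combined into an Evolution Cost. As a rough illustration only, the sketch below assumes the cost of a class over a time frame is the sum of its costs across consecutive snapshot pairs; the function name `evolution_cost` and the numbers are hypothetical.

```python
from collections import defaultdict


def evolution_cost(costs_per_transition):
    """Sum per-class costs over consecutive snapshot pairs.

    costs_per_transition: list of dicts, one per pair of consecutive
    snapshots, mapping class name to the cost of its changes.
    Summation is an assumption of this sketch, not taken from the slides.
    """
    total = defaultdict(float)
    for transition in costs_per_transition:
        for cls, cost in transition.items():
            total[cls] += cost
    return dict(total)


# Hypothetical costs for three transitions between four snapshots.
transitions = [
    {"A": 1.0, "B": 0.0},
    {"A": 2.5},
    {"B": 0.5, "C": 3.0},
]
totals = evolution_cost(transitions)
```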
    13. Case Study
        • Software project: Mozilla suite
            • Open-source suite implementing a Web browser and other tools such as mailers and newsreaders
        • Data collection
            • CVS repository sampled every 15 days throughout 2007
            • 24 reverse-engineered class diagrams
        • Random Walk implementation
            • PageRank
        • ECGM implementation
            • Tabu algorithm
    14. Mozilla Case Study: Results
    15. Case Study: Discussion
        • Grid for assessing the criticality of any class, split into 4 areas
            • Upper Left: High Rank, Low Cost
            • Upper Right: High Rank, High Cost
            • Lower Left: Low Rank, Low Cost
            • Lower Right: Low Rank, High Cost
        • Pareto Front
        • Additional information about restructuring
            • Evolution of ClassRank through history (dramatic changes)
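
The grid and the Pareto front can be illustrated with a short sketch. The slides do not say where the grid is split, so the thresholds below are hypothetical, as are the function names and the sample values; the Pareto front is computed under the assumption that both ClassRank and Evolution Cost are to be maximized.

```python
def quadrant(rank, cost, rank_threshold, cost_threshold):
    """Name the grid area of a class (Y axis: rank, X axis: cost)."""
    vertical = "Upper" if rank >= rank_threshold else "Lower"
    horizontal = "Right" if cost >= cost_threshold else "Left"
    return f"{vertical} {horizontal}"


def pareto_front(points):
    """Return the classes not dominated on both cost and rank.

    points: dict mapping class name to (evolution_cost, class_rank).
    A class is dominated if another class is at least as high on both
    axes and strictly higher on at least one.
    """
    front = []
    for name, (cost, rank) in points.items():
        dominated = any(
            o_cost >= cost and o_rank >= rank
            and (o_cost > cost or o_rank > rank)
            for other, (o_cost, o_rank) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (evolution cost, ClassRank) values for five classes.
classes = {
    "A": (8.0, 0.30),
    "B": (2.0, 0.40),
    "C": (9.0, 0.05),
    "D": (1.0, 0.02),
    "E": (5.0, 0.10),
}
areas = {
    name: quadrant(rank, cost, rank_threshold=0.2, cost_threshold=5.0)
    for name, (cost, rank) in classes.items()
}
front = pareto_front(classes)
```

Classes in the Upper Right area combine high rank with high cost, which matches the notion of critical classes used in the presentation.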
    16. Conclusion
        • An RS aiming at identifying critical classes in an OO application
            • Critical classes: frequently subject to changes and highly connected to other classes
        • Preliminary study
            • No empirical evidence yet of a correlation with error proneness, severity or priority of defects
            • Assumptions highly worth exploring
        • Future work
            • Study the correlation between our index and bug severity/priority
            • Implement an Eclipse plug-in supporting the approach