Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Delving (Smalltalk) Source Code
   with Formal Concept Analysis



 Pr. Kim Mens           Dr. Tom Tourwé
 INGI / UCL     ...
Overview
     Research goal
 

     A crash course on formal concept analysis
 

     Delving Smalltalk source code with...
Research Goal
     A lightweight source-code mining tool
 
      –  get “initial understanding” of structure of software ...
A crash course on FCA — example

                    object-                                       static   dynamic
      ...
A crash course on FCA — theory
A.  Starts from
      –  a set of elements
      –  a set of properties of those elements
 ...
A. Incidence table

                    object-                                       static   dynamic
                   ...
B. Concept 1

                    object-                                       static   dynamic
                         ...
B. Concept 2

                    object-                                       static   dynamic
                         ...
B. Concept 3

                    object-                                       static   dynamic
                         ...
B. More concepts …

                    object-                                       static   dynamic
                   ...
B. More concepts …

                    object-                                       static   dynamic
                   ...
B. More concepts …

                    object-                                       static   dynamic
                   ...
B. Concepts

                    object-                                       static   dynamic
                          ...
C. Concept Lattice




September 6, 2004   ESUG 2004 Research Track   14
Delving ST source code with FCA
     Elements : classes, methods, argument names
 

     Properties : substrings of class...
Foo            Zork
                                                                             import: aFoo




Delving ...
DelfSTof, our Code Delving Tool




September 6, 2004   ESUG 2004 Research Track   17
Foo            Zork
                                                                                     import: aFoo




...
Foo              Zork
                                                                                                    ...
Foo              Zork
                                                                                                    ...
Foo            Zork
                                                                                             import: a...
Foo



4. Classification,
                                                                                                ...
Foo            Zork
                                                                                             import: a...
Foo



Discovered “indications”
                                                                                   Zork
  ...
Conclusion
     Current status : feasibility study
 
       –  Approach produced relevant results
       –  Efficiency is...
Upcoming SlideShare
Loading in …5
×

Delving Source Code with Formal Concept Analysis

822 views

Published on

Presentation of a paper "Delving (Smalltalk) Source Code with Formal Concept Analysis" presented at the ESUG 2004 research track.

Getting an initial understanding of the structure of a software system, whether it is for software maintenance, evolution or reengineering purposes, is a nontrivial task. We propose a lightweight approach to delve a system's source code automatically and efficiently for relevant concepts of interest: what concerns are addressed in the code, what patterns, coding idioms and conventions have been adopted, and where and how are they implemented. We use formal concept analysis to do the actual source-code mining, and then filter, classify and combine the results to present them in a format that is more convenient to a software engineer. We applied a prototype tool that implements this approach to several small to medium-sized Smalltalk applications. For each of these, the tool uncovered several design pattern instances, coding and naming conventions, refactoring opportunities and important domain concepts. Although the tool and approach can still be improved in many ways, the tool does already provides useful results when trying to get an initial understanding of a system. The obtained results also illustrate the relevance and feasibility of using formal concept analysis as an efficient technique for source-code mining.

Published in: Science, Technology
  • Be the first to comment

Delving Source Code with Formal Concept Analysis

  1. 1. Delving (Smalltalk) Source Code with Formal Concept Analysis Pr. Kim Mens Dr. Tom Tourwé INGI / UCL SEN / CWI Monday, September 6, 2004
  2. 2. Overview Research goal   A crash course on formal concept analysis   Delving Smalltalk source code with FCA   Experiments   Results   Conclusion   September 6, 2004 ESUG 2004 Research Track 2
  3. 3. Research Goal A lightweight source-code mining tool   –  get “initial understanding” of structure of software system –  detect recurring patterns in the source code Formal concept analysis (FCA)   –  A mathematical technique –  Known applications in data analysis and knowledge processing Can we use FCA to delve code for indications of patterns?   –  Coding conventions –  Programming idioms and design patterns –  Opportunities for refactoring –  Relevant domain concepts September 6, 2004 ESUG 2004 Research Track 3
  4. 4. A crash course on FCA — example object- static dynamic functional logic oriented typing typing C++ Java Find relevant taxonomy of programming languages Smalltalk based on their common properties Scheme Prolog September 6, 2004 ESUG 2004 Research Track 4
  5. 5. A crash course on FCA — theory A.  Starts from –  a set of elements –  a set of properties of those elements –  incidence table B.  Determines concepts –  Maximal groups of elements and properties –  Group: •  Every element of the concept has those properties •  Every property of the concept holds for those elements –  Maximal •  No other element (outside the concept) has those same properties •  No other property (outside the concept) is shared by all elements C.  Organizes these concepts in a lattice structure September 6, 2004 ESUG 2004 Research Track 5
  6. 6. A. Incidence table object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 6
  7. 7. B. Concept 1 object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 7
  8. 8. B. Concept 2 object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 8
  9. 9. B. Concept 3 object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 9
  10. 10. B. More concepts … object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 10
  11. 11. B. More concepts … object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 11
  12. 12. B. More concepts … object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 12
  13. 13. B. Concepts object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 13
  14. 14. C. Concept Lattice September 6, 2004 ESUG 2004 Research Track 14
  15. 15. Delving ST source code with FCA Elements : classes, methods, argument names   Properties : substrings of classes, methods, …   The “Foo” concept Foo Zork import: aFoo Bar asFoo: anObject September 6, 2004 ESUG 2004 Research Track 15
  16. 16. Foo Zork import: aFoo Delving source code Bar asFoo: anObject 1.  Generate the formal context •  Elements, properties & incidence relation 2.  Concept Analysis •  Calculate the formal concepts •  Organize them into a concept lattice 3.  Filtering •  Remove irrelevant concepts (false positives, noise, useless, …) 4.  Classify, combine and annotate concepts •  In a way that is more easy for a software engineer to interpret September 6, 2004 ESUG 2004 Research Track 16
  17. 17. DelfSTof, our Code Delving Tool September 6, 2004 ESUG 2004 Research Track 17
  18. 18. Foo Zork import: aFoo 1. Generate formal context Bar asFoo: anObject We want to group elements that share a substring   As elements we collect   –  all classes, methods and parameters –  in some package(s) of interest As properties : “relevant” substrings of element names   –  Normalisation : •  extract terms based on where uppercases occur •  convert to lower case and remove special characters like ‘:’ •  QuotedCodeConstant { quoted, code, constant } → –  Elimination of stopwords : with, do, object –  Stemming : reduce words to their root Incidence relation : An element has a certain property if   –  It has the substring in its name September 6, 2004 ESUG 2004 Research Track 18
  19. 19. Foo Zork import: aFoo 2. Concept Analysis Bar asFoo: anObject unify index env source message functor variable … Object>>unifyWithObject: inEnv: X X X X - - … myIndex: hisIndex: inSource: Variable>>unifyWithMessageFunctor: X X X X X X - … inEnv: myIndex: hisIndex: inSource: AbstractTerm>>unifyWith: inEnv: X X X X - - - … myIndex: hisIndex: inSource: AbstractTerm>>unifyWithVariable: X X X X - X X … inEnv: myIndex: hisIndex: inSource: … X X X X … … … … September 6, 2004 ESUG 2004 Research Track 19
  20. 20. Foo Zork import: aFoo 2. Concept Analysis Bar asFoo: anObject unify index env source message functor variable … Object>>unifyWithObject: inEnv: X X X X - - … myIndex: hisIndex: inSource: Variable>>unifyWithMessageFunctor: X X X X X X - … inEnv: myIndex: hisIndex: inSource: AbstractTerm>>unifyWith: inEnv: X X X X - - - … myIndex: hisIndex: inSource: AbstractTerm>>unifyWithVariable: X X X X - X X … inEnv: myIndex: hisIndex: inSource: … X X X X … … … … September 6, 2004 ESUG 2004 Research Track 20
  21. 21. Foo Zork import: aFoo 3. Filtering Bar asFoo: anObject Preprocessing to filter irrelevant properties :   –  with little meaning : “do”, “with”, “for”, “from”, “the”, “ifTrue”, … –  too small (< 3 chars) –  ignore plurals, uppercase and colons Extra filtering   –  Drop top & bottom concept when empty –  Drop concepts with two elements are less –  Drop concepts that group only classes More filtering needed (ongoing work)   –  Recombine substrings belonging together –  Require some minimal coverage of element name by properties –  Concepts higher in the lattice may be more relevant (more properties) –  Avoid redundancy in discovered concepts •  Make better use of the lattice structure (Now it is “flattened”) September 6, 2004 ESUG 2004 Research Track 21
  22. 22. Foo 4. Classification, Zork import: aFoo Bar asFoo: anObject Combination & Annotation Annotate concepts with their properties   –  i.e. with the substring(s) shared by their elements Classification   –  Single class concepts •  Elements are methods (or their parameters) in that class –  Hierarchy concepts •  Group classes, methods and parameters in same class hierarchy •  Annotate concept with root of hierarchy •  Annotate methods with implementing class –  Crosscutting concepts •  When two different class hierarchies are involved Combine concepts   –  that belong together (subconcept relationship) Group methods   –  belonging to the same class September 6, 2004 ESUG 2004 Research Track 22
  23. 23. Foo Zork import: aFoo Quantitative results Bar asFoo: anObject Case study #elements #properties #raw #filtered time (sec) DelfSTof 802 (135) 247 650 131 5 StarBrowser 731 (52) 352 740 115 7 Soul 1488 (111) 438 1206 284 22 CodeCrawler 1370 (93) 477 1419 327 24 Ref.Browser 4834 (271) 736 4228 1243 447 Time to compute = a few seconds / minutes    properties  <  elements  is a good sign   Still too much concepts remain after filtering   Upperlimit: theoretical < 2min(#elements, #properties); experimental < #elements September 6, 2004 ESUG 2004 Research Track 23
  24. 24. Foo Discovered “indications” Zork import: aFoo Bar asFoo: anObject of patterns Code duplication   Design patterns   –  Visitor, Abstract Factory, Builder, Observer Programming idioms   –  Accessing methods, chained messages, delegating methods, polymorphism Relevant domain concepts   –  Correspond to frequently occurring properties –  “Unification”, “Bindings”, “Horn clauses”, “resolution” Opportunities for refactoring   Some crosscutting concerns   September 6, 2004 ESUG 2004 Research Track 24
  25. 25. Conclusion Current status : feasibility study   –  Approach produced relevant results –  Efficiency is acceptable –  Tool needs refinement •  More advanced filtering ; extra checking a posteriori Future work : applying FCA to delve source code for   –  aspects and crosscutting concerns •  based on “generic parse trees” •  by using an incidence relation that represents “message sends” –  refactoring opportunities –  Both Smalltalk and Java source code September 6, 2004 ESUG 2004 Research Track 25

×