  • Component Search and Retrieval

    1. 1. Component Search and Retrieval Advanced Reuse Seminars Eduardo Cruz
    2. 2. Information Retrieval - 1948 <ul><li>Structured Documents </li></ul><ul><li>Unstructured Documents </li></ul><ul><ul><li>No software documentation standard </li></ul></ul><ul><li>Semi-Structured Documents </li></ul>Calvin Northrup Mooers
    3. 3. Mooers' Law: “An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it.” (Calvin Northrup Mooers, 1959)
    4. 4. Mass Produced Software Components [McIlroy, 1968]
    5. 5. <ul><li>“software industry is weakly founded, </li></ul><ul><li>and that one aspect of this weakness </li></ul><ul><li>is the absence of a </li></ul><ul><li>software components subindustry” </li></ul><ul><li>[McIlroy, 1968] </li></ul>
    6. 6. <ul><li>“The storage and retrieval of software assets </li></ul><ul><li>is nothing but a specialized form of </li></ul><ul><li>information storage and retrieval” </li></ul><ul><li>[Mili, 1998] </li></ul>
    7. 7. Software Library <ul><li>Browsing – Inspecting without a predefined criterion </li></ul><ul><li>Retrieval – Satisfy a predefined matching criterion </li></ul>
    8. 8. Classification Scheme <ul><li>Facet-based </li></ul><ul><ul><li>Better than hierarchical classification </li></ul></ul><ul><ul><li>Manual classification into different facets </li></ul></ul><ul><ul><li>Automatic classification </li></ul></ul><ul><ul><ul><li>Controlled Vocabulary </li></ul></ul></ul><ul><ul><ul><ul><li>Semantic information </li></ul></ul></ul></ul><ul><ul><ul><li>Uncontrolled Vocabulary </li></ul></ul></ul><ul><ul><ul><ul><li>Big software libraries </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Few or no descriptors </li></ul></ul></ul></ul>
    9. 9. Recall and Precision <ul><li>High Precision – Most retrieved elements are relevant </li></ul><ul><li>High Recall – Few elements left behind </li></ul><ul><li>Spreading Activation ( Relaxed Search ) – Related matches are retrieved </li></ul><ul><li>Coverage – The average number of assets that are visited over the total size of the library </li></ul>
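The two measures on this slide can be made concrete with a small calculation. The following is a minimal sketch, not part of the original deck: the component names and the split into retrieved and relevant sets are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query against a library."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant assets that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four assets returned, five assets truly relevant.
retrieved = ["Stack", "Queue", "Logger", "Deque"]
relevant = ["Stack", "Queue", "Deque", "PriorityQueue", "LinkedList"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four retrieved assets are relevant (high precision), but two relevant assets are left behind (lower recall). A relaxed search would typically raise recall at the cost of precision.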
    10. 10. Asset Representation <ul><li>Library representation is made in full knowledge of the artifact. User representation is made in ignorance of the artifact </li></ul><ul><li>Asset representation is purposefully abstract to capture important features while overlooking minor or irrelevant details </li></ul><ul><li>The asset representation is called the asset's surrogate in the retrieval literature </li></ul>
    11. 11. Asset retrieval Goals <ul><li>Exact retrieval – Black box reuse </li></ul><ul><li>Approximate retrieval – White box reuse </li></ul><ul><ul><li>Generative modification – Reusing the design </li></ul></ul><ul><ul><li>Compositional modification – using building blocks of the retrieved asset </li></ul></ul>
    12. 12. Information usually not included <ul><li>Interface description </li></ul><ul><li>Non-functional requirements </li></ul><ul><li>Interoperability </li></ul>
    13. 13. Situational Model x System Model Component retrieval model [Lucrédio et al., 2004]
    14. 14. <ul><li>“Repository representation is made </li></ul><ul><li>in full knowledge of the artifact at hand” </li></ul><ul><li>“User representation is made </li></ul><ul><li>in ignorance of the artifact” </li></ul><ul><li>[Mili, 1998] </li></ul>
    15. 15. Scott Henninger
    16. 16. Tools
    17. 17. Component Search Tools <ul><li>Web </li></ul><ul><ul><li>Delphi Search Engine </li></ul></ul><ul><ul><li>Ispey </li></ul></ul><ul><ul><li>CSourceSearch.net (2004) </li></ul></ul><ul><ul><li>Gonzui </li></ul></ul><ul><ul><li>SourceBank </li></ul></ul><ul><ul><li>Koders (2004) </li></ul></ul><ul><ul><li>Codase (2005) </li></ul></ul><ul><li>Applications </li></ul><ul><ul><li>Agora (1998) </li></ul></ul><ul><ul><li>CodeBroker (2002) </li></ul></ul><ul><ul><li>Koders Enterprise (2004) </li></ul></ul><ul><ul><li>Maracatu (2005) </li></ul></ul>
    18. 19. Delphi Search Engine
    19. 20. Ispey.com
    20. 21. SPARS-J – (2003) Filter
    21. 22. SourceBank Filter
    22. 23. CSourceSearch.Net – (2004)
    23. 24. Koders.com – (2004)
    24. 25. CODASE – Launched Sep 9, 2005 Example Searches Browsing Multiple Search Options “… based on the number of people in your company, starting from $5,000 USD ”
    25. 26. CODASE - Browsing
    26. 27. Other Tools
    27. 29. AGORA - Location and Indexing (1998) [Architecture diagram: Internet, AltaVista search index server, filter, index, AltaVista query server, web server, and several JavaBeans agents, each paired with a JavaBeans introspector]
    28. 30. Component Rank [Inoue, 2003] Graph G with nodes v (V1, V2, V3), edges e, weights w, and distribution ratios d: d12 = 0.5, d13 = 0.5, d23 = 1, d31 = 1. [Figure: weights of 0.2 and 0.4 annotated on the nodes and edges]
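The weights in this figure can be reproduced by letting every node repeatedly pass its weight along its outgoing edges in proportion to the distribution ratios. The following is a minimal sketch under that assumption; the function name, the zero-based node indices, and the fixed iteration count are illustrative choices, not taken from the slides or from the Component Rank tool itself.

```python
def component_rank(ratios, node_count, iterations=200):
    """Repeat the weight propagation until the node weights settle.

    ratios maps (source, target) to the share of the source's weight
    that flows along that edge; shares leaving a node sum to 1.
    """
    weights = [1.0 / node_count] * node_count  # start from equal weights
    for _ in range(iterations):
        updated = [0.0] * node_count
        for (source, target), share in ratios.items():
            updated[target] += weights[source] * share
        weights = updated
    return weights


# Distribution ratios from the slide, with V1, V2, V3 as indices 0, 1, 2.
ratios = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 1.0, (2, 0): 1.0}
weights = component_rank(ratios, 3)
print([round(w, 3) for w in weights])
```

With these ratios the weights settle at 0.4 for V1, 0.2 for V2, and 0.4 for V3, which matches the 0.2 and 0.4 values in the figure. Each edge weight is the source weight times its ratio, for example 0.4 × 0.5 = 0.2 on the edge from V1 to V2.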
    29. 31. <ul><li>“Classes defining data structures and their containers are highly ranked” </li></ul>
    30. 32. Clustered Component Graph: V1 ≡ V4, V2 ≡ V6 [Figure: original nodes V1 to V7; clustered nodes V’14, V’26, V’3, V’5, V7]
    31. 33. <ul><li>NO MORE </li></ul><ul><li>MULTIPLE </li></ul><ul><li>DISCONNECTED </li></ul><ul><li>COMPONENTS </li></ul>V3 V2 V1 V7 V6 V4 V5
    32. 34. Component Rank System Architecture .java file ≡ component (1) Similarity Measurement (2) Clustering (3) Use Relation Extraction (4) Component Graph Construction (5) Component Rank Computation by Repetition (6) De-Clustering to Original Component Graph INPUT OUTPUT Order of Weights ≡ Component Rank of .java files
    33. 35. Simply Copied Components [Figure: non-clustered component graph with copied components A, B (two copies each) and other components X, Y, next to the clustered graph A’, B’, X’, Y’. Panels: “Clustering Before Weight Computation” and “Clustering After Weight Computation”. Weights shown: 1/4, 1/4, 1/4, 1/4 and 1/3, 1/3, 1/6, 1/6]
    34. 36. <ul><li>DO NOT COUNT </li></ul><ul><li>SIMPLY DUPLICATED </li></ul><ul><li>COMPONENTS </li></ul>
    35. 37. Copied AND MODIFIED Components [Figure: non-clustered component graph with original components, copied and modified components, and other components X, Y, next to the clustered graphs A’, B’, C’, X’, Y’. Panel title: “Clustering Before Weight Computation”. Weights shown: 1/5, 1/5, 2/5, 1/5, 1/5 and 1/5, 1/6, 1/3, 1/6, 1/6]
    36. 38. Beyond Searching and Browsing <ul><li>Searching and browsing </li></ul><ul><ul><li>Require users to initiate the information seeking process </li></ul></ul><ul><li>Information access and Information Delivery </li></ul>
    37. 39. CodeBroker – (2001) <ul><li>Component repositories are often so large that software developers cannot learn about all of the components </li></ul><ul><li>Component repositories are not static </li></ul><ul><ul><li>New components added </li></ul></ul><ul><ul><li>Old components updated </li></ul></ul><ul><li>Context-Aware browsing </li></ul>
    38. 40. <ul><li>May not have sufficient knowledge about the reuse repository </li></ul><ul><li>May perceive that reuse costs more than developing from scratch </li></ul><ul><li>May not be able to use the repository by formulating a proper query </li></ul><ul><li>May not be able to understand the found components </li></ul>
    39. 41. Information Islands Belief Vaguely Known Well Known L4: Entire Information Space Unknown components
    40. 42. CodeBroker L3: Belief L2: Vaguely Known L1: Well Known L4: Entire Information Space Information Use: L1 – Use by Memory L2 – Use by Recall L3 – Use by Anticipation L4 – Use by Delivery Already Known Components Irrelevant Components Task Relevant Information
    41. 43. Program Aspects <ul><li>Concept </li></ul><ul><ul><li>Formal </li></ul></ul><ul><ul><li>Informal </li></ul></ul><ul><ul><ul><li>Indentation, comments, identifier names (semantic) </li></ul></ul></ul><ul><ul><li>Executability </li></ul></ul><ul><li>Code </li></ul><ul><li>Constraint environment </li></ul><ul><ul><li>Signature </li></ul></ul>
    42. 44. Information delivery <ul><li>Feedback </li></ul><ul><ul><li>After execution of the action </li></ul></ul><ul><li>Feedforward </li></ul><ul><ul><li>Affects the execution of the action </li></ul></ul>
    43. 45. Information delivery <ul><li>Interruptive </li></ul><ul><li>Noninterruptive </li></ul>
    44. 46. Latent Semantic Analysis (LSA) <ul><li>Synonymy </li></ul><ul><li>Polysemy </li></ul><ul><li>“Text documents and queries are represented as vectors in the semantic space, based on the words contained and the similarity between a query and a document is determined by the distance of their respective vectors” </li></ul>
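The quoted idea of comparing a query and a document by the distance between their vectors can be sketched with plain term-count vectors and cosine similarity. This is only the vector-comparison step: full LSA would first project the vectors into a reduced semantic space (typically with a singular value decomposition), which is what lets it cope with synonymy and polysemy, and that step is omitted here. The component names and descriptions are invented for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical component descriptions (for example, text taken from comments).
docs = {
    "StackImpl": "push pop element stack",
    "FileCopy": "copy file stream bytes",
}
query = term_vector("pop element from stack")

ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query, term_vector(docs[name])),
    reverse=True,
)
print(ranked)
```

A query that said "remove" instead of "pop" would score zero against StackImpl in this sketch, which is the synonymy problem that the reduced semantic space of LSA is meant to address.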
    45. 48. Comments signature Discourse model User model
    46. 49. Koders Enterprise – (2004)
    47. 50. M.A.R.A.C.A.T.U. – Modern Architecture for Retrieving All Components At The Universe (2005)
    48. 51. Using Structural Context to Recommend Source Code Examples Reid Holmes and Gail C. Murphy University of British Columbia Software Practices Lab
    49. 52. The Problem: A Concrete Example <ul><li>Frameworks can improve developer productivity. But developers can become stuck trying to use the APIs </li></ul><ul><ul><li>Imagine trying to use the Eclipse APIs to place text in the status line of the Eclipse IDE </li></ul></ul><ul><ul><li>Eclipse has 38,000 public methods </li></ul></ul>
    50. 53. Using Structural Context to Recommend Source Code Examples - Reid Holmes and Gail C. Murphy [Figure: Development Environment and Project Repository, connected by “Structural Context” and “Examples”]
    51. 54. Strathcona: Extract Structural Context ViewPart SampleView setMessage(String) IStatusLineManager setMessage(String)
    52. 55. Strathcona: Example Navigation <ul><li>Visual representation </li></ul><ul><ul><li>Highlights key relationships between example and query </li></ul></ul><ul><ul><li>Multiple examples can be quickly viewed </li></ul></ul>
    53. 56. Strathcona: Viewing Example Source <ul><li>Code view </li></ul><ul><ul><li>Example shows how to get a status line manager </li></ul></ul><ul><ul><li>Example is not a perfect match, but good enough to help </li></ul></ul>
    54. 57. Conclusion <ul><li>Information Delivery </li></ul><ul><li>Similarity Analyser </li></ul><ul><li>Ranking – Metrics </li></ul><ul><li>Context </li></ul><ul><li>Automatic Facet Classification </li></ul><ul><ul><li>Uncontrolled vocabulary + additional terms </li></ul></ul>
    55. 58. References <ul><li>[McIlroy, 1968] M. D. McIlroy, Mass Produced Software Components, NATO Software Engineering Conference Report, Garmisch, Germany, October 1968, pp. 79-85. </li></ul><ul><li>[Mili, 1998] A. Mili, R. Mili, R. T. Mittermeir, A survey of software reuse libraries, Annals of Software Engineering, Vol. 5, 1998, pp. 349-414. </li></ul><ul><li>[Seacord, 1998] Robert C. Seacord, Scott A. Hissam, Kurt C. Wallnau. "Agora: A Search Engine for Software Components," IEEE Internet Computing, vol. 2, no. 6, pp. 62-70, November/December 1998. </li></ul><ul><li>[Szyperski, 1999] Szyperski, C. "Component Software: Beyond Object-Oriented Programming". Addison Wesley, 1999. </li></ul><ul><li>[Dey, 2001] Dey, A. Understanding and Using Context. Personal Ubiquitous Comput. 5, 1 (Jan. 2001). </li></ul><ul><li>[Greengrass, 2001] Greengrass, Ed. Information retrieval: A survey. DOD Technical Report TR-R52-008-001, 2001. </li></ul><ul><li>[Ye, 2001] Ye, Y. and Fischer, G. Context-Aware Browsing of Large Component Repositories. In Proceedings of the 16th IEEE International Conference on Automated Software Engineering (November 26 - 29, 2001). ASE. IEEE Computer Society, Washington, DC, 99. </li></ul><ul><li>[Ye, 2002a] Ye, Y. and Fischer, G. Information delivery in support of learning reusable software components on demand. In Proceedings of the 7th International Conference on Intelligent User Interfaces, California, USA. </li></ul><ul><li>[Ye, 2002b] Ye, Y. and Fischer, G. Supporting Reuse by Delivering Task Relevant and Personalized Information. In Proceedings of the 24th International Conference on Software Engineering, pp. 513-523, Orlando, Florida, May 2002. </li></ul>
    56. 59. Bibliography <ul><li>[Inoue, 2003] K. Inoue et al. "Component Rank: Relative Significance Rank for Software Component Search", Proceedings of ICSE 2003. </li></ul><ul><li>[Maxville, 2003] Valerie Maxville, Chiou Peng Lam, Jocelyn Armarego. "Selecting Components: a Process for Context-Driven Evaluation," p. 456, 10th Asia-Pacific Software Engineering Conference (APSEC'03), 2003. </li></ul><ul><li>[Maxville, 2004] Valerie Maxville, Jocelyn Armarego, Chiou Peng Lam. "Intelligent Component Selection," pp. 244-249, 28th Annual International Computer Software and Applications Conference (COMPSAC'04), 2004. </li></ul><ul><li>[Lucrédio, 2004] Lucrédio, D.; Almeida, E. S.; Prado, A. F. A Survey on Software Components Search and Retrieval. In the 30th IEEE EUROMICRO Conference, Component-Based Software Engineering Track, 2004, Rennes, France. IEEE Press, 2004. </li></ul><ul><li>[Holmes, 2005] Holmes, R. and Murphy, G. C. Using structural context to recommend source code examples. In Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, USA, May 15 - 21, 2005). ICSE '05. </li></ul>
    57. 60. “Imperfect technology in a working market is sustainable; perfect technology without any market will vanish” [Szyperski, 1999]
