Searching Repositories of Web Application Models

Project repositories are a central asset in software development, as they preserve the technical knowledge gathered in past development activities. However, locating relevant information in a vast project repository is problematic, because it requires manually tagging projects with accurate metadata, an activity that is time-consuming and prone to errors and omissions. This paper investigates the use of classical Information Retrieval techniques for easing the discovery of useful information from past projects. Unlike approaches based on textual search over the source code of applications or on querying structured metadata, we propose to index and search the models of applications, which are available in companies applying Model-Driven Engineering practices. We contrast alternative index structures and result presentations, and evaluate a prototype implementation on real-world experimental data.

Published in: Technology, Education
  • Speaker note: introduce the two processes up front
  • Transcript of "Searching Repositories of Web Application Models"

    1. 1. Searching Repositories of Web Application Models <ul><li>Alessandro Bozzon, Marco Brambilla, Piero Fraternali </li></ul><ul><li>ICWE 2010 </li></ul><ul><li>Vienna, July 7th 2010 </li></ul>
    2. 2. Context <ul><li>Project repositories are a central asset in (Web) software development </li></ul><ul><ul><li>they preserve the technical knowledge gathered in past development activities </li></ul></ul><ul><ul><li>repositories now cross the boundaries of individual organizations and have a social role in the diffusion of coding and design solutions </li></ul></ul><ul><ul><li>they allow for reuse of knowledge and artifacts </li></ul></ul><ul><li>Locating relevant information in a vast project repository is problematic </li></ul><ul><ul><li>Two options </li></ul></ul><ul><ul><ul><li>Manual tagging: time-consuming and prone to errors, omissions, and inconsistencies </li></ul></ul></ul><ul><ul><ul><li>Automatic analysis: much of the semantics can be lost in the process </li></ul></ul></ul>
    3. 3. Addressed Problem <ul><li>Objective: easing the discovery of useful information from past software projects </li></ul><ul><li>Main resource: application models </li></ul><ul><ul><li>available in companies applying Model-Driven Engineering practices </li></ul></ul><ul><ul><li>In contrast to existing solutions, which mainly focus on the discovery of code, documentation, and annotations </li></ul></ul><ul><li>Why is dealing with application models an advantage? </li></ul><ul><ul><li>Increased result quality (models embed more valuable information than the code) </li></ul></ul><ul><ul><li>Less need for manual tagging </li></ul></ul>
    4. 4. Related work: Component search <ul><li>Retrieval of annotated pieces of software dates back to the '90s. Various approaches: </li></ul><ul><ul><li>Worldwide search engine based on JavaBeans and CORBA [Agora, Internet Computing, 1998] </li></ul></ul><ul><ul><li>Search engines for Web services based on an indexed Vector Space Model characterization of their properties [Dustdar et al., ECWS 2005] </li></ul></ul><ul><ul><li>Significance-based search that exploits graph models of a software component library (usage relations used as links propagating significance) [Inoue et al., TOSEM 2005] </li></ul></ul><ul><ul><li>Combination of formal and semi-formal specification to describe behaviour and structure of components [Khalifa et al., ASEA 2008] </li></ul></ul>
    5. 5. Related work: Source code search <ul><ul><li>Several communities and on-line tools for sharing and retrieving code: Google code, Snipplr, Koders, Codase, Jexamples, SourceForge </li></ul></ul><ul><ul><ul><li>Keyword queries directly matched to the code </li></ul></ul></ul><ul><ul><ul><li>Results are the exact locations where the keyword(s) appear </li></ul></ul></ul><ul><ul><ul><li>Plus advanced behaviours: regular expressions (Google), wildcards (Codase), restriction to specific concept types (Jexamples, Codase), advanced ranking, e.g., based on relevance of match, activity, date of registration, recency of last update (SourceForge) </li></ul></ul></ul><ul><li>Other approaches: </li></ul><ul><ul><li>Information retrieval techniques for software reuse [Frakes et al., SIGIR Forum 1987] </li></ul></ul><ul><ul><li>Taking advantage of code structural information [Holmes and Murphy, ICSE 2005] and [Sourcerer Project by Bajracharya et al., SUITE ICSE workshop 2009] </li></ul></ul>
    6. 6. Related work: Model search <ul><li>The problem is usually restricted to searching UML or ER models </li></ul><ul><ul><li>XML / XMI format for seamlessly indexing UML models, text files, and others [Gibb et al., 2000] [Lorens et al., 2004] [Moogle: Lucredio et al., Models 2008] </li></ul></ul><ul><ul><li>UML artifacts classified with WordNet terms and extracted through Case-Based Reasoning [Gomes et al., AI Comm., 2004] </li></ul></ul><ul><ul><li>Database conceptual model retrieval based on text search, schema matching, and structurally-aware scoring methods, with query-by-example and keyword-based queries [Schemr: Chen, Halevy, SIGMOD 2009] </li></ul></ul><ul><ul><li>IR techniques applied to models and code together, for tracing the association between requirements, design artifacts, and code [Antoniol et al., 2000] […] </li></ul></ul>
    7. 7. Related work: Business Process Discovery <ul><li>Different approaches to the extraction of BP models from repositories </li></ul><ul><ul><li>Based on the workflow topology only: graph-based comparison or XML-based querying [Beeri et al., VLDB 2006] [Lu et al., BPM 2006] [Shao et al., ICDE 2009] </li></ul></ul><ul><ul><li>Based on semantic reasoning and discovery, using SPARQL, query by example, SQL-like languages, and so on [Kiefer et al., ESWC 2007] [Goderis et al., ICWS 2006] [Awad et al., EDOC 2008] [Zhuge 2002] [Belhajjame, Brambilla, BPMDS 2009] </li></ul></ul><ul><ul><li>Based on IR techniques [Dongen, Dijkman et al., CAiSE 2008] </li></ul></ul>
    8. 8. Our contribution <ul><li>A model-based search solution, with several innovations: </li></ul><ul><li>it automatically exploits the semantics of the searched conceptual models </li></ul><ul><li>it does not require manual annotation </li></ul><ul><li>it supports alternative indexing and ranking functions, based on the meta-model of the considered DSL(s) </li></ul><ul><li>it is based on a model-independent framework, which can be customized to any meta-model </li></ul><ul><li>User study to evaluate acceptance and the quality perceived by users </li></ul><ul><li>Performance tests to evaluate scalability </li></ul>
    9. 9. Overall Architecture of the System Engineering Web Search Application Bozzon, Brambilla, Tutorial @ICWE2010
    10. 10. Overall Architecture of the System The Content Processing Flow extracts meaningful information from projects and uses it to create the search engine index. <ul><li>1. CONTENT PROCESSING </li></ul><ul><li>project analysis captures project-level, global metadata </li></ul><ul><li>segmentation splits the project into smaller units </li></ul><ul><li>segment analysis extracts from segments the information to be indexed </li></ul><ul><li>linguistic normalization applies the typical normalization operations of IR </li></ul>
    11. 11. Overall Architecture of the System The Content Processing Flow extracts meaningful information from projects and uses it to create the search engine index. <ul><li>2. INDEXING </li></ul><ul><li>each project or segment is physically represented as a document </li></ul><ul><li>the search engine indexes are built based on the documents </li></ul><ul><li>the DSL metamodel is taken into account </li></ul>
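The content processing steps listed on slides 10 and 11 (project analysis, segmentation, segment analysis, linguistic normalization, one document per segment) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code: the project layout, the `segment_project`, `normalize` and `build_documents` helpers, the "Back Office" sub-project and the stop-word list are all hypothetical.

```python
import re

# Assumed stop-word list; a real system would use a per-language list.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "or", "to"}


def normalize(text):
    """Linguistic normalization: tokenize (ASCII only), lower-case, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def segment_project(project):
    """Segmentation: split a project into one segment per sub-project."""
    return [
        {"segment": name, "text": text}
        for name, text in project["subprojects"].items()
    ]


def build_documents(project):
    """Represent each segment as a document carrying project-level metadata."""
    documents = []
    for segment in segment_project(project):
        documents.append({
            "projectName": project["name"],        # project analysis output
            "segment": segment["segment"],         # segmentation output
            "terms": normalize(segment["text"]),   # segment analysis + normalization
        })
    return documents


# Hypothetical project used only for illustration.
project = {
    "name": "Product Catalogue Application",
    "subprojects": {
        "Product Catalogue": "List of products in the catalogue",
        "Back Office": "Edit the details of a selected product",
    },
}
docs = build_documents(project)
```

Each element of `docs` is what the indexing step would receive as one document; the metamodel-aware part of the real system is deliberately left out here.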
    12. 12. Overall Architecture of the System The query and result presentation Flow deals with the submitted queries and the production of the result set. <ul><li>1. USER INTERFACE supports </li></ul><ul><li>Keyword-based queries </li></ul><ul><li>Content-based queries (aka QBE) </li></ul><ul><li>Rendering of the results </li></ul>
    13. 13. Overall Architecture of the System The query and result presentation Flow deals with the submitted queries and the production of the result set. <ul><li>2. QUERY PROCESSING </li></ul><ul><li>matches the query to the indexed content using a given similarity criterion </li></ul><ul><li>produces ranked results </li></ul>
    14. 14. Design Dimensions of Model Retrieval (1/2) <ul><li>Segmentation Granularity: the “size” of the atomic unit of retrieval for the user </li></ul><ul><ul><li>Project </li></ul></ul><ul><ul><li>Sub-project </li></ul></ul><ul><ul><li>Model concepts (all or only the main ones) </li></ul></ul><ul><li>Index structure: one or more fields (each associated with a boosting score) </li></ul><ul><ul><li>Flat: a simple list of terms without taking into account model semantics </li></ul></ul><ul><ul><li>Weighted: model concepts used to weight terms in the ranking </li></ul></ul><ul><ul><li>Multi-field: terms belonging to different model concepts are collected into separate fields </li></ul></ul><ul><ul><li>Structured: the model is translated into a representation that reflects the hierarchies and associations among concepts </li></ul></ul>
    15. 15. Example of Model Indexing [Figure: Metamodel, Model, and Model XML Representation mapped to a Multi-Field index. Index fields: ID = 1; PROJECT NAME = Product Catalogue Application; HYPERTEXT MODEL = Product Catalogue, Catalogue Home Page, List Products, List of product in the catalogue, View Details, Details of a selected product]
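To make the difference between the flat and the multi-field index structures concrete, the example of slide 15 can be written as two small Python functions. This is a sketch under assumptions: the dictionary layout and the field name `hypertextModel` are hypothetical stand-ins for whatever the prototype actually stores.

```python
def flat_document(project):
    """Flat index: every term goes into a single field, model semantics are ignored."""
    words = [project["name"]]
    for texts in project["concepts"].values():
        words.extend(texts)
    return {"text": " ".join(words)}


def multi_field_document(project):
    """Multi-field index: terms are grouped by the model concept they come from."""
    doc = {"id": project["id"], "projectName": project["name"]}
    for concept, texts in project["concepts"].items():
        doc[concept] = " ".join(texts)
    return doc


# Values taken from the slide; the structure around them is assumed.
project = {
    "id": 1,
    "name": "Product Catalogue Application",
    "concepts": {
        "hypertextModel": [
            "Product Catalogue", "Catalogue Home Page", "List Products",
            "List of product in the catalogue", "View Details",
            "Details of a selected product",
        ],
    },
}

flat = flat_document(project)
multi = multi_field_document(project)
```

With the multi-field form a query can be restricted to, or boosted on, a single field such as `projectName`, which the flat form cannot express.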
    16. 16. Example of Model Indexing [Figure: Metamodel, Model, and Model XML Representation mapped to a Multi-Field, Weighted Index with weight levels 2.0, 1.0, 0.5. Index fields: ID = 1; PROJECT NAME = Product Catalogue Application; HYPERTEXT MODEL = Product|2.0 Catalogue|2.0 Catalogue|1.0 Home|1.0 Page|1.0 List|0.5 Products|0.5 List|0.2 of|0.2 products|0.2 in|0.2 the|0.2 catalogue|0.2, followed by View Details, Details of a selected product]
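The `term|weight` notation on slide 16 can be reproduced with a short Python sketch. The weight values come from the slide; the concept names (`siteview`, `page`, `unit`, `comment`) and their mapping to those values are assumptions made for the example.

```python
# Hypothetical concept-to-weight mapping; only the numeric values appear on the slide.
CONCEPT_WEIGHTS = {"siteview": 2.0, "page": 1.0, "unit": 0.5, "comment": 0.2}


def weighted_tokens(elements):
    """Attach to each word the weight of the model concept it belongs to."""
    tokens = []
    for concept, text in elements:
        weight = CONCEPT_WEIGHTS[concept]
        tokens.extend(f"{word}|{weight}" for word in text.split())
    return tokens


# (concept, text) pairs; the assignment of texts to concepts is assumed.
elements = [
    ("siteview", "Product Catalogue"),
    ("page", "Catalogue Home Page"),
    ("unit", "List Products"),
    ("comment", "List of products in the catalogue"),
]
field_value = " ".join(weighted_tokens(elements))
```

The resulting `field_value` starts with `Product|2.0 Catalogue|2.0 Catalogue|1.0`, matching the beginning of the weighted field shown on the slide.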
    17. 17. Design Dimensions of Model Retrieval (2/2) <ul><li>Query Language and Result Presentation : the way queries and results are presented. </li></ul><ul><ul><li>Keyword-based search </li></ul></ul><ul><ul><li>Document-based search: the system extracts the most significant words and submits them as a query </li></ul></ul><ul><ul><li>Search by example : the query is a model, analyzed and matched by similarity </li></ul></ul><ul><ul><li>Faceted search : exploration using facets (i.e., property-value pairs) extracted from the indexed documents </li></ul></ul><ul><ul><li>Snippet visualization : with the matching points highlighted in graphical or textual form </li></ul></ul>
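Faceted search, as described on slide 17, groups the result set by property-value pairs and lets the user narrow it down. A minimal Python sketch, assuming results are plain dictionaries and using the hypothetical helpers `compute_facets` and `refine`:

```python
from collections import Counter, defaultdict


def compute_facets(results, properties):
    """Count how many results carry each value of each facet property."""
    facets = defaultdict(Counter)
    for doc in results:
        for prop in properties:
            if prop in doc:
                facets[prop][doc[prop]] += 1
    return {prop: dict(counts) for prop, counts in facets.items()}


def refine(results, prop, value):
    """Keep only the results matching the selected property-value pair."""
    return [doc for doc in results if doc.get(prop) == value]


# Hypothetical result set; field names follow the multi-field index of slide 21.
results = [
    {"projectName": "Product Catalogue Application", "documentType": "page"},
    {"projectName": "Product Catalogue Application", "documentType": "unit"},
    {"projectName": "Trouble Ticketing", "documentType": "page"},
]
facets = compute_facets(results, ["projectName", "documentType"])
pages_only = refine(results, "documentType", "page")
```

Here `facets["documentType"]` counts two pages and one unit, and selecting the `page` value leaves two results.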
    18. 18. Our model-based search engine prototype <ul><li>General purpose, model-independent, configurable system: </li></ul><ul><ul><li>Configuration of a general purpose search engine according to the selected design dimensions </li></ul></ul><ul><ul><li>Metamodel-aware rules to analyze models and populate the index </li></ul></ul><ul><ul><ul><li>Segmentation and text-extraction steps: model transformation rules </li></ul></ul></ul><ul><ul><ul><li>Offline collection analysis: computes statistics for fine-tuning the retrieval and ranking </li></ul></ul></ul><ul><ul><ul><ul><li>Stop Domain Concept removal </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Optimization of the weights assigned to each model concept </li></ul></ul></ul></ul><ul><ul><li>Provides a visual interface to perform queries and inspect results. </li></ul></ul><ul><li>Content processing has been implemented by extending the text processing and analysis components provided by Apache Lucene </li></ul>
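The offline collection analysis mentioned on slide 18 can be illustrated with a small Python sketch. The slides do not state how stop domain concepts are identified, so the criterion used here (terms whose document frequency exceeds a threshold) is an assumption, as are the function names and the `max_df_ratio` parameter.

```python
from collections import Counter


def domain_stop_terms(documents, max_df_ratio=0.8):
    """Return terms that occur in more than max_df_ratio of the documents.

    Assumed criterion: such terms are so common in the collection that
    they do not help to discriminate between documents.
    """
    document_frequency = Counter()
    for terms in documents:
        document_frequency.update(set(terms))  # count each term once per document
    limit = max_df_ratio * len(documents)
    return {term for term, df in document_frequency.items() if df > limit}


def remove_stop_terms(documents, stop_terms):
    """Drop the stop terms from every document before indexing."""
    return [[t for t in terms if t not in stop_terms] for terms in documents]


# Hypothetical, already normalized documents.
documents = [
    ["page", "product", "list"],
    ["page", "ticket", "detail"],
    ["page", "employee", "list"],
]
stop_terms = domain_stop_terms(documents)
cleaned = remove_stop_terms(documents, stop_terms)
```

In this toy collection only `page` appears in every document, so it is the only term removed.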
    19. 19. Detailed indexing process
    20. 20. Experiment Settings - Dataset <ul><li>48 real-world WebML projects from WebRatio </li></ul><ul><ul><li>trouble ticketing, human resource management, multimedia search engines, Web portals, etc. </li></ul></ul><ul><ul><li>Italian and English </li></ul></ul><ul><ul><li>~ 250 Modeling Concepts </li></ul></ul><ul><ul><li>3,800 data model entities (with about 35,000 attributes and 3,800 relationships) </li></ul></ul><ul><ul><li>138 site views with about 10,000 pages and 470,000 units, and 20 Web services. </li></ul></ul><ul><li>The overall repository takes around 85MB of disk space </li></ul>
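    21. 21. Experiment settings - Configurations <ul><li>3 different settings of the design dimensions: A, B, C </li></ul><ul><ul><li>A: flat index structure; </li></ul></ul><ul><ul><li>B and C: multi-field weighted (projectID, projectName, documentType, text) </li></ul></ul>
        <table>
        <tr><th>Option</th><th>Description</th><th>A B C</th></tr>
        <tr><th colspan="3">Segmentation Granularity</th></tr>
        <tr><td>Project</td><td>Entire project</td><td>X</td></tr>
        <tr><td>Sub-project</td><td>Subproject</td><td>X X</td></tr>
        <tr><td>Single-Concept</td><td>Arbitrary model concepts</td><td>X X</td></tr>
        <tr><th colspan="3">Index Structure</th></tr>
        <tr><td>Flat</td><td>Flat list of words</td><td>X X</td></tr>
        <tr><td>Weighted</td><td>Words weighted by the model concepts they belong to</td><td>X</td></tr>
        <tr><td>Multi-field</td><td>Words belonging to each model concept in separate fields</td><td>X X</td></tr>
        <tr><th colspan="3">Query Language and Result Presentation</th></tr>
        <tr><td>Keyword-based</td><td>Query By Keywords</td><td>X X X</td></tr>
        <tr><td>Faceted</td><td>Query refined through specific dimensions</td><td>X X X</td></tr>
        <tr><td>Snippets</td><td>Visualization and exploration of result previews</td><td>X X X</td></tr>
        </table>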
    22. 22. Experiment C – model-based scoring function <ul><li>Experiments A and B exploit a traditional TF-IDF ranking function </li></ul><ul><li>Experiment C exploits the DSL metamodel </li></ul><ul><li>mtw(m, t): Model Term Weight, a metamodel-specific boost that depends on the concept m containing the term t </li></ul><ul><li>dw(d): Document Weight, a metamodel- and model-specific boosting value that expresses the importance of a given document (according to the selected granularity) </li></ul>
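The scoring formula itself did not survive in this transcript, so the following Python sketch shows only one plausible way to combine the two boosts named on slide 22 with a TF-IDF score: each occurrence of a query term contributes its `mtw` value, and the sum is scaled by `dw`. The exact combination, the IDF variant, the document layout and all concept names are assumptions, not the authors' formula.

```python
import math


def score(query_terms, doc, corpus, mtw, dw):
    """Assumed scoring: dw(d) * sum over query terms of weighted_tf * idf.

    doc and corpus entries look like
    {"type": <document type>, "terms": [(concept, term), ...]}.
    """
    n_docs = len(corpus)
    total = 0.0
    for q in query_terms:
        # Document frequency of the query term in the collection.
        df = sum(1 for d in corpus if any(t == q for _, t in d["terms"]))
        if df == 0:
            continue
        idf = math.log(1 + n_docs / df)
        # Term frequency where every occurrence counts mtw(m, t) instead of 1.
        weighted_tf = sum(mtw.get(c, 1.0) for c, t in doc["terms"] if t == q)
        total += weighted_tf * idf
    return dw.get(doc["type"], 1.0) * total


# Hypothetical weights and documents.
mtw = {"pageName": 1.0, "comment": 0.2}
dw = {"page": 1.0}
corpus = [
    {"type": "page", "terms": [("pageName", "catalogue"), ("comment", "product")]},
    {"type": "page", "terms": [("comment", "catalogue")]},
]
in_name = score(["catalogue"], corpus[0], corpus, mtw, dw)
in_comment = score(["catalogue"], corpus[1], corpus, mtw, dw)
```

Under these assumptions a match in a page name outranks the same match in a comment, which is the effect the metamodel-aware boosts are meant to have.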
    23. 23. User Interface <ul><li>(A) Rendered result set and facets </li></ul><ul><li>(B) Snippet window with highlighted matches </li></ul>
    24. 24. User Evaluation – Perceived Quality <ul><li>A user study was conducted with 5 expert WebML designers to assess the quality and perception of alternative configurations </li></ul><ul><ul><li>Users rated the results in the result sets </li></ul></ul><ul><ul><li>Votes ranged from 1 (highly inappropriate) to 5 (highly appropriate) </li></ul></ul><ul><li>Experiments B and C received more votes in the high range of the scale </li></ul><ul><li>Success factor: </li></ul><ul><ul><li>Injecting the semantics of the meta-model </li></ul></ul>
    25. 25. User Evaluation - Acceptance <ul><li>Users were asked 10 questions about the features of the application </li></ul><ul><li>Votes ranged from 1 (bad) to 5 (good) </li></ul><ul><ul><ul><ul><li>useful for model maintenance and reuse </li></ul></ul></ul></ul><ul><ul><ul><ul><li>role in improving the quality of the applications </li></ul></ul></ul></ul><ul><ul><ul><ul><li>a certain distance between the overall judged quality and the adoption likelihood </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>But there is a bias due to the lack of a graphical viewer </li></ul></ul></ul></ul></ul>
        <table>
        <tr><th>Question</th><th>Avg.</th><th>Var.</th></tr>
        <tr><th colspan="3">Features</th></tr>
        <tr><td>Keyword Search</td><td>3.6</td><td>0.24</td></tr>
        <tr><td>Search Result Ranking</td><td>3.2</td><td>0.16</td></tr>
        <tr><td>Faceted Search</td><td>3.8</td><td>0.16</td></tr>
        <tr><td>Match Highlighting</td><td>3.6</td><td>0.24</td></tr>
        <tr><th colspan="3">Application</th></tr>
        <tr><td>Help reducing the maintenance costs</td><td>3.2</td><td>0.56</td></tr>
        <tr><td>Help improving the quality of the delivered application?</td><td>3.0</td><td>0.4</td></tr>
        <tr><td>Help understanding the model assets in the company?</td><td>4.4</td><td>0.24</td></tr>
        <tr><td>Help providing better estimates for future application costs?</td><td>2.8</td><td>0.56</td></tr>
        <tr><th colspan="3">Wrap Up</th></tr>
        <tr><td>Overall Evaluation of the system</td><td>4.0</td><td>0.4</td></tr>
        <tr><td>Would you use the system in your activities?</td><td>3.0</td><td>1.2</td></tr>
        </table>
    26. 26. Performance Evaluation - Query Time <ul><li>About 400 randomly generated 2-term and 3-term keyword queries </li></ul><ul><li>Each query has been executed 20 times </li></ul><ul><li>Query time is well below one second, and the curves indicate sub-linear growth </li></ul><ul><li>The addition of Faceted Search and Snippet Visualization impacts heavily with the number of inde </li></ul><ul><li>NOSeg: No Segmentation </li></ul><ul><li>Seg: Segmentation </li></ul><ul><li>KS: Keyword Search </li></ul><ul><li>FS: Faceted Search </li></ul><ul><li>Snip: Snippet </li></ul>
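The measurement protocol of slide 26 (about 400 random 2-term and 3-term queries, each executed 20 times) can be sketched as a small Python harness. The vocabulary, the `toy_search` stand-in and the toy index are hypothetical; in the real experiment the call would go to the search engine under test.

```python
import random
import statistics
import time

# Toy inverted index standing in for the real search engine (assumption).
TOY_INDEX = {"catalogue": [1, 2], "product": [1], "page": [2, 3]}


def toy_search(query):
    """Return the ids of documents containing at least one query term."""
    hits = set()
    for term in query:
        hits.update(TOY_INDEX.get(term, []))
    return hits


def random_queries(vocabulary, n_queries, rng):
    """Generate keyword queries made of 2 or 3 distinct random terms."""
    return [rng.sample(vocabulary, rng.choice((2, 3))) for _ in range(n_queries)]


def benchmark(search, queries, repetitions=20):
    """Run every query `repetitions` times and return the mean time in seconds."""
    per_query_means = []
    for query in queries:
        runs = []
        for _ in range(repetitions):
            start = time.perf_counter()
            search(query)
            runs.append(time.perf_counter() - start)
        per_query_means.append(statistics.mean(runs))
    return statistics.mean(per_query_means)


rng = random.Random(42)  # fixed seed so the query set is reproducible
vocabulary = ["catalogue", "product", "page", "ticket", "employee"]
queries = random_queries(vocabulary, 400, rng)
mean_seconds = benchmark(toy_search, queries)
```

Repeating the run for growing numbers of indexed projects, once per configuration, would yield curves of the kind the slide refers to.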
    27. 27. Performance Evaluation - Index Size <ul><li>Size grows almost linearly with the number of projects in all configurations </li></ul><ul><li>Baseline configurations feature index sizes about 10 times smaller than the repository size </li></ul><ul><li>Faceted Search doubles the index size </li></ul><ul><li>NOSeg: No Segmentation </li></ul><ul><li>Seg: Segmentation </li></ul><ul><li>KS: Keyword Search </li></ul><ul><li>FS: Faceted Search </li></ul><ul><li>Snip: Snippet </li></ul>
    28. 28. Conclusions and future directions <ul><li>A metamodel-aware approach and a system prototype for searches over model repositories </li></ul><ul><li>Scalability tests and user studies in different experimental settings </li></ul><ul><li>Future work: </li></ul><ul><ul><ul><ul><li>Integration of content-based search </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Improved result visualization: integration in the WebRatio tool-suite for WebML and visual highlighting of the matches in the projects </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Adaptive fine-tuning for improving precision and recall </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Experiments with more modeling languages (e.g. BPMN) </li></ul></ul></ul></ul><ul><ul><ul><li>Definition of generic benchmark criteria for model-driven repository search </li></ul></ul></ul>
    29. 29. Thanks! Questions? Alessandro Bozzon, Marco Brambilla, Piero Fraternali [email_address] Searching Repositories of Web Application Models