  1. 1. Mira Dontcheva, Steven M. Drucker, David Salesin, Michael F. Cohen. Oct. 2007, UIST '07. Lin Yen Ling, 20080930 /23
  2. 2. OUTLINE <ul><li>Introduction </li></ul><ul><li>Related Work </li></ul><ul><li>The Summaries Framework </li></ul><ul><li>Example Scenario </li></ul><ul><li>System Overview </li></ul><ul><li>Retrieval Using Relationships </li></ul><ul><li>Authoring Cards </li></ul><ul><li>Template-based Search </li></ul><ul><li>Exploratory User Study </li></ul><ul><li>Conclusions and Future Work </li></ul>
  3. 3. INTRODUCTION <ul><li>The work focuses on helping people interact with and gather Web content. </li></ul><ul><li>The goal is to lower the effort necessary for collecting, organizing, managing, and sharing that content. </li></ul><ul><li>The authors present three new techniques that build on the existing summaries framework and interaction paradigms. </li></ul>
  4. 4. INTRODUCTION <ul><li>Three new techniques: </li></ul><ul><ul><li>An interaction technique that allows users to specify relations between websites and use these relations to automatically collect data from multiple websites. </li></ul></ul><ul><ul><li>An interface for merging content from multiple websites and organizing it visually. </li></ul></ul><ul><ul><li>A novel search paradigm for collecting content from the Web with search templates. </li></ul></ul>
  5. 5. RELATED WORK <ul><li>Managing Web content </li></ul><ul><ul><li>WebBook (1996), Data Mountain (1998), TopicShop (2003) </li></ul></ul><ul><ul><li>Hunter Gatherer (2002), Internet Scrapbook (1998) </li></ul></ul><ul><li>Semantic Web community </li></ul><ul><ul><li>Piggy Bank (2005), Thresher (2005) </li></ul></ul><ul><li>Summaries framework (2006) </li></ul><ul><li>This work gives the user interactive tools for specifying relations between disparate data sources. </li></ul>
  6. 6. RELATED WORK <ul><li>Collecting content using relations </li></ul><ul><ul><li>Data integration </li></ul></ul><ul><ul><li>Complementary to database research </li></ul></ul><ul><ul><li>End-user programming for the Web </li></ul></ul><ul><ul><ul><li>Chickenfoot (2005) </li></ul></ul></ul><ul><ul><ul><li>RecipeSheet (2006) </li></ul></ul></ul><ul><ul><ul><li>C3W (2004) </li></ul></ul></ul><ul><ul><ul><li>Marmite (2007) and Yahoo Pipes </li></ul></ul></ul><ul><ul><li>Simple graphical interface for mixing content from different sources. </li></ul></ul><ul><li>Interactive layout editing </li></ul><ul><ul><li>Sketchpad: a man-machine graphical communication system (1963) </li></ul></ul><ul><ul><li>Inferring constraints from multiple snapshots (1993) </li></ul></ul><ul><ul><li>Inferring graphical constraints with Rockit (1993) </li></ul></ul><ul><ul><li>Similar to commercial HTML editors. </li></ul></ul>
  7. 7. RELATED WORK <ul><li>Formatting search results </li></ul><ul><ul><li>Stuff I’ve Seen (2003) </li></ul></ul><ul><ul><li>Clusty (2004): goes one step further and clusters the search results according to topic. </li></ul></ul><ul><ul><li>Grokker (2005) </li></ul></ul><ul><ul><li>This work goes beyond clustering and reorganizing URLs. </li></ul></ul>
  8. 8. THE SUMMARIES FRAMEWORK <ul><li>Built on top of the summaries framework. </li></ul><ul><li>Provides an interface for interactively creating extraction patterns for webpages. </li></ul><ul><li>Implemented as a browser extension. </li></ul><ul><li>Written in JavaScript and XUL. </li></ul><ul><li>Uses the extraction techniques that are already part of the summaries framework. </li></ul><ul><ul><li>DOM, context-based rules </li></ul></ul><ul><li>Focuses on new applications for the extracted content. </li></ul>
  9. 9. EXAMPLE SCENARIO <ul><li>Shows the steps a user takes while looking for a restaurant for a night out in Seattle. </li></ul>
  10. 10. EXAMPLE SCENARIO - Relations
  11. 11. EXAMPLE SCENARIO - Cards
  12. 12. EXAMPLE SCENARIO - Search templates
  13. 13. EXAMPLE SCENARIO - Search templates
  14. 14. SYSTEM OVERVIEW <ul><li>The system includes </li></ul><ul><ul><li>A data repository </li></ul></ul><ul><ul><ul><li>Holds all of the content collected by the user, organized by source webpage and by the semantic tags of the webpage elements. </li></ul></ul></ul><ul><ul><li>A set of user-defined cards </li></ul></ul><ul><ul><ul><li>Define which webpage elements within a relation tree should be displayed and their visual arrangement. </li></ul></ul></ul><ul><ul><li>A set of search templates </li></ul></ul><ul><ul><ul><li>Each includes a set of websites and possibly relations for those websites. </li></ul></ul></ul>
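The three components listed on the slide above can be pictured as a small data model. The following Python sketch is only an illustration: the slides state that the actual system is a browser extension written in JavaScript and XUL, and every class, field, and method name here (`Element`, `Record`, `Repository`, `Card`, `SearchTemplate`, `by_website`, `by_tag`) is a hypothetical choice, not the authors' design.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One extracted webpage element with its semantic tag (hypothetical)."""
    tag: str
    content: str


@dataclass
class Record:
    """Content collected from one source webpage (hypothetical)."""
    website: str
    elements: list[Element]


@dataclass
class Card:
    """Which tagged elements to display and where; layout is an assumption."""
    name: str
    layout: dict[str, tuple[int, int]]  # tag -> (x, y) position on the card


@dataclass
class SearchTemplate:
    """A set of websites plus optional relations between them (hypothetical)."""
    name: str
    websites: list[str]
    relations: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class Repository:
    """Holds collected records, retrievable by source website or by tag."""
    records: list[Record] = field(default_factory=list)

    def add(self, record: Record) -> None:
        self.records.append(record)

    def by_website(self, website: str) -> list[Record]:
        return [r for r in self.records if r.website == website]

    def by_tag(self, tag: str) -> list[Element]:
        return [e for r in self.records for e in r.elements if e.tag == tag]
```

Indexing the repository by both website and tag mirrors the slide's description of how collected content is organized; a real implementation would likely persist this data rather than keep it in memory.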
  15. 15. RETRIEVAL USING RELATIONSHIPS <ul><li>A relation is defined as a directed connection from tag_i of websiteA to tag_j of websiteB. </li></ul><ul><li>All relations are stored in the data repository and are available to the user at any time. </li></ul><ul><li>To define this process more formally, the execution of a relation can be expressed as a database query. </li></ul><ul><li>For a given relation r, where r = websiteA.tag_i -> websiteB.tag_j </li></ul><ul><ul><li>Collecting content for any new data record from websiteA for tag_i can be expressed as a JOIN operation or as the following SQL pseudo-query: SELECT * FROM websiteB WHERE websiteB.tag_j = websiteA.tag_i </li></ul></ul>
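The SQL pseudo-query on the slide above amounts to an equality join on the two related tags. A minimal Python sketch of that idea follows, assuming the repository is a plain dictionary from website name to a list of records and each record is a dictionary from tag to extracted text. The names `Relation` and `execute_relation` are hypothetical, and exact string equality stands in for whatever matching the real system performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Directed connection r = source_site.source_tag -> target_site.target_tag."""
    source_site: str
    source_tag: str
    target_site: str
    target_tag: str


def execute_relation(relation, repository, new_record):
    """Return target-site records whose related tag equals the new record's tag.

    Mirrors: SELECT * FROM websiteB WHERE websiteB.tag_j = websiteA.tag_i
    `repository` maps website name -> list of {tag: text} dicts (an assumption).
    """
    key = new_record.get(relation.source_tag)
    if key is None:
        # The new record has no value for the source tag, so nothing can match.
        return []
    return [
        record
        for record in repository.get(relation.target_site, [])
        if record.get(relation.target_tag) == key
    ]
```

In the system described by the slides, the target records come from a Web search rather than a local table, so this lookup only illustrates the join semantics of a relation.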
  16. 16. RETRIEVAL USING RELATIONSHIPS (Query formulation) <ul><li>To formulate the keyword query, we typically use only the extracted text content. </li></ul><ul><ul><li>We find that this type of query is usually sufficient and returns the appropriate result within the top eight search results. </li></ul></ul><ul><li>The query can also be reformulated using heuristics. </li></ul><ul><ul><li>We found this approach particularly effective for situations in which something is described in multiple ways or is part of multiple categories. </li></ul></ul><ul><li>Other approaches for reformulating queries include using the semantic tag associated with the webpage element or using additional webpage elements. </li></ul>
  17. 17. RETRIEVAL USING RELATIONSHIPS (Search result comparison) <ul><li>For each query we extract the first eight search results and rank the extracted content according to similarity to the webpage content that triggered the query. </li></ul><ul><li>To compute similarity we compare the extracted webpage elements using the correspondence specified in the relation that triggered the search. </li></ul><ul><li>For example, when collecting content for the “Ambrosia” restaurant from nwsource.com, the system issues the query “Ambrosia”, limiting the results to the yelp.com domain. </li></ul>
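The ranking step described above can be sketched as follows. The slides do not say which similarity measure the system uses, so this example substitutes the character-level ratio from Python's standard `difflib.SequenceMatcher` purely as a stand-in. The function names, the `MAX_RESULTS` constant, and the dictionary shape of the records are likewise assumptions made for illustration.

```python
from difflib import SequenceMatcher

MAX_RESULTS = 8  # the slides say only the first eight search results are used


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_results(trigger_record, results, source_tag, target_tag):
    """Rank extracted search results by similarity to the triggering content.

    `source_tag` and `target_tag` are the two tags the relation connects.
    Each record is a {tag: text} dict (an assumption of this sketch).
    """
    reference = trigger_record[source_tag]
    candidates = results[:MAX_RESULTS]
    return sorted(
        candidates,
        key=lambda record: similarity(reference, record.get(target_tag, "")),
        reverse=True,
    )
```

With a trigger record of `{"name": "Ambrosia"}`, a result whose related element is exactly "Ambrosia" would rank ahead of one that only contains the word among other text, which is the behaviour the slide's restaurant example relies on.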
  18. 18. RETRIEVAL USING RELATIONSHIPS (Limitations) <ul><li>Content is extracted from only eight search results because the Google AJAX Search API limits the search results to a maximum of eight. </li></ul><ul><li>To handle dynamic webpages, in subsequent work we hope to leverage research into macro recording systems such as WebVCR (2000), Turquoise (1997), Web Macros (1999), TrIAs (2000), PLOW (2006), and Creo (2006). </li></ul><ul><li>Madhavan et al. (2007) </li></ul><ul><li>In the current implementation we allow the user to specify only one-to-one relations. </li></ul>
  19. 19. AUTHORING CARDS <ul><li>The user can view his collection of Web content through cards. </li></ul><ul><li>Cards are persistent, can be reused, and can be shared with others. </li></ul><ul><li>The system does not currently provide capabilities for specifying interaction. </li></ul><ul><li>It could be combined with the Exhibit API (2007). </li></ul>
  20. 20. AUTHORING CARDS
  22. 22. EXPLORATORY USER STUDY <ul><li>Participants: four graduate students and two university staff members. </li></ul><ul><li>Relations </li></ul><ul><ul><li>Explore exposing possible relations to the user as he collects new content. </li></ul></ul><ul><li>Cards </li></ul><ul><ul><li>A good card designer should make it possible to create cards quickly but also give the user control. </li></ul></ul><ul><li>Search Templates </li></ul><ul><ul><li>Give the user feedback about the available search results. </li></ul></ul>
  23. 23. CONCLUSIONS AND FUTURE WORK <ul><li>This work combines content extraction and Web search to provide services and tools that are much needed and can help users with challenging information tasks. </li></ul><ul><li>Such a web of relationships can enable a new shift in Web applications and bring about a World Wide Web that is both more personal and collaborative. </li></ul><ul><li>They plan to continue evolving the card designer to provide light-weight card authoring for the novice. </li></ul><ul><li>They plan to explore approaches for providing more feedback so that the user can understand search results and quickly and easily iterate through queries. </li></ul><ul><li>They hope to explore which websites people relate together, how often they create new cards, and how well they can use search templates. </li></ul>