Transcript

  • 1. Mira Dontcheva, Steven M. Drucker, David Salesin, Michael F. Cohen. Oct. 2007, UIST '07. Lin Yen Ling, 20080930 /23
  • 2. OUTLINE
    • Introduction
    • Related Work
    • The Summaries Framework
    • Example Scenario
    • System Overview
    • Retrieval Using Relationships
    • Authoring Cards
    • Template-based Search
    • Exploratory User Study
    • Conclusions and Future Work
  • 3. INTRODUCTION
    • Focuses on helping people interact with and gather Web content.
    • The goal is to lower the effort necessary for collecting, organizing, managing, and sharing that content.
    • They present three new techniques that build on the existing summaries framework and interaction paradigms.
  • 4. INTRODUCTION
    • Three new techniques:
      • An interaction technique that allows users to specify relations between websites and use these relations to automatically collect data from multiple websites.
      • An interface for merging content from multiple websites and organizing it visually.
      • A novel search paradigm for collecting content from the Web with search templates.
  • 5. RELATED WORK
    • Managing web content
      • WebBook (1996), Data Mountain (1998), TopicShop (2003)
      • Hunter Gatherer (2002), Internet Scrapbook (1998)
    • Semantic Web community
      • Piggy Bank (2005), Thresher (2005)
    • Summaries framework(2006)
    • This work gives the user interactive tools for specifying relations between disparate data sources.
  • 6. RELATED WORK
    • Collecting content using relations
      • Data integration
      • Complementary to database research
      • End-user programming for the Web.
        • Chickenfoot(2005)
        • RecipeSheet(2006)
        • C3W (2004)
        • Marmite (2007) and Yahoo Pipes
      • Simple graphical interface for mixing content from different sources.
    • Interactive layout editing
      • Sketchpad: a man-machine graphical communication system (1963)
      • Inferring constraints from multiple snapshots (1993)
      • Inferring graphical constraints with Rockit (1993)
      • Similar to commercial HTML editors.
  • 7. RELATED WORK
    • Formatting search results
      • Stuff I’ve Seen (2003)
      • Clusty (2004): go one step further and cluster the search results according to topic.
      • Grokker (2005)
      • This work goes beyond clustering and reorganizing URLs.
  • 8. THE SUMMARIES FRAMEWORK
    • Built on top of the summaries framework.
    • Provide an interface for interactively creating extraction patterns for webpages.
    • Implemented as a browser extension.
    • Written in JavaScript and XUL.
    • Uses the extraction techniques that are already part of the summaries framework.
      • DOM, context-based rules
    • Focuses on new applications for the extracted content.
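
As a rough illustration of what a DOM-based pattern and a context-based rule might look like, the following minimal sketch extracts two elements from a small, well-formed page. Everything in it is an assumption made for illustration: the page markup, the function names `extract_by_path` and `extract_after_label`, and the reading of "context-based rule" as "take the element that follows a known label". The summaries framework itself runs as a browser extension and its actual patterns are not shown in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real webpages usually need an HTML parser).
PAGE = """<html><body>
<div class="listing"><h1>Ambrosia</h1>
<p><b>Address:</b> <span>placeholder address</span></p></div>
</body></html>"""


def extract_by_path(root, path):
    """DOM-style pattern: follow a fixed element path from the document root."""
    node = root.find(path)
    if node is None or not node.text:
        return None
    return node.text.strip()


def extract_after_label(root, label):
    """Context-style rule (assumed form): return the text of the sibling
    element that directly follows an element whose text equals `label`."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children[:-1]):
            if (child.text or "").strip() == label:
                return (children[i + 1].text or "").strip()
    return None


root = ET.fromstring(PAGE)
name = extract_by_path(root, "./body/div/h1")
address = extract_after_label(root, "Address:")
```

The path-based pattern breaks when the page layout changes, while the label-based rule tolerates layout changes as long as the label text stays the same, which is one plausible reason to keep both kinds of rule.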
  • 9. EXAMPLE SCENARIO
    • Show the steps taken by a user as he looks for a restaurant for a night out in Seattle.
  • 10. EXAMPLE SCENARIO - Relations
  • 11. EXAMPLE SCENARIO - Cards
  • 12. EXAMPLE SCENARIO - Search templates
  • 13. EXAMPLE SCENARIO - Search templates
  • 14. SYSTEM OVERVIEW
    • The system includes
      • A data repository
        • Holds all of the content collected by the user, organized according to the source webpage and the semantic tags of the webpage elements.
      • A set of user-defined cards
        • Specify which webpage elements within a relation tree should be displayed and their visual arrangement.
      • A set of search templates
        • Each includes a set of websites and possibly relations for those websites.
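
The three parts listed above could be modeled roughly as follows. This is only a sketch under stated assumptions: the class names, field names, and sample values are hypothetical and are not taken from the actual system, which is a JavaScript and XUL browser extension.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """Directed link from a tag on one website to a tag on another."""
    source_site: str
    source_tag: str
    target_site: str
    target_tag: str


@dataclass
class Card:
    """Which tagged elements to display, in order (visual layout omitted)."""
    name: str
    elements: list[tuple[str, str]]  # (website, tag) pairs


@dataclass
class SearchTemplate:
    """A set of websites plus optional relations between them."""
    name: str
    websites: list[str]
    relations: list[Relation] = field(default_factory=list)


@dataclass
class Repository:
    """Collected content, keyed by source website; each record maps tag -> text."""
    records: dict[str, list[dict[str, str]]] = field(default_factory=dict)

    def add(self, website: str, record: dict[str, str]) -> None:
        self.records.setdefault(website, []).append(record)

    def values(self, website: str, tag: str) -> list[str]:
        return [r[tag] for r in self.records.get(website, []) if tag in r]


# Made-up sample data for illustration only.
repo = Repository()
repo.add("nwsource.com", {"name": "Ambrosia", "address": "placeholder address"})
by_name = Relation("nwsource.com", "name", "yelp.com", "name")
template = SearchTemplate("restaurants", ["nwsource.com", "yelp.com"], [by_name])
card = Card("restaurant card", [("nwsource.com", "name"), ("yelp.com", "name")])
```

Keeping relations as plain values makes it straightforward to store them in the repository and reuse them inside several search templates.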
  • 15. RETRIEVAL USING RELATIONSHIPS
    • A relation is defined as a directed connection from tag_i of websiteA to tag_j of websiteB.
    • All relations are stored in the data repository and are available to the user at any time.
    • More formally, the execution of a relation can be expressed as a database query.
    • For a given relation r, where r = websiteA.tag_i -> websiteB.tag_j:
      • Collecting content for any new data record from websiteA for tag_i can be expressed as a JOIN operation, or as the following SQL pseudo-query: SELECT * FROM websiteB WHERE websiteB.tag_j = websiteA.tag_i
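
The pseudo-query can be made concrete with a small in-memory database. The sketch below assumes both websites are already available as local tables, which is a simplification: in the system described here, content for websiteB is retrieved through Web search. The table layout, the helper name `run_relation`, and the sample rows are all hypothetical.

```python
import sqlite3


def run_relation(conn, source_table, source_tag, target_table, target_tag):
    """Return rows of target_table whose target_tag equals the source_tag
    value of some row in source_table (the relation run as a JOIN)."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted here.
    # In this sketch they come from trusted code, not from user input.
    sql = (
        f'SELECT b.* FROM "{source_table}" AS a '
        f'JOIN "{target_table}" AS b ON b."{target_tag}" = a."{source_tag}"'
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE websiteA (name TEXT, address TEXT)")
conn.execute("CREATE TABLE websiteB (name TEXT, rating TEXT)")
# Made-up sample rows for illustration only.
conn.execute("INSERT INTO websiteA VALUES ('Ambrosia', 'placeholder address')")
conn.executemany(
    "INSERT INTO websiteB VALUES (?, ?)",
    [("Ambrosia", "placeholder rating"), ("Other Place", "placeholder rating")],
)

rows = run_relation(conn, "websiteA", "name", "websiteB", "name")
```

Here `rows` contains only the websiteB record for "Ambrosia", because it is the only one whose tag value matches a record collected from websiteA.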
  • 16. RETRIEVAL USING RELATIONSHIPS (Query formulation)
    • To formulate the keyword query, we typically use only the extracted text content.
      • We find that this type of query is usually sufficient and returns the appropriate result within the top eight search results.
    • Queries are reformulated using heuristics.
      • We found this approach particularly effective for situations in which something is described in multiple ways or is part of multiple categories.
    • Other approaches for reformulating queries include using the semantic tag associated with the webpage element or using additional webpage elements.
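
A minimal sketch of query formulation might look like the following. The slides do not say which heuristics are used, so the splitting rule in `reformulate` is purely a hypothetical example of handling text that lists several categories, and the use of the `site:` operator to restrict results to one domain is likewise an assumption.

```python
import re


def build_query(extracted_text: str, domain: str) -> str:
    """Keyword query built only from the extracted text, limited to one domain."""
    keywords = " ".join(extracted_text.split())  # collapse runs of whitespace
    return f"{keywords} site:{domain}"


def reformulate(extracted_text: str) -> list[str]:
    """Hypothetical heuristic: split text on ',', '/', or '&' so that each
    alternative description or category can be tried as its own query."""
    parts = re.split(r"\s*[,/&]\s*", extracted_text)
    return [part.strip() for part in parts if part.strip()]


query = build_query("  Ambrosia \n", "yelp.com")
alternatives = reformulate("Italian, Pizza / Wine Bar")
```

Each entry of `alternatives` could then be passed to `build_query` in turn when the first query does not return a suitable result.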
  • 17. RETRIEVAL USING RELATIONSHIPS (Search result comparison)
    • For each query we extract the first eight search results and rank the extracted content according to similarity to the webpage content that triggered the query.
    • To compute similarity we compare the extracted webpage elements using the correspondence specified in the relation that triggered the search.
    • For example when collecting content for the “Ambrosia” restaurant from nwsource.com, the system issues the query “Ambrosia” limiting the results to the yelp.com domain.
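
The ranking step can be sketched as below. The slides do not name the similarity measure, so this example substitutes the standard library's `difflib.SequenceMatcher` as a stand-in; the function names, the `correspondence` mapping from source tags to result tags, and the sample records are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a missing value on either side scores 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_results(source_record, results, correspondence, limit=8):
    """Keep the first `limit` results and sort them by average similarity to
    the source record, comparing only the tag pairs in `correspondence`."""

    def score(result):
        scores = [
            similarity(source_record.get(src_tag, ""), result.get(dst_tag, ""))
            for src_tag, dst_tag in correspondence.items()
        ]
        return sum(scores) / len(scores) if scores else 0.0

    return sorted(results[:limit], key=score, reverse=True)


# Made-up sample data for illustration only.
source = {"name": "Ambrosia"}
results = [{"title": "Ambrosia Cafe"}, {"title": "Unrelated"}, {"title": "Ambrosia"}]
ranked = rank_results(source, results, {"name": "title"})
```

With this data the exact match "Ambrosia" is ranked first and "Unrelated" last.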
  • 18. RETRIEVAL USING RELATIONSHIPS (Limitations)
    • Content is extracted from only eight search results because the Google AJAX Search API limits the search results to a maximum of eight.
    • To handle these dynamic webpages, in subsequent work we hope to leverage research into macro recording systems such as WebVCR (2000), Turquoise (1997), Web Macros (1999), TrIAs (2000), PLOW (2006), and Creo (2006).
    • Madhavan et al. (2007)
    • In the current implementation we allow the user to specify only one-to-one relations.
  • 19. AUTHORING CARDS
    • The user can view his collection of Web content through cards.
    • Cards are persistent, can be reused, and shared with others.
    • It does not currently include capabilities for specifying interaction.
    • Combine it with the Exhibit API (2007).
  • 20. AUTHORING CARDS
  • 21. TEMPLATE-BASED SEARCH
  • 22. EXPLORATORY USER STUDY
    • Four participants were graduate students and two were staff at the university.
    • Relations
      • Explore exposing possible relations to the user as he collects new content.
    • Cards
      • A good card designer should make it possible to create cards quickly but also give the user control.
    • Search Templates
      • Give the user feedback about the available search results.
  • 23. CONCLUSIONS AND FUTURE WORK
    • This work combines content extraction and Web search to provide services and tools that are much needed and can help users with challenging information tasks.
    • Such a web of relationships can enable a new shift in Web applications and bring about a World Wide Web that is both more personal and collaborative.
    • They plan to continue evolving the card designer to provide light-weight card authoring for the novice.
    • They plan to explore approaches for providing more feedback so that the user can understand search results and quickly and easily iterate through queries.
    • They hope to explore which websites people relate together, how often they create new cards, and how well they can use search templates.