20080930
  • 1. Mira Dontcheva, Steven M. Drucker, David Salesin, Michael F. Cohen. UIST '07, Oct. 2007. Lin Yen Ling, 20080930
  • 2. OUTLINE
    • Introduction
    • Related Work
    • The Summaries Framework
    • Example Scenario
    • System Overview
    • Retrieval Using Relationships
    • Authoring Cards
    • Template-based Search
    • Exploratory User Study
    • Conclusions and Future Work
  • 3. INTRODUCTION
    • Focus: helping people interact with and gather Web content.
    • Goal: lower the effort necessary for collecting, organizing, managing, and sharing that content.
    • They present three new techniques that build on the existing summaries framework and interaction paradigms.
  • 4. INTRODUCTION
    • Three new techniques:
      • An interaction technique that allows users to specify relations between websites and use these relations to automatically collect data from multiple websites.
      • An interface for merging content from multiple websites and organizing it visually.
      • A novel search paradigm for collecting content from the Web with search templates.
  • 5. RELATED WORK
    • Managing web content
      • WebBook (1996), Data Mountain (1998), TopicShop (2003)
      • Hunter Gatherer (2002), Internet Scrapbook (1998)
    • Semantic Web community
      • Piggy Bank (2005), Thresher (2005)
    • Summaries framework (2006)
    • Goal of this work: give the user interactive tools for specifying relations between disparate data sources.
  • 6. RELATED WORK
    • Collecting content using relations
      • Data integration
      • Complementary to database research
      • End-user programming for the Web.
        • Chickenfoot (2005)
        • RecipeSheet (2006)
        • C3W (2004)
        • Marmite (2007) and Yahoo Pipes
      • Simple graphical interface for mixing content from different sources.
    • Interactive layout editing
      • Sketchpad: a man-machine graphical communication system (1963)
      • Inferring constraints from multiple snapshots (1993)
      • Inferring graphical constraints with Rockit (1993)
      • Similar to commercial HTML editors.
  • 7. RELATED WORK
    • Formatting search results
      • Stuff I’ve Seen (2003)
      • Clusty (2004): goes one step further and clusters the search results by topic.
      • Grokker (2005)
      • This work goes beyond clustering and reorganizing URLs.
  • 8. THE SUMMARIES FRAMEWORK
    • Built on top of the summaries framework.
    • Provides an interface for interactively creating extraction patterns for webpages.
    • Implemented as a browser extension.
    • Written in JavaScript and XUL.
    • Uses the extraction techniques that are already part of the summaries framework:
      • DOM-based rules, context-based rules
    • Focuses on new applications for the extracted content.
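The summaries framework extracts webpage elements with DOM-based and context-based rules. The sketch below illustrates the two rule types under simplifying assumptions: the page is already well-formed XHTML, a DOM rule is an ElementTree path, and a context rule is a text label such as "Phone:". The sample page, the rule formats, and the helpers `extract_by_dom_path` and `extract_by_context` are hypothetical; the slides do not describe the framework's actual pattern representation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified restaurant page (assumed to be well-formed XHTML).
PAGE = """
<html><body>
  <div class="listing">
    <h1>Ambrosia</h1>
    <p>Cuisine: Greek</p>
    <p>Phone: 206-555-0100</p>
  </div>
</body></html>
"""


def extract_by_dom_path(root, path):
    """DOM-based rule (hypothetical): follow a fixed element path from the root."""
    node = root.find(path)  # ElementTree supports a limited XPath subset
    return node.text.strip() if node is not None and node.text else None


def extract_by_context(root, label):
    """Context-based rule (hypothetical): return the text that follows a known label."""
    for node in root.iter():
        text = (node.text or "").strip()
        if text.startswith(label):
            return text[len(label):].strip()
    return None


root = ET.fromstring(PAGE)
record = {
    "name": extract_by_dom_path(root, "./body/div/h1"),
    "phone": extract_by_context(root, "Phone:"),
}
print(record)  # {'name': 'Ambrosia', 'phone': '206-555-0100'}
```

A DOM path breaks when the page layout changes, while a context rule survives layout changes as long as the label text stays the same; offering both kinds of rule is one plausible reason to keep them side by side.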
  • 9. EXAMPLE SCENARIO
    • Shows the steps a user takes while looking for a restaurant for a night out in Seattle.
  • 10. EXAMPLE SCENARIO - Relations
  • 11. EXAMPLE SCENARIO - Cards
  • 12. EXAMPLE SCENARIO - Search templates
  • 13. EXAMPLE SCENARIO - Search templates
  • 14. SYSTEM OVERVIEW
    • The system includes:
      • A data repository
        • Holds all of the content collected by the user, organized by source webpage and by the semantic tags of the webpage elements.
      • A set of user-defined cards
        • Specify which webpage elements within a relation tree should be displayed and how they are visually arranged.
      • A set of search templates
        • Each includes a set of websites and, possibly, relations between those websites.
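The three components above can be pictured as a small data model. This is a minimal sketch, not the system's actual schema: the class and field names, the row/column card layout, and the sample URL and data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """Directed connection source_site.source_tag -> target_site.target_tag."""
    source_site: str
    source_tag: str
    target_site: str
    target_tag: str


@dataclass
class Card:
    """Which tagged elements to display and where (hypothetical row/column layout)."""
    name: str
    layout: dict[str, tuple[int, int]]  # semantic tag -> (row, column)


@dataclass
class SearchTemplate:
    """A set of websites plus, possibly, relations between them."""
    name: str
    websites: list[str]
    relations: list[Relation] = field(default_factory=list)


@dataclass
class Repository:
    """Collected content keyed by source webpage; each record maps semantic tag -> text."""
    records: dict[str, dict[str, str]] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add(self, source_url: str, record: dict[str, str]) -> None:
        """Merge newly extracted elements into the record for this webpage."""
        self.records.setdefault(source_url, {}).update(record)

    def values_for_tag(self, tag: str) -> dict[str, str]:
        """All collected values for one semantic tag, keyed by source webpage."""
        return {url: rec[tag] for url, rec in self.records.items() if tag in rec}


# Usage with hypothetical data from the Seattle restaurant scenario.
repo = Repository()
repo.relations.append(Relation("nwsource.com", "name", "yelp.com", "name"))
repo.add("nwsource.com/ambrosia", {"name": "Ambrosia"})  # hypothetical URL
repo.add("nwsource.com/ambrosia", {"cuisine": "Greek"})
print(repo.values_for_tag("name"))  # {'nwsource.com/ambrosia': 'Ambrosia'}
```

Keying records by source webpage and then by semantic tag mirrors the slide's description of how the repository organizes content; relations live in the repository so they stay available at any time.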
  • 15. RETRIEVAL USING RELATIONSHIPS
    • A relation is defined as a directed connection from tag_i on websiteA to tag_j on websiteB.
    • All relations are stored in the data repository and are available to the user at any time.
    • More formally, executing a relation can be expressed as a database query.
    • For a given relation r = websiteA.tag_i -> websiteB.tag_j:
      • collecting content for any new data record from websiteA is a JOIN on tag_i = tag_j, i.e., the SQL pseudo-query SELECT * FROM websiteB WHERE websiteB.tag_j = websiteA.tag_i
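To make the JOIN reading concrete, the sketch below runs the per-record pseudo-query and the equivalent JOIN against an in-memory SQLite database. This is only an analogy: in the system described here, the websiteB side is answered by a Web search, not by a table lookup. The table and column names (`name`, `cuisine`, `rating`), the data, `RELATION`, and the `execute_relation` helper are hypothetical.

```python
import sqlite3

# Hypothetical relation r = websiteA.name -> websiteB.name
RELATION = ("websiteA", "name", "websiteB", "name")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websiteA (name TEXT, cuisine TEXT);
    CREATE TABLE websiteB (name TEXT, rating REAL);
    INSERT INTO websiteB VALUES ('Ambrosia', 4.0), ('Lola', 4.5);
""")


def execute_relation(conn, relation, new_record):
    """Collect websiteB content for one new websiteA record (the per-record query)."""
    _, src_tag, dst_table, dst_tag = relation
    # Identifiers come from the trusted relation definition; the value is a bound parameter.
    sql = f"SELECT * FROM {dst_table} WHERE {dst_table}.{dst_tag} = ?"
    return conn.execute(sql, (new_record[src_tag],)).fetchall()


# A new record arrives from websiteA, and the relation fires for it.
new = {"name": "Ambrosia", "cuisine": "Greek"}
conn.execute("INSERT INTO websiteA VALUES (:name, :cuisine)", new)
print(execute_relation(conn, RELATION, new))  # [('Ambrosia', 4.0)]

# Applied to every collected record at once, the same relation is an ordinary JOIN.
joined = conn.execute("""
    SELECT websiteA.name, websiteA.cuisine, websiteB.rating
    FROM websiteA JOIN websiteB ON websiteB.name = websiteA.name
""").fetchall()
print(joined)  # [('Ambrosia', 'Greek', 4.0)]
```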
  • 16. RETRIEVAL USING RELATIONSHIPS (Query formulation)
    • To formulate the keyword query, we typically use only the extracted text content.
      • We find that this type of query is usually sufficient and returns the appropriate result within the top eight search results.
    • The query can also be reformulated using heuristics.
      • We found this approach particularly effective for situations in which something is described in multiple ways or is part of multiple categories.
    • Other approaches for reformulating queries include using the semantic tag associated with the webpage element or using additional webpage elements.
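Query formulation and reformulation can be sketched as generating a list of candidate keyword queries, from the plain extracted text to reformulated variants. Everything here is an illustrative assumption: the `site:` operator as the way to limit results to one domain, the split-on-"/"-or-"," heuristic for text that names several alternatives, and the `formulate_queries` helper itself. The slides do not specify the actual heuristics.

```python
import re


def formulate_queries(record, key_tag, domain, extra_tags=()):
    """Candidate keyword queries for one record, most basic first (hypothetical heuristics).

    record maps semantic tags to extracted text; key_tag names the element
    used by the relation; extra_tags name additional elements to try.
    """
    site = f"site:{domain}"  # assumed mechanism for limiting results to one domain
    text = record[key_tag].strip()
    queries = [f"{text} {site}"]  # default: the extracted text only

    # Heuristic for things described in multiple ways or in multiple categories:
    # text that lists alternatives ("Greek/Mediterranean") yields one query per alternative.
    parts = [p.strip() for p in re.split(r"[/,]", text) if p.strip()]
    if len(parts) > 1:
        queries += [f"{p} {site}" for p in parts]

    # Alternative approach: add other extracted webpage elements to the query.
    for tag in extra_tags:
        if record.get(tag):
            queries.append(f"{text} {record[tag].strip()} {site}")
    return queries


print(formulate_queries({"name": "Ambrosia", "city": "Seattle"}, "name", "yelp.com",
                        extra_tags=["city"]))
# ['Ambrosia site:yelp.com', 'Ambrosia Seattle site:yelp.com']
```

The caller would try the queries in order and stop once the desired result appears among the returned search results.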
  • 17. RETRIEVAL USING RELATIONSHIPS (Search result comparison)
    • For each query we extract the first eight search results and rank the extracted content according to similarity to the webpage content that triggered the query.
    • To compute similarity, we compare the extracted webpage elements using the correspondence specified in the relation that triggered the search.
    • For example, when collecting content for the “Ambrosia” restaurant from nwsource.com, the system issues the query “Ambrosia”, limiting the results to the yelp.com domain.
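The comparison step can be sketched as scoring each of the first eight results by how closely its extracted elements match the triggering content, using the tag correspondences from the relation. The similarity measure below (difflib's `SequenceMatcher.ratio`, averaged over corresponding elements) is an assumed stand-in; the slides do not say which measure is used. The `rank_results` helper and the sample data are hypothetical.

```python
from difflib import SequenceMatcher

MAX_RESULTS = 8  # the search API returned at most eight results


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_results(source, results, correspondences):
    """Rank extracted search results by similarity to the triggering content.

    source: semantic tag -> text from the webpage that triggered the query.
    results: one dict of extracted elements per search result, in search order.
    correspondences: source tag -> result tag, taken from the relation.
    """
    scored = []
    for rank, result in enumerate(results[:MAX_RESULTS]):
        pairs = [(source[s], result[t]) for s, t in correspondences.items()
                 if s in source and t in result]
        score = sum(similarity(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
        scored.append((score, -rank, result))  # -rank keeps search order on ties
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [(score, result) for score, _, result in scored]


source = {"name": "Ambrosia"}  # element extracted from nwsource.com
results = [{"name": "Ambrosia Cafe"}, {"name": "Ambrosia"}, {"name": "Lola"}]  # hypothetical
for score, result in rank_results(source, results, {"name": "name"}):
    print(f"{score:.2f} {result['name']}")
```

Sorting on an explicit `(score, -rank)` key avoids comparing the result dicts themselves and falls back to the search engine's own order when scores tie.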
  • 18. RETRIEVAL USING RELATIONSHIPS (Limitations)
    • Content is extracted from only eight search results, because the Google AJAX Search API returns at most eight results.
    • Dynamic webpages are not yet handled; to handle them, in subsequent work we hope to leverage research into macro recording systems such as WebVCR (2000), Turquoise (1997), Web Macros (1999), TrIAs (2000), PLOW (2006), and Creo (2006).
    • Madhavan et al. (2007)
    • In the current implementation, we allow the user to specify only one-to-one relations.
  • 19. AUTHORING CARDS
    • The user can view his collection of Web content through cards.
    • Cards are persistent and can be reused and shared with others.
    • Cards do not currently provide capabilities for specifying interaction.
    • Possible direction: combine cards with the Exhibit API (2007).
  • 20. AUTHORING CARDS
  • 21. TEMPLATE-BASED SEARCH
  • 22. EXPLORATORY USER STUDY
    • Participants: four graduate students and two university staff members.
    • Relations
      • Explore exposing possible relations to the user as he collects new content.
    • Cards
      • A good card designer should let the user create cards quickly while still giving him control.
    • Search Templates
      • Give the user feedback about the available search results.
  • 23. CONCLUSIONS AND FUTURE WORK
    • This work combines content extraction and Web search to provide much-needed services and tools that can help users with challenging information tasks.
    • A web of user-specified relations between websites can enable a new shift in Web applications and bring about a World Wide Web that is both more personal and more collaborative.
    • They plan to continue evolving the card designer to provide lightweight card authoring for novices.
    • They plan to explore approaches for providing more feedback so that the user can understand search results and quickly and easily iterate through queries.
    • They hope to explore which websites people relate together, how often they create new cards, and how well they can use search templates.