Transcendence: Enabling A Personal View of the Deep Web


    Notes on slide 1

    Transcendence is a web browser extension that enables a personal view of the deep web by making web forms more flexible, so users can run queries of interest to them that the original interface does not support and more easily find the information they really want. It lets users enter multiple values for form input fields that may originally have been restricted to one, submits all combinations of form input automatically, and merges the results for easy visualization. It uses unsupervised information extraction to supply inputs automatically, enabling users to partially reconstruct the databases underlying deep web resources and facilitating aggregate queries that were previously impossible. Transcendence is joint work with fellow graduate students Anna Cavender, Ryan Kaminsky, Craig Prince, and Tyler Robison.
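
The "submits all combinations of form input" step described in the notes above can be illustrated with a Cartesian product over the values entered for each generalized field. This is only a minimal sketch, not the extension's actual code: the function name `expand_form_inputs` and the field names `query` and `bedrooms` are hypothetical.

```python
from itertools import product

def expand_form_inputs(fields):
    """Yield one dict per combination of values across all generalized fields."""
    names = list(fields)
    for combo in product(*(fields[name] for name in names)):
        yield dict(zip(names, combo))

# Hypothetical form: two neighborhoods crossed with two bedroom counts.
fields = {
    "query": ["University District", "University Village"],
    "bedrooms": ["1", "2"],
}
submissions = list(expand_form_inputs(fields))
for submission in submissions:
    print(submission)
```

Each dict corresponds to one form submission, so the number of requests grows multiplicatively with the number of values per field.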


    Transcendence: Enabling A Personal View of the Deep Web - Presentation Transcript

    1. Transcendence: Enabling a Personal View of the Deep Web. Jeffrey P. Bigham, Anna C. Cavender, Ryan S. Kaminsky, Craig M. Prince, and Tyler S. Robison. University of Washington, Computer Science and Engineering.
    2. What is the Deep Web?
      • The deep web
        • Built from underlying databases
        • Accessible by querying web forms
        • 400-550x larger than surface web [1]
      • The surface web
        • Accessible by following links
        • Indexed by traditional search engines
      [1] Bergman, M. K. “The deep web: Surfacing hidden value.” 2001.
    3. Deep Web Resources
    4.  
    5. Deep Web Resources
    6. Problems
      • Web interfaces are inflexible
        • Your query might not be supported
      • Many searches are often required
        • Multiple queries, multiple tabs/windows
      • Aggregate queries are difficult
        • Data is technically available but hard to access
    7. Outline
      • Introduction
      • Transcending Craigslist
      • Related Work
      • 3 Steps of Transcendence
      • Additional Examples
      • User Evaluation
    8. Scenario
      • Jane is a new student at UW
      • Looking for an apartment on Craigslist
      • Aware of two neighborhoods in Seattle
        • “University District” and “University Village”
      • Looking for the cheapest apartment near UW
    9.  
    10.  
    11. Generalize a Form Field
    12. Add a Value
    13. Add a Value
    14. Add Another Value
    15. Automatically Generate More Values
    16. Results only for “University Village”
    17. Fields Automatically Chosen
    18. Extract for All Inputs
    19. Review Extractions in Place
    20. Extractions Sorted by Price
    21. Transcending Craigslist
      • Provided personal view of Craigslist
      • Multiple queries/results in single window
      • Cheapest neighborhood not originally entered
      • Required only a little more than a single search
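
The "multiple queries/results in single window" idea above amounts to merging the rows returned for each input value and sorting them on a shared field such as price. The sketch below assumes the results have already been extracted into dicts. The listings and prices are invented for illustration, and treating "Wedgewood" as an automatically supplied value is an assumption.

```python
def merge_results(results_by_input, sort_key="price"):
    """Combine rows from every query into one list, cheapest first."""
    merged = [
        {**row, "input": input_value}  # remember which query produced the row
        for input_value, rows in results_by_input.items()
        for row in rows
    ]
    return sorted(merged, key=lambda row: row[sort_key])

# Invented listings; "Wedgewood" stands in for an auto-generated input value.
results_by_input = {
    "University District": [{"title": "1BR near campus", "price": 1100}],
    "University Village": [{"title": "Studio", "price": 1250}],
    "Wedgewood": [{"title": "1BR basement unit", "price": 900}],
}
merged = merge_results(results_by_input)
cheapest = merged[0]
print(cheapest["input"], cheapest["price"])
```

Tagging each row with the input that produced it is what allows a view restricted to the results of specific inputs.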
    22. Outline
      • Introduction
      • Transcending Craigslist
      • Related Work
      • 3 Steps of Transcendence
      • Additional Examples
      • User Evaluation
    23. Crawling the Deep Web
      • Crawling the Deep Web
      • 1. Find interesting web forms
      • 2. Find appropriate values to provide
            • Determine schema and appropriate queries [1]
            • Find keywords likely to elicit interesting results [2]
      • Don’t involve users
      [1] Madhavan et al. “Structured data meets the web: A few observations.” 2006.
      [2] Ntoulas et al. “Downloading textual hidden web content through keyword queries.” 2005.
    24. User Interfaces for the Web
      • Collect, Manage and Use Web Information
        • Sifter [1]
          • Augment sorting/filtering
        • Clip, Connect, Clone [2]
          • Manipulate web forms
          • Clone to specify multiple values
        • CREO [3]
          • Web macros that generalize using Open Mind Repository
      [1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
      [2] Fujima et al. “Clip, connect, clone: combining application elements to build custom interfaces for information access.” UIST 2004.
      [3] Faaborg et al. “A goal-oriented web browser.” CHI 2006.
    25. Outline
      • Introduction
      • Transcending Craigslist
      • Related Work
      • 3 Steps of Transcendence
      • Additional Examples
      • User Evaluation
    26. 1. Generalize Form
      • Choose input fields to generalize
      • Enter multiple values for those fields
        • Either enter manually, or
        • Use subset of values in selection/radio/checkbox
      • Optionally add more values automatically
        • Prior Input of Other Users
        • Unsupervised Information Extraction
      1-a) Finding Values with UIE
      • Input: phrases; Output: (similar) phrases
      • Google Sets [1]
        • Up to 10 inputs, returns 15 or 50 results
        • Based on contextual similarity (probably)
      • KnowItAll List Extractor [2]
        • Finds inputs in lists in unstructured web text
        • Extracts other items in the lists
        • Proceeds in iterations to potentially find many more
      [1] http://labs.google.com/sets/
      [2] Etzioni et al. “Methods for domain-independent information extraction from the web: an experimental comparison.” 2008.
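
The list-based expansion outlined for this step (find lists containing the known inputs, take the other items in those lists, repeat) can be sketched as follows. This is a simplified illustration, not the KnowItAll List Extractor itself: the in-memory `lists` stand in for lists found in web text, and `min_overlap` and `max_iterations` are hypothetical parameters.

```python
def expand_from_lists(seeds, lists, min_overlap=2, max_iterations=3):
    """Grow a seed set using lists that already share enough known items."""
    known = set(seeds)
    for _ in range(max_iterations):
        new_items = set()
        for items in lists:
            # A list is trusted only if it contains several known values.
            if len(known.intersection(items)) >= min_overlap:
                new_items.update(set(items) - known)
        if not new_items:
            break  # nothing new was found, so further passes cannot help
        known |= new_items
    return known - set(seeds)

# Invented lists standing in for lists extracted from web pages.
lists = [
    ["Smith", "Johnson", "Williams", "Brown"],
    ["Williams", "Brown", "Davis"],
    ["Smith", "apple", "pear"],
]
generated = expand_from_lists(["Allen", "Smith", "Johnson"], lists)
print(sorted(generated))
```

The second list only qualifies on the second pass, once "Williams" and "Brown" are known, which is how iterating can reach many more values than the seeds alone.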
    27. 2. Choose Fields & Extract
      • Submit form with single combination of values
      • Result fields identified automatically [1]
        • Identified by XPath
        • Pre-processing adds structure
      • Users optionally edit fields
      • Begin extraction process
        • Multi-threaded extraction
      [1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
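
A rough sketch of this step, assuming result fields are located by XPath expressions and that each input's result page is processed in its own worker thread. The page markup, the `FIELD_XPATHS` mapping, and the in-memory `PAGES` dict (used here in place of real form submissions) are all hypothetical. The standard library's ElementTree supports only a subset of XPath and needs well-formed markup.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical result pages keyed by the submitted input value.
PAGES = {
    "University District": (
        "<html><body>"
        "<p class='row'><span class='price'>$950</span><a>Studio near campus</a></p>"
        "<p class='row'><span class='price'>$1100</span><a>1BR with parking</a></p>"
        "</body></html>"
    ),
    "University Village": (
        "<html><body>"
        "<p class='row'><span class='price'>$1250</span><a>1BR by the shops</a></p>"
        "</body></html>"
    ),
}

# Hypothetical field locations, relative to each result row.
FIELD_XPATHS = {"price": ".//span[@class='price']", "title": ".//a"}

def extract_rows(input_value):
    """Parse one result page and pull each field out of every result row."""
    root = ET.fromstring(PAGES[input_value])
    rows = []
    for row in root.findall(".//p[@class='row']"):
        record = {"input": input_value}
        for field, xpath in FIELD_XPATHS.items():
            node = row.find(xpath)
            record[field] = node.text if node is not None else None
        rows.append(record)
    return rows

# One task per input value; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    extracted = [row for rows in pool.map(extract_rows, PAGES) for row in rows]
print(extracted)
```

Threads suit this workload because, in a real setting, each task would spend most of its time waiting on a network response.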
    28. 3. Visualize Data
      • In place on the web page
        • Sort and select results within the web page
      • External Visualizers
        • Histogram, Google Map, line graphs, scatter plot, and table of values
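
As one example of an external visualizer, a histogram over an extracted numeric field can be built by bucketing values into fixed-width bins. The prices and bin width below are invented, and the text rendering is only a stand-in for a graphical chart.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower bound."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))

def render(bins, bin_width):
    """Draw one text bar per bin."""
    return "\n".join(
        f"{start}-{start + bin_width - 1}: {'#' * count}"
        for start, count in bins.items()
    )

# Invented prices standing in for an extracted "price" field.
prices = [950, 1100, 875, 1250, 1010, 990]
bins = histogram(prices, 100)
print(render(bins, 100))
```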
    29. Outline
      • Introduction
      • Transcending Craigslist
      • Related Work
      • 3 Steps of Transcendence
      • Additional Examples
      • User Evaluation
    30. Examples: IMDB [1] Rating Dist.
      • Entered: “Scent of a Woman,” “Rocky,” “Star Wars,” and “The Matrix”
      • Generated > 7000 more titles
      [1] http://www.imdb.com
    31. Examples: IMDB [1] Rating Dist.
      [1] http://www.imdb.com
    32. Examples: Directory Diving
      • Supplied 3 surnames:
        • “Allen,” “Smith,” and “Johnson”
        • Generated 10,063 more names
      • A few hours later…
        • 51,233 unique names and emails
        • Also address information, position
    33. Outline
      • Introduction
      • Transcending Craigslist
      • Related Work
      • 3 Steps of Transcendence
      • Additional Examples
      • User Evaluation
    34. User Evaluation
      • 9 Potential Users Evaluated Transcendence
        • 5 programmers, 4 non-programmers
      • 3 Tasks
        • Search for a flight
          • Multiple destinations, departure and return dates
        • Map REI Stores in the U.S.
        • Search Craigslist for an apartment
    35. User Reaction & Comments
      • Agreed that Transcendence:
        • “could be used to find useful information”
        • “is powerful (would allow me to easily accomplish difficult tasks).”
      • Most compelling task varied by user
        • Craigslist suggested by preliminary user
        • Many related to flight task
      • Questioned value of incomplete database reconstructions
        • Pleasantly surprised by values automatically supplied
      • Wanted to use Transcendence in the future
    36. Future Work
      • Implicit Resource Descriptions
        • Eliminate Need to Choose Result Fields
          • Share result schemas between users
          • Eliminate Step 2
        • Custom vertical search engines
          • User-created Kayaks, Metacrawlers, and Froogles
      • Improved Deep Web Crawling
        • Use UIE to find appropriate values for forms
    37. Conclusion
      • Transcendence makes forms more flexible
      • Transcendence automatically finds input values
      • Unsupervised information extraction useful for crawling
      • Transcendence enables new queries not possible before
      • Participants wanted to use Transcendence
    38. Transcendence. Jeffrey P. Bigham [email_address] www.cs.washington.edu/homes/jbigham/ Thanks to: Mira Dontcheva, UW Turing Center, anonymous reviewers, and our study participants. The End
    39. Some Extra Slides
    40. Show Those Resulting from Specific Inputs
    41. Show Those Resulting from Specific Inputs
    42. Show Wedgewood Results
    43.  
    44.  
    45.  
    46. System Description
    47. 3 Steps of Transcendence Generalize Choose Fields & Extract Visualize
    48. Examples: Mapping Stores
    49. Examples: Kayak Flights
    50. User Evaluation
    51. Introduction
    52. Introduction
    53. Introduction
    54. Introduction
    55. Introduction
    56. Introduction
    57. Deep Web Resources


    Transcendence talk at IUI given by Jeffrey P. Bigham
