Transcendence: Enabling A Personal View of the Deep Web


Transcendence talk at IUI given by Jeffrey P. Bigham. See

Published in: Technology
  • Transcendence is a web browser extension that enables a personal view of the deep web by making web forms more flexible, so users can run queries of interest to them that the original interface does not support and more easily find the information they really want. It lets users enter multiple values for form input fields that may originally have been restricted to one, automatically submits all combinations of form input, and merges the results for easy visualization. It uses unsupervised information extraction to supply inputs automatically, enabling users to partially reconstruct the databases underlying deep web resources and facilitating aggregate queries that were previously impossible. Transcendence is joint work with fellow graduate students Anna Cavender, Ryan Kaminsky, Craig Prince, and Tyler Robison.

    1. 1. Transcendence: Enabling a Personal View of the Deep Web
          Jeffrey P. Bigham, Anna C. Cavender, Ryan S. Kaminsky, Craig M. Prince, and Tyler S. Robison
          University of Washington, Computer Science and Engineering
    2. 2. What is the Deep Web? (Introduction)
          • The deep web
              • Built from underlying databases
              • Accessible by querying web forms
              • 400-550x larger than surface web [1]
          • The surface web
              • Accessible by following links
              • Indexed by traditional search engines
          [1] Bergman, M. K. “The deep web: Surfacing hidden value.” 2001.
    3. 3. Deep Web Resources (Introduction)
    4. 5. Deep Web Resources (Introduction)
    5. 6. Problems (Introduction)
          • Web interfaces are inflexible
              • Your query might not be supported
          • Many searches are often required
              • Multiple queries, multiple tabs/windows
          • Aggregate queries are difficult
              • Data is technically available but hard to access
    6. 7. Outline
          • Introduction
          • Transcending Craigslist
          • Related Work
          • 3 Steps of Transcendence
          • Additional Examples
          • User Evaluation
    7. 8. Scenario (Transcending Craigslist)
          • Jane is a new student at UW
          • Looking for an apartment on Craigslist
          • Aware of two neighborhoods in Seattle
              • “University District” and “University Village”
          • Looking for the cheapest apartment near UW
    8. 11. Generalize a Form Field
    9. 12. Add a Value
    10. 13. Add a Value
    11. 14. Add Another Value
    12. 15. Automatically Generate More Values
    13. 16. Results only for “University Village”
    14. 17. Fields Automatically Chosen
    15. 18. Extract for All Inputs
    16. 19. Review Extractions in Place
    17. 20. Extractions Sorted by Price
    18. 21. Transcending Craigslist
            • Provided personal view of Craigslist
            • Multiple queries/results in single window
            • Cheapest neighborhood not originally entered
            • Required only a little more than a single search
    19. 22. Outline
            • Introduction
            • Transcending Craigslist
            • Related Work
            • 3 Steps of Transcendence
            • Additional Examples
            • User Evaluation
    20. 23. Crawling the Deep Web (Related Work)
            1. Find interesting web forms
            2. Find appropriate values to provide
                • Determine schema and appropriate queries [1]
                • Find keywords likely to elicit interesting results [2]
            • Don’t involve users
            [1] Madhavan et al. “Structured data meets the web: A few observations.” 2006.
            [2] Ntoulas et al. “Downloading textual hidden web content through keyword queries.” 2005.
    21. 24. User Interfaces for the Web (Related Work)
            • Collect, Manage and Use Web Information
                • Sifter [1]
                    • Augment sorting/filtering
                • Clip, Connect, Clone [2]
                    • Manipulate web forms
                    • Clone to specify multiple values
                • CREO [3]
                    • Web macros that generalize using Open Mind Repository
            [1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
            [2] Fujima et al. “Clip, connect, clone: combining application elements to build custom interfaces for information access.” UIST 2004.
            [3] Faaborg et al. “A goal-oriented web browser.” CHI 2006.
    22. 25. Outline
            • Introduction
            • Transcending Craigslist
            • Related Work
            • 3 Steps of Transcendence
            • Additional Examples
            • User Evaluation
    23. 26. 1. Generalize Form (3 Steps of Transcendence)
            • Choose input fields to generalize
            • Enter multiple values for those fields
                • Either enter manually, or
                • Use subset of values in selection/radio/checkbox
            • Optionally add more values automatically
                • Prior Input of Other Users
                • Unsupervised Information Extraction
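Generalizing a form means each chosen field carries several values, and every combination of those values becomes one submission. The following is a minimal sketch of that expansion, not the extension's actual code; the function name `expand_form` and the field names (`category`, `neighborhood`, `bedrooms`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def expand_form(fixed, generalized):
    """Yield one submission dict per combination of generalized field values.

    fixed:       field name -> single value (fields left as the site designed them)
    generalized: field name -> list of values (fields the user generalized)
    """
    names = list(generalized)
    for combo in product(*(generalized[name] for name in names)):
        submission = dict(fixed)              # start from the untouched fields
        submission.update(zip(names, combo))  # fill in this combination
        yield submission


# Hypothetical apartment-search form with two generalized fields.
fixed = {"category": "apartments"}
generalized = {
    "neighborhood": ["University District", "University Village"],
    "bedrooms": ["1", "2"],
}
submissions = list(expand_form(fixed, generalized))
```

Two neighborhoods and two bedroom counts give four submissions. The count grows multiplicatively with each generalized field, which is one reason automatically generated value lists can lead to long-running extractions.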
    24. 27. 1-a) Finding Values with UIE (3 Steps of Transcendence)
            • Input: phrases; Output: (similar) phrases
            • Google Sets [1]
                • Up to 10 inputs, returns 15 or 50 results
                • Based on contextual similarity (probably)
            • KnowItAll List Extractor [2]
                • Finds inputs in lists in unstructured web text
                • Extracts other items in the lists
                • Proceeds in iterations to potentially find many more
            [1]
            [2] Etzioni et al. “Methods for domain-independent information extraction from the web: an experimental comparison.” 2008.
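The list-based idea (find lists that contain the seed phrases, take the other items, repeat) can be illustrated with a toy sketch. This is not the KnowItAll List Extractor itself: the function `expand_seeds`, the `min_overlap` threshold, and the tiny in-memory corpus of lists are all assumptions made for the example, whereas a real system would mine lists from web text.

```python
def expand_seeds(seeds, lists, min_overlap=2, max_iterations=3):
    """Grow a seed set by absorbing lists that share enough items with it.

    A list is trusted only if at least `min_overlap` of its items are already
    known; its remaining items are then added. Repeating the pass lets newly
    added items unlock further lists.
    """
    known = set(seeds)
    for _ in range(max_iterations):
        added = set()
        for items in lists:
            members = set(items)
            if len(members & known) >= min_overlap:
                added |= members - known
        if not added:  # nothing new was found, so stop early
            break
        known |= added
    return known


# Hypothetical lists standing in for lists found in unstructured web text.
lists = [
    ["Allen", "Smith", "Johnson", "Williams"],
    ["Smith", "Williams", "Brown", "Jones"],
    ["red", "green", "Smith"],  # shares only one item, so it is never trusted
    ["Brown", "Jones", "Davis"],
]
names = expand_seeds(["Allen", "Smith", "Johnson"], lists)
```

With these inputs the three seeds grow to seven surnames over three passes, while the unrelated list is ignored because it overlaps the known set in only one item.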
    25. 28. 2. Choose Fields & Extract (3 Steps of Transcendence)
            • Submit form with single combination of values
            • Result fields identified automatically [1]
                • Identified by XPath
                • Pre-processing adds structure
            • Users optionally edit fields
            • Begin extraction process
                • Multi-threaded extraction
            [1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
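A rough sketch of XPath-identified result fields combined with multi-threaded extraction follows. It makes several assumptions: the result pages are canned, well-formed strings rather than real form responses, the markup and the names `PAGES`, `ROW_PATH`, and `FIELD_PATHS` are hypothetical, and Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath) stands in for a browser's XPath engine.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Canned result pages standing in for real form submissions
# (hypothetical markup and listings, not taken from any real site).
PAGES = {
    "University District": (
        "<html><body>"
        "<div class='row'><span class='title'>Studio</span><span class='price'>$750</span></div>"
        "<div class='row'><span class='title'>1BR</span><span class='price'>$900</span></div>"
        "</body></html>"
    ),
    "University Village": (
        "<html><body>"
        "<div class='row'><span class='title'>2BR</span><span class='price'>$1400</span></div>"
        "</body></html>"
    ),
}

# One path locates each result row; the others locate fields within a row.
ROW_PATH = ".//div[@class='row']"
FIELD_PATHS = {
    "title": ".//span[@class='title']",
    "price": ".//span[@class='price']",
}


def fetch(query):
    """Stand-in for submitting the form with one input value."""
    return PAGES[query]


def extract(query):
    """Parse one result page and return one record per result row."""
    root = ET.fromstring(fetch(query))
    records = []
    for row in root.findall(ROW_PATH):
        record = {"query": query}
        for name, path in FIELD_PATHS.items():
            record[name] = row.findtext(path)
        records.append(record)
    return records


def extract_all(queries, workers=4):
    """Run the extractions on a thread pool and merge results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_query = list(pool.map(extract, queries))
    return [record for records in per_query for record in records]


records = extract_all(list(PAGES))
```

Each record keeps the query that produced it, so merged results can later be filtered or grouped by the input value they came from.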
    26. 29. 3. Visualize Data (3 Steps of Transcendence)
            • In place on the web page
                • Sort and select results within the web page
            • External Visualizers
                • Histogram, Google Map, line graphs, scatter plot, and table of values
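Sorting merged results and binning them for a histogram are both small operations once the extracted values are parsed into numbers. The sketch below assumes records shaped as dictionaries with a `price` string; the helper names, the sample listings, and the 250-dollar bin width are hypothetical and are not drawn from the talk's data.

```python
from collections import Counter


def parse_price(text):
    """Turn a price string such as '$1,200' into an integer number of dollars."""
    return int(text.replace("$", "").replace(",", ""))


def sort_by_price(records):
    """Return the records ordered from cheapest to most expensive."""
    return sorted(records, key=lambda record: parse_price(record["price"]))


def price_histogram(records, bin_width=250):
    """Count records per price bin, keyed by the lower edge of each bin."""
    counts = Counter(
        (parse_price(record["price"]) // bin_width) * bin_width
        for record in records
    )
    return dict(sorted(counts.items()))


# Hypothetical merged results from several neighborhood queries.
listings = [
    {"neighborhood": "University District", "price": "$950"},
    {"neighborhood": "University Village", "price": "$1,200"},
    {"neighborhood": "Wedgewood", "price": "$700"},
    {"neighborhood": "University District", "price": "$800"},
]
ranked = sort_by_price(listings)
histogram = price_histogram(listings)
```

In this made-up data the cheapest listing comes from a neighborhood that was generated rather than typed in, which mirrors the kind of result the merged, sorted view is meant to surface.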
    27. 30. Outline
            • Introduction
            • Transcending Craigslist
            • Related Work
            • 3 Steps of Transcendence
            • Additional Examples
            • User Evaluation
    28. 31. Examples: IMDB [1] Rating Dist. (Additional Examples)
            • Entered: “Scent of a Woman,” “Rocky,” “Star Wars,” and “The Matrix”
            • Generate > 7000 more titles
            [1]
    29. 32. Examples: IMDB [1] Rating Dist. (Additional Examples)
            [1]
    30. 33. Examples: Directory Diving (Additional Examples)
            • Supplied 3 surnames:
                • “Allen,” “Smith,” and “Johnson”
                • Generated 10,063 more names
            • A few hours later…
                • 51,233 unique names and emails
                • Also address information, position
    31. 34. Outline
            • Introduction
            • Transcending Craigslist
            • Related Work
            • 3 Steps of Transcendence
            • Additional Examples
            • User Evaluation
    32. 35. User Evaluation
            • 9 Potential Users Evaluated Transcendence
                • 5 programmers, 4 non-programmers
            • 3 Tasks
                • Search for a flight
                    • Multiple destinations, departure and return dates
                • Map REI Stores in the U.S.
                • Search Craigslist for an apartment
    33. 36. User Reaction & Comments (User Evaluation)
            • Agreed that Transcendence:
                • “could be used to find useful information”
                • “is powerful (would allow me to easily accomplish difficult tasks).”
            • Most compelling task varied by user
                • Craigslist suggested by preliminary user
                • Many related to flight task
            • Questioned value of incomplete database reconstructions
                • Pleasantly surprised by values automatically supplied
            • Wanted to use Transcendence in the future
    34. 37. Future Work
            • Implicit Resource Descriptions
                • Eliminate Need to Choose Result Fields
                    • Share result schemas between users
                    • Eliminate Step 2
                • Custom vertical search engines
                    • User-created Kayaks, Metacrawlers, and Froogles
            • Improved Deep Web Crawling
                • Use UIE to find appropriate values for forms
    35. 38. Conclusion
            • Transcendence makes forms more flexible
            • Transcendence automatically finds input values
            • Unsupervised information extraction useful for crawling
            • Transcendence enables new queries not possible before
            • Participants wanted to use Transcendence
    36. 39. Transcendence
            Jeffrey P. Bigham [email_address]
            Thanks to: Mira Dontcheva, UW Turing Center, anonymous reviewers, and our study participants.
            The End
    37. 40. Some Extra Slides
    38. 41. Show Those Resulting from Specific Inputs
    39. 42. Show Those Resulting from Specific Inputs
    40. 43. Show Wedgewood Results
    41. 47. System Description
    42. 48. 3 Steps of Transcendence: Generalize, Choose Fields & Extract, Visualize
    43. 49. Examples: Mapping Stores (Additional Examples)
    44. 50. Examples: Kayak Flights (Additional Examples)
    45. 51. User Evaluation
    46. 52. Introduction
    47. 53. Introduction
    48. 54. Introduction
    49. 55. Introduction
    50. 56. Introduction
    51. 57. Introduction
    52. 58. Deep Web Resources (Introduction)