
Transcendence: Enabling A Personal View of the Deep Web

From jbigham, 5 months ago

Transcendence talk at IUI given by Jeffrey P. Bigham. See http://

Slideshow transcript

Slide 1: Enabling a Personal View of the Deep Web Jeffrey P. Bigham Anna C. Cavender, Ryan S. Kaminsky, Craig M. Prince, and Tyler S. Robison University of Washington Computer Science and Engineering

Slide 2: Introduction What is the Deep Web? o The deep web o Built from underlying databases o Accessible by querying web forms o 400-550x larger than the surface web [1] o The surface web o Accessible by following links o Indexed by traditional search engines [1] Bergman, M. K. The deep web: Surfacing hidden value, 2001.

Slide 3: Introduction Deep Web Resources

Slide 5: Introduction Deep Web Resources

Slide 6: Introduction Problems o Web interfaces are inflexible – Your query might not be supported o Many searches are often required – Multiple queries, multiple tabs/windows o Aggregate queries are difficult – Data is technically available but hard to access

Slide 7: Outline o Introduction o Transcending Craigslist o Related Work o 3 Steps of Transcendence o Additional Examples o User Evaluation

Slide 8: Transcending Craigslist Scenario o Jane is a new student at UW o Looking for an apartment on Craigslist o Aware of two neighborhoods in Seattle – “University District” and “University Village” o Looking for the cheapest apartment near UW

Slide 11: Generalize a Form Field

Slide 12: Add a Value

Slide 13: Add a Value

Slide 14: Add Another Value

Slide 15: Automatically Generate More Values

Slide 16: Results only for “University Village”

Slide 17: Fields Automatically Chosen

Slide 18: Extract for All Inputs

Slide 19: Review Extractions in Place

Slide 20: Extractions Sorted by Price

Slide 21: Transcending Craigslist o Provided personal view of Craigslist o Multiple queries/results in single window o Cheapest neighborhood not originally entered o Required only a little more than a single search

Slide 22: Outline o Introduction o Transcending Craigslist o Related Work o 3 Steps of Transcendence o Additional Examples o User Evaluation

Slide 23: Related Work Crawling the Deep Web o Crawling the Deep Web 1. Find interesting web forms 2. Find appropriate values to provide – Determine schema and appropriate queries [1] – Find keywords likely to elicit interesting results [2] o Don’t involve users [1] Madhavan et al. “Structured data meets the web: A few observations.” 2006. [2] Ntoulas et al. “Downloading textual hidden web content through keyword queries.” 2005.

Slide 24: Related Work User Interfaces for the Web o Collect, Manage and Use Web Information – Sifter [1] o Augment sorting/filtering – Clip, Connect, Clone [2] o Manipulate web forms o Clone to specify multiple values – CREO [3] o Web macros that generalize using Open Mind Repository [1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006. [2] Fujima et al. “Clip, connect, clone: combining application elements to build custom interfaces for information access.” UIST 2004. [3] Faaborg et al. “A goal-oriented web browser.” CHI 2006.

Slide 25: Outline o Introduction o Transcending Craigslist o Related Work o 3 Steps of Transcendence o Additional Examples o User Evaluation

Slide 26: 3 Steps of Transcendence 1. Generalize Form o Choose input fields to generalize o Enter multiple values for those fields – Either enter manually, or – Use subset of values in selection/radio/checkbox o Optionally add more values automatically – Prior Input of Other Users – Unsupervised Information Extraction
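Generalizing a form amounts to submitting it once per combination of the values chosen for the generalized fields. The following is a minimal sketch of that enumeration, not the Transcendence implementation; the function name `expand_queries` and the field names (`category`, `neighborhood`, `bedrooms`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def expand_queries(fixed, generalized):
    """Yield one form submission (a dict) per combination of generalized values.

    fixed:       field name -> single value that stays constant
    generalized: field name -> list of candidate values for that field
    (Both mappings and all field names are illustrative assumptions.)
    """
    names = sorted(generalized)  # fixed field order keeps the output deterministic
    for combo in product(*(generalized[name] for name in names)):
        query = dict(fixed)
        query.update(zip(names, combo))
        yield query


queries = list(expand_queries(
    fixed={"category": "apartments"},
    generalized={
        "neighborhood": ["University District", "University Village"],
        "bedrooms": ["1", "2"],
    },
))

for q in queries:
    print(q)
```

Two values for each of two fields produce four submissions; every value added to a generalized field multiplies the number of queries, which is why the talk stresses running them in one window rather than by hand.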

Slide 27: 3 Steps of Transcendence 1-a) Finding Values with UIE Input: phrases Output: (similar) phrases o Google Sets [1] – Up to 10 inputs, returns 15 or 50 results – Based on contextual similarity (probably) o KnowItAll List Extractor [2] – Finds inputs in lists in unstructured web text – Extracts other items in the lists – Proceeds in iterations to potentially find many more [1] http://labs.google.com/sets/ [2] Etzioni et al. “Methods for domain-independent information extraction from the web: an experimental comparison.” 2008.
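The list-based idea on this slide (find lists that contain the inputs, take the other items, repeat) can be illustrated with a toy set-expansion loop. This is a simplified sketch under stated assumptions, not the KnowItAll List Extractor or Google Sets algorithm: the lists are supplied in memory rather than mined from web text, and `expand_seeds`, `min_overlap`, and `iterations` are hypothetical names.

```python
def expand_seeds(seeds, lists, min_overlap=2, iterations=2):
    """Return new phrases that co-occur in lists with already-known phrases.

    A list is trusted only if it shares at least `min_overlap` items with
    the phrases known so far; its remaining items are then added.
    """
    known = set(seeds)
    for _ in range(iterations):
        found = set()
        for items in lists:
            if len(known.intersection(items)) >= min_overlap:
                found.update(items)
        if found <= known:  # nothing new this round, stop early
            break
        known |= found
    return known - set(seeds)


# Made-up lists standing in for lists found in web pages.
lists = [
    ["Rocky", "Star Wars", "Jaws"],
    ["Jaws", "Alien", "Star Wars"],
    ["Seattle", "Portland", "Rocky"],
]
new_values = expand_seeds(["Rocky", "Star Wars"], lists)
print(sorted(new_values))
```

The second list only qualifies after the first round has added "Jaws", which is the sense in which iterating can surface many more values than a single pass. The third list shares just one item with the known set, so its city names are not pulled in.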

Slide 28: 3 Steps of Transcendence 2. Choose Fields & Extract o Submit form with single combination of values o Result fields identified automatically [1] – Identified by XPath – Pre-processing adds structure o Users optionally edit fields o Begin extraction process – Multi-threaded extraction [1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006.
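As a rough illustration of extracting XPath-identified fields for several inputs in parallel, the sketch below uses only the Python standard library. It makes several assumptions that are not from the talk: the result pages are canned, well-formed strings instead of live form submissions (real HTML would need a tolerant HTML parser), the field paths in `FIELDS` are hard-coded rather than identified automatically, and the names `PAGES`, `fetch`, and `extract` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Canned result pages keyed by the generalized input value (made-up data).
PAGES = {
    "University District": (
        "<html><body>"
        "<p class='row'><span class='price'>$850</span><a>1br near campus</a></p>"
        "<p class='row'><span class='price'>$700</span><a>studio</a></p>"
        "</body></html>"
    ),
    "University Village": (
        "<html><body>"
        "<p class='row'><span class='price'>$1,200</span><a>2br by the shops</a></p>"
        "</body></html>"
    ),
}

# Result fields as XPath expressions relative to each result row (assumed).
ROW_PATH = ".//p[@class='row']"
FIELDS = {"price": ".//span[@class='price']", "title": ".//a"}


def fetch(neighborhood):
    """Stand-in for submitting the form with one input value."""
    return PAGES[neighborhood]


def extract(neighborhood):
    """Parse one result page and return a record per result row."""
    root = ET.fromstring(fetch(neighborhood))
    records = []
    for row in root.findall(ROW_PATH):
        record = {"input": neighborhood}
        for name, path in FIELDS.items():
            record[name] = row.find(path).text
        records.append(record)
    return records


# One worker per input; pool.map returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = [rec for recs in pool.map(extract, PAGES) for rec in recs]

for rec in results:
    print(rec)
```

Each record keeps the input value that produced it, which is what later allows results to be filtered or grouped by input.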

Slide 29: 3 Steps of Transcendence 3. Visualize Data o In place on the web page – Sort and select results within the web page o External Visualizers – Histogram, Google Map, line graphs, scatter plot, and table of values
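Once records have been extracted, sorting and a histogram are straightforward to compute. The sketch below is only an illustration with made-up records and a text histogram; the record layout, the `price_value` and `histogram` helpers, and the bucket width are assumptions, and the visualizers described in the talk are not reproduced here.

```python
from collections import Counter

BUCKET = 250  # histogram bucket width in dollars (arbitrary choice)

# Made-up extracted records; "input" is the form value that produced each one.
records = [
    {"input": "University District", "price": "$850"},
    {"input": "University Village", "price": "$1,200"},
    {"input": "University District", "price": "$700"},
    {"input": "Ravenna", "price": "$650"},
]


def price_value(record):
    """Convert a price string such as '$1,200' to an integer."""
    return int(record["price"].replace("$", "").replace(",", ""))


def histogram(records, bucket=BUCKET):
    """Count records per price bucket, keyed by the bucket's lower bound."""
    counts = Counter(price_value(r) // bucket * bucket for r in records)
    return {low: counts[low] for low in sorted(counts)}


by_price = sorted(records, key=price_value)
print("cheapest:", by_price[0])

for low, n in histogram(records).items():
    print(f"${low}-${low + BUCKET - 1}: {'#' * n}")
```

In this made-up data the cheapest listing comes from a neighborhood that was added as an extra value, mirroring the Craigslist scenario where the cheapest neighborhood was not one the user originally entered.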

Slide 30: Outline o Introduction o Transcending Craigslist o Related Work o 3 Steps of Transcendence o Additional Examples o User Evaluation

Slide 31: Additional Examples Examples: IMDB [1] Rating Dist. Entered: “Scent of a Woman,” “Rocky,” “Star Wars,” and “The Matrix” Generate > 7000 more titles [1] http://www.imdb.com

Slide 32: Additional Examples Examples: IMDB [1] Rating Dist. [1] http://www.imdb.com

Slide 33: Additional Examples Examples: Directory Diving o Supplied 3 surnames: – “Allen,” “Smith,” and “Johnson” – Generated 10,063 more names o A few hours later… – 51,233 unique names and emails – Also address information, position

Slide 34: Outline o Introduction o Transcending Craigslist o Related Work o 3 Steps of Transcendence o Additional Examples o User Evaluation

Slide 35: User Evaluation User Evaluation o 9 Potential Users Evaluated Transcendence – 5 programmers, 4 non-programmers o 3 Tasks – Search for a flight o Multiple destinations, departure and return dates – Map REI Stores in the U.S. – Search Craigslist for an apartment

Slide 36: User Evaluation User Reaction & Comments o Agreed that Transcendence: “could be used to find useful information” “is powerful (would allow me to easily accomplish difficult tasks).” o Most compelling task varied by user – Craigslist suggested by preliminary user – Many related to flight task o Questioned value of incomplete database reconstructions – Pleasantly surprised by values automatically supplied o Wanted to use Transcendence in the future

Slide 37: Future Work Future Work o Implicit Resource Descriptions – Eliminate Need to Choose Result Fields o Share result schemas between users o Eliminate Step 2 – Custom vertical search engines o User-created Kayaks, Metacrawlers, and Froogles o Improved Deep Web Crawling – Use UIE to find appropriate values for forms

Slide 38: Conclusion Conclusion o Transcendence makes forms more flexible o Transcendence automatically finds input values o Unsupervised information extraction useful for crawling o Transcendence enables new queries not possible before o Participants wanted to use Transcendence

Slide 39: The End Transcendence Jeffrey P. Bigham jbigham@cs.washington.edu www.cs.washington.edu/homes/jbigham/ Thanks to: Mira Dontcheva, UW Turing Center, anonymous reviewers, and our study participants.

Slide 40: Some Extra Slides

Slide 41: Show Those Resulting from Specific Inputs

Slide 42: Show Those Resulting from Specific Inputs

Slide 43: Show Wedgwood Results

Slide 47: System Description [Architecture diagram; recoverable labels: Generalizers (KnowItAll, Google Sets, Prior Input), The Web, Google Maps, Firefox Extension, Java Applet, Transcendence Extraction System, Transcendence System Database. Pipeline: Step 1: Generalize, Step 2: Extract, Step 3: Visualize.]

Slide 48: 3 Steps of Transcendence 1. Generalize 2. Choose Fields & Extract 3. Visualize

Slide 49: Additional Examples Examples: Mapping Stores

Slide 50: Additional Examples Examples: Kayak Flights

Slide 51: User Evaluation [Charts of questionnaire responses for Programmers, Non-Programmers, and Combined; each statement rated from 1 (strongly disagree) to 7 (strongly agree).] Ease of Use 1. Transcendence is difficult to learn how to use. 2. Transcendence is tedious to use. Value 3. I could use Transcendence to find useful information. 4. Transcendence is powerful (it could allow me to easily accomplish difficult tasks). 5. Manually recreating Transcendence’s functionality for a specific web site would be difficult. 6. Manually recreating Transcendence’s functionality would be time-consuming. 7. Transcendence is useful for performing the tasks in this study. 8. Generalization of input fields is useful. 9. Automatic selection of fields is useful. 10. Transcendence would save me time. 11. I would use Transcendence in the future if it was available.

Slide 52: Introduction

Slide 53: Introduction

Slide 54: Introduction

Slide 55: Introduction

Slide 56: Introduction

Slide 57: Introduction

Slide 58: Introduction Deep Web Resources