Rediscovering Missing Web Pages Using Link Neighborhood Lexical Signatures

by Martin Klein on Jun 17, 2011

Lexical signatures, which consist of a small number of words chosen to represent the “aboutness” of a page, have previously been proposed for discovering the new URI of a missing web page. However, prior methods relied on computing the lexical signature before the page was lost, or on using cached or archived versions of the page to calculate it. We demonstrate a system that constructs a lexical signature for a page from its link neighborhood, that is, the “backlinks”, or pages that link to the missing page. After testing various methods, we show that one can construct a lexical signature for a missing web page using only ten backlink pages. Further, we show that only the first level of backlinks is useful in this effort. The text that the backlinks use to point to the missing page serves as input for the creation of a four-word lexical signature. That lexical signature is shown to successfully find the target URI in more than half of the test cases.
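The idea of deriving a four-word signature from backlink anchor text can be illustrated with a minimal sketch. The abstract does not say how terms are weighted, so the ranking below (the number of backlinks whose anchor text contains a term) is an assumption made for illustration, as are the function names, the stopword list, and the sample anchor texts. It is not the authors' implementation.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the presentation).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "at",
             "here", "click", "this", "is"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]


def lexical_signature(anchor_texts, size=4, max_backlinks=10):
    """Return the `size` terms found in the most backlink anchor texts.

    Only the first `max_backlinks` anchor texts are considered, mirroring
    the abstract's statement that ten backlink pages suffice.
    """
    counts = Counter()
    for text in anchor_texts[:max_backlinks]:
        # Count each term at most once per backlink.
        counts.update(set(tokenize(text)))
    # Highest count first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:size]]


# Hypothetical anchor texts taken from pages linking to a missing page.
anchors = [
    "JCDL 2011 conference program",
    "the JCDL conference",
    "JCDL 2011 program",
    "digital libraries conference",
    "click here",
]
print(lexical_signature(anchors))
```

The resulting terms would then be submitted as a query to a search engine in the hope that the page's new URI appears among the results.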

Tags

2011 klein missing martin rediscover web synchronicity jcdl pages

Usage Rights

© All Rights Reserved

Rediscovering Missing Web Pages Using Link Neighborhood Lexical Signatures — Presentation Transcript