Retrieval and Feedback Models for Blog Feed Search

From jelsas, 4 weeks ago

SIGIR 2008 Presentation

Slideshow transcript

Slide 1: Retrieval and Feedback Models for Blog Feed Search. SIGIR 2008, Singapore. Jonathan Elsas, Jaime Arguello, Jamie Callan & Jaime Carbonell, LTI/SCS/CMU

Slide 2: Outline • The task – Overview of Blogs & Blog Search – Challenges in Blog Search • Our approach – Retrieval Models – Query Expansion Models • Conclusion

Slide 3: Background

Slide 4: What is a Blog?

Slide 5: What is a Feed? <xml> <feed> <entry> <author>Peter …</> <title>Good, Evil…</> <content>I’ve said…</> </entry> <entry> <author>Peter …</> <title>Agreeing…</> <content>Some peo…</> </entry> …

Slide 6: Blog-Feed Correspondence Blog Feed Post Entry HTML XML

Slide 7: Why are Blogs important? Technorati currently tracking > 112.8 Million Blogs > 175,000 new Blogs per day > 1.6 Million posts per day [http://www.technorati.com/about/]

Slide 8: The Task

Slide 9: Feed Search at TREC (a.k.a. Blog Distillation) Ranking Blogs/Feeds (collections of posts) in response to a user’s query, [X]: “A relevant feed should have a principal and recurring interest in X” (TREC 2007 Blog Track)

Slide 10: Feed Search at TREC Example queries: [Gardening], [Apple iPod], [Violence in Sudan], [Gun Control], [Food], [Wine] • Represent Ongoing Information Needs • Frequently Very General

Slide 11: Challenges in Feed Search

Slide 12: Challenges in Feed Search 1. A feed is a collection of documents [figure: a feed drawn as a sequence of entries over time]

Slide 13: Challenges in Feed Search 1. A feed is a collection of documents – How does relevance at the entry level correspond to relevance at the feed level? [figure: a feed drawn as a sequence of entries over time]

Slide 14: Challenges in Feed Search 2. Even a topical feed is topically diverse [figure: entries plotted by topic over time around the central topic Space Exploration; entry labels mention Boeing, NASA, a Mars rover, a shuttle launch, China’s plans for the moon, and My dog]

Slide 15: Challenges in Feed Search 2. Even a topical feed is topically diverse – Can we favor entries close to the central topic of the feed? [figure: entries plotted by topic over time around the central topic Space Exploration]

Slide 16: Challenges in Feed Search 3. Feeds are noisy – Spam blogs, Spam & off topic comments

Slide 17: Challenges in Feed Search 4. General & Ongoing Information Needs
• [Mac] … post regularly about new products, features, or application software of Apple Mac computers.
• [Music] … describing songs, biographies of musicians, musical styles and their influences of music on people are discussed.
• [Food] … describing experiences eating cuisines, culinary delights, recipes, nutrition plans.
• [Wine] …such as tastings, reviews, food matching or pairing, and oenophile news and events.

Slide 18: Our Approach

Slide 19: Challenges → Our Approach • Feeds: Topically Diverse → Retrieval Models • Feeds: Noisy Collections; Information Needs: General & Ongoing → Feedback Models

Slide 20: Retrieval Models • Challenge: ranking topically diverse collections • Representation: feed vs. entry • Model topical relationship between entries

Slide 21: Large Document (Feed) Model [Q] → Feed Document Collection → Ranked Feeds. Rank by Indri’s standard retrieval model [Metzler and Croft, 2004; 2005] [figure: each feed’s XML, with all of its entries, treated as one document]

Slide 22: Large Document (Feed) Model Advantages: • A straightforward application of existing retrieval techniques Potential Pitfalls: • Large entries dominate a feed’s language model • Ignores relationship among entries [figure: a Feed containing several Entries]

Slide 23: Small Document (Entry) Model [Q] → Entry Document Collection (document = entry) → Ranked Entries → Apply some rank aggregation function → Ranked Feeds

Slide 24: Small Document (Entry) Model • Query Likelihood • Entry Centrality • Feed Prior: favors longer feeds ReDDE Federated Search Algorithm [Si & Callan, 2003]
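
The three components on Slide 24 can be combined in more than one way, and the slide's own formula did not survive in this transcript. The sketch below assumes a feed's score is its prior multiplied by the sum, over its entries, of query likelihood times centrality. The function name `rank_feeds`, the input layout, and the toy numbers are hypothetical; the log(feed length) prior is one of the two priors named on the results slides.

```python
import math
from collections import defaultdict

def rank_feeds(entry_scores, feed_lengths, use_log_prior=True):
    """Aggregate entry-level scores into a feed ranking.

    entry_scores: iterable of (feed_id, query_likelihood, centrality).
    feed_lengths: dict mapping feed_id to its number of entries.
    Assumed combination: prior(feed) * sum(query_likelihood * centrality).
    """
    totals = defaultdict(float)
    for feed_id, query_likelihood, centrality in entry_scores:
        totals[feed_id] += query_likelihood * centrality

    ranked = []
    for feed_id, total in totals.items():
        # Uniform prior is a constant; the log prior favors longer feeds.
        prior = math.log(feed_lengths[feed_id]) if use_log_prior else 1.0
        ranked.append((feed_id, prior * total))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Toy data (made up): feedA is long with moderately relevant entries,
# feedB is short with one strongly relevant entry.
entries = [("feedA", 0.02, 0.5), ("feedA", 0.01, 0.5), ("feedB", 0.03, 1.0)]
lengths = {"feedA": 20, "feedB": 3}

print(rank_feeds(entries, lengths, use_log_prior=False))
print(rank_feeds(entries, lengths, use_log_prior=True))
```

With the uniform prior the short feed ranks first; with the log prior the longer feed overtakes it, which is the "favors longer feeds" behavior the slide describes.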

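Slide 25: Entry Centrality • Uniform • Geometric Mean [figure: entries plotted by topic over time; the two centrality formulas were images and are not captured in this transcript]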
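
Because the centrality formulas are missing from the transcript, the following is only a sketch of what the two variants might look like. It assumes the geometric-mean variant scores an entry by the geometric mean of its terms' probabilities under a feed-level unigram language model, normalized over the feed's entries. The unsmoothed language model and all function names are assumptions for illustration.

```python
import math
from collections import Counter

def feed_language_model(entries):
    """Maximum-likelihood unigram model over all tokens in the feed."""
    counts = Counter(token for entry in entries for token in entry)
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}

def uniform_centrality(entries):
    """Every entry is treated as equally central."""
    return [1.0 / len(entries)] * len(entries)

def geometric_mean_centrality(entries):
    """Geometric mean of term probabilities under the feed model,
    normalized so the values sum to 1 across the feed's entries."""
    model = feed_language_model(entries)
    raw = []
    for entry in entries:
        log_sum = sum(math.log(model[token]) for token in entry)
        raw.append(math.exp(log_sum / len(entry)))  # len(entry) plays the role of |E|
    norm = sum(raw)
    return [value / norm for value in raw]

# Toy feed (made up): two on-topic entries and one off-topic entry.
feed = [["nasa", "rover", "mars"], ["nasa", "shuttle", "launch"], ["my", "dog"]]
print(uniform_centrality(feed))
print(geometric_mean_centrality(feed))
```

In this toy feed the off-topic entry receives the lowest geometric-mean centrality because none of its terms recur elsewhere in the feed.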

Slide 26: Small Document (Entry) Model Advantages: • Controls for differing entry length • Models topical relationship among entries Disadvantages: • Centrality computation is slow(er) Callout: Not only improves speed, also performance

Slide 27: Retrieval Model Results

Slide 28: Retrieval Model Results • 45 Queries from the TREC 2007 Blog Distillation Task • BLOG06 test collection, XML feeds only • 5-Fold Cross Validation for all retrieval model smoothing parameters

Slide 29: Retrieval Model Results [figure: bar chart of Mean Average Precision for the Large Document (Feed) Model and the Small Document (Entry) Models; numbers captured from the chart: 0.325, 0.315, 0.305, 0.298, 0.29, 0.29, 0.285, 0.277, 0.265, 0.245]

Slide 30: Retrieval Model Results [figure: the same Mean Average Precision chart; bar labels: Uniform, Uniform, Log(Feed Length); annotation: “Log Prior Map 0.188”]

Slide 31: Retrieval Model Results [figure: the same Mean Average Precision chart; bar labels: Uniform, Uniform, Log(Feed Length), n/a]
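
The results slides report Mean Average Precision. As a reminder of how that metric is usually computed, here is a minimal sketch using the standard definition (not code from the presentation); the feed identifiers and relevance judgments are made up.

```python
def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant item appears,
    divided by the total number of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_items) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two toy queries with hypothetical feed ids.
runs = [
    (["f1", "f2", "f3", "f4"], {"f1", "f3"}),
    (["f2", "f1"], {"f1"}),
]
print(mean_average_precision(runs))
```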

Slide 32: Feedback Models • Challenge: Noisy collection with general & ongoing information needs • Use a cleaner external collection for query expansion (Wikipedia) • With an expansion technique designed to identify multiple query facets

Slide 33: Query Expansion (PRF) [Q] → BLOG06 Collection → Related Terms from top K documents → [Q + Terms] [Lavrenko & Croft, 2001]
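
The pseudo-relevance feedback loop on Slide 33 can be illustrated with a deliberately simplified sketch. The slide cites the relevance model of Lavrenko & Croft; the code below swaps in plain term counting over the top K documents instead, so it shows the shape of the pipeline only. The function name, parameters, and toy documents are hypothetical.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k=10, num_terms=5, stopwords=frozenset()):
    """Append the most frequent non-query terms from the top k documents.

    ranked_docs: list of token lists, best-ranked document first.
    This is raw term counting, not a relevance model.
    """
    query = set(query_terms)
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in doc if t not in query and t not in stopwords)
    expansion = [term for term, _ in counts.most_common(num_terms)]
    return list(query_terms) + expansion

# Toy initial ranking for the query [photography] (made-up documents).
docs = [
    ["photography", "camera", "lens", "camera"],
    ["photography", "camera", "film"],
    ["dog", "dog", "dog"],  # outside the top k, so ignored below
]
print(expand_query(["photography"], docs, k=2, num_terms=1))
```

If the top K documents are noisy, the expansion terms inherit that noise, which is the failure mode the PRF column on the photography example slide shows.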

Slide 34: Query Expansion Example [Photography]
• Ideal: digital photography, depth of field, photographic film, photojournalism, cinematography
• PRF: photography, nude, erotic, art, girl, free, teen, fashion, women

Slide 35: Feedback Model Results [figure: bar chart of Mean Average Precision (axis 0.2 to 0.36) for BLOG LD and BLOG SD; series: None, PRF]

Slide 36: Query Expansion (Wikipedia PRF) [Q] → Wikipedia → Related Terms from top K documents → [Q + Terms] → BLOG06 Collection [Diaz & Metzler, 2006] [Lavrenko & Croft, 2001]

Slide 37: Query Expansion Example [Photography]
• Ideal: digital photography, depth of field, photographic film, photojournalism, cinematography
• PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
• Wikipedia PRF: photography, director, special, film, art, camera, music, cinematographer, photographic

Slide 38: Feedback Model Results [figure: bar chart of Mean Average Precision (axis 0.2 to 0.36) for BLOG LD and BLOG SD; series: None, PRF, Wiki. PRF]

Slide 39: Query Expansion (Wikipedia Link) [Q] → Wikipedia → Related Terms from link structure → [Q + Terms] → BLOG06 Collection

Slide 40: Wikipedia Link-Based Query Expansion

Slide 41: Wikipedia Link-Based Expansion [figure: query Q run against Wikipedia, producing a ranked list of articles]

Slide 42: Wikipedia Link-Based Expansion • Relevance Set: Top R = 100 • Working Set: Top W = 1000

Slide 43: Wikipedia Link-Based Expansion • Relevance Set: Top R = 100 • Working Set: Top W = 1000

Slide 44: Wikipedia Link-Based Expansion • Relevance Set: Top R = 100 • Working Set: Top W = 1000 • Extract anchor text from links in the Working Set that point to the Relevance Set.

Slide 45: Wikipedia Link-Based Expansion • Relevance Set: Top R = 500 • Working Set: Top W = 1000 • Extract anchor text from links in the Working Set that point to the Relevance Set. Combines relevance and popularity. Relevance: An anchor phrase that links to a high-ranked article gets a high score. Popularity: An anchor phrase that links many times to mid-ranked articles also gets a high score.
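
The relevance and popularity behavior described on Slide 45 can be sketched as follows. The slides do not give the scoring formula, so this assumes each link from a Working Set article into the Relevance Set adds a weight of R minus the target's rank (plus one) to its anchor phrase. The function name, input layout, and toy link graph are hypothetical.

```python
from collections import defaultdict

def score_anchor_phrases(ranking, links, r=100, w=1000):
    """Score anchor phrases found in the working set.

    ranking: Wikipedia article titles for the query, best first.
    links: dict mapping a source title to a list of (anchor_text, target_title).
    Assumed weight per link: r - rank(target) + 1, for targets in the top r.
    """
    rank_of = {title: i + 1 for i, title in enumerate(ranking[:r])}
    scores = defaultdict(float)
    for source in ranking[:w]:  # working set
        for anchor, target in links.get(source, []):
            if target in rank_of:  # link points into the relevance set
                scores[anchor.lower()] += r - rank_of[target] + 1
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy ranking and link graph (made up) with R = 2 and W = 4.
ranking = ["Photography", "Digital photography", "Camera", "Film"]
links = {
    "Photography": [("digital photography", "Digital photography")],
    "Camera": [("photography", "Photography"),
               ("digital photography", "Digital photography")],
    "Film": [("digital photography", "Digital photography"),
             ("digital photography", "Digital photography"),
             ("movies", "Film")],
}
print(score_anchor_phrases(ranking, links, r=2, w=4))
```

In the toy data, "digital photography" outscores "photography" even though its target is ranked lower, because it is linked more often; that is the popularity effect the slide mentions. "movies" gets no score because its target lies outside the Relevance Set.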

Slide 46: Query Expansion Example [Photography]
• PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
• Wikipedia Link-Based: photography, photographer, digital photography, photographic, depth of field, feature photography, film, photographic film, photojournalism
• Ideal: digital photography, depth of field, photographic film, photojournalism, cinematography

Slide 47: Feedback Model Results [figure: bar chart of Mean Average Precision (axis 0.2 to 0.4) for BLOG LD and BLOG SD; series: None, PRF, Wiki. PRF, Wiki. Link]

Slide 48: Conclusion • Feed Search Challenges: – Feeds are topically diverse, noisy collections – Ranked against ongoing & general information needs • Novel Retrieval Models: – Ranking collections, sensitive to topical relationship among entries • Novel Feedback Models: – Discover multiple query facets & robust to collection noise

Slide 49: Thank You! Student Travel Grant funding from: ACM SIGIR, Amit Singhal, Microsoft Research

Slide 50: Entry Centrality GM Derivation Entry Generation Likelihood: |E| where

Slide 51: Query Expansion Examples [Music]
• Wikipedia Expansion: Music, Folk music, Electronic music, Folk, Music video, World music, Ambient, Electronic, Country music
• PRF: Music, Country, Download, Free, MP3, Mp3andmore, Lyric, Listen, Song

Slide 52: Query Expansion Examples [Scottish Independence]
• Wikipedia Expansion: scotland, scottish parliament, scottish, scottish national party, wars of scottish independence, scottish independence, william wallace, glasgow, scottish socialist party
• PRF: scotland, independence, party, convention, politics, snp, national, people, scot

Slide 53: Query Expansion Examples [Machine Learning]
• Wikipedia Expansion: machine learning, learning, artificial intelligence, turing machine, machine gun, neural network, support vector machine, supervised learning, artificial neural network
• PRF: learn, machine, credit, card, karaoke, journal, sex, model, sew

Slide 54: Query Generality Characteristics • Query Length: – BLOG: 1.9 words – TB04: 3.2 words – TB05: 3.0 words • ODP Depth – BLOG: 4.7 levels – TB04: 5.2 levels – TB05: 5.3 levels

Slide 55: Relevance Set Cohesiveness Wikipedia Relevance Set, Top R = 100. Cohesiveness = |L_in| / |L_in ∪ L_out|
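
The transcript does not define L_in and L_out. The sketch below assumes L_in is the set of links from Relevance Set articles to other Relevance Set articles and L_out is the set of links from Relevance Set articles to articles outside it, so cohesiveness is the fraction of the set's outgoing links that stay inside the set. The function name and the toy link list are hypothetical.

```python
def cohesiveness(relevance_set, links):
    """Fraction of links leaving relevance-set articles that stay in the set.

    relevance_set: iterable of article titles.
    links: iterable of (source_title, target_title) pairs.
    Returns 0.0 when the relevance set has no outgoing links.
    """
    members = set(relevance_set)
    l_in, l_out = set(), set()
    for source, target in links:
        if source not in members:
            continue  # only links that originate inside the relevance set count
        if target in members:
            l_in.add((source, target))
        else:
            l_out.add((source, target))
    union = l_in | l_out
    return len(l_in) / len(union) if union else 0.0

# Toy example: two in-set links and one link that leaves the set.
toy_links = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A")]
print(cohesiveness(["A", "B"], toy_links))
```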

Slide 56: Relevant Set Cohesiveness

Slide 57: Is it the Queries? Feed Search Queries ≠ TB Adhoc Queries. But none of these measures predicts whether Wikipedia expansion helps…