Retrieval and Feedback Models for Blog Feed Search

    Retrieval and Feedback Models for Blog Feed Search - Presentation Transcript

    1. Retrieval and Feedback Models for Blog Feed Search
      SIGIR 2008, Singapore
      Jonathan Elsas, Jaime Arguello, Jamie Callan & Jaime Carbonell
      LTI/SCS/CMU
    2. Outline
      • The task
        • Overview of Blogs & Blog Search
        • Challenges in Blog Search
      • Our approach
        • Retrieval Models
        • Query Expansion Models
      • Conclusion
    3. Background
    4. What is a Blog?
    5. What is a Feed?
      <xml>
        <feed>
          <entry>
            <author>Peter …</>
            <title>Good, Evil…</>
            <content>I’ve said…</>
          </entry>
          <entry>
            <author>Peter …</>
            <title>Agreeing…</>
            <content>Some peo…</>
          </entry>
          …
    6. Blog-Feed Correspondence
      • Blog ↔ Feed
      • Post ↔ Entry
      • HTML ↔ XML
    7. Why are Blogs important?
      • Technorati currently tracking:
        • > 112.8 million blogs
        • > 175,000 new blogs per day
        • > 1.6 million posts per day
      [http://www.technorati.com/about/]
    8. The Task
    9. Feed Search at TREC
      • Ranking Blogs/Feeds (collections of posts) in response to a user’s query, [X]
      • “A relevant feed should have a principle and recurring interest in X” (TREC 2007 Blog Track)
      • Also known as Blog Distillation
    10. Feed Search at TREC
      • [Gardening]
      • [Apple iPod]
      • [Violence in Sudan]
      • [Gun Control]
      • [Food]
      • [Wine]
      These queries represent ongoing information needs and are frequently very general.
    11. Challenges in Feed Search
    12. Challenges in Feed Search
      • 1. A feed is a collection of documents
        • How does relevance at the entry level correspond to relevance at the feed level?
      [Figure: a feed shown as a sequence of entries over time]
    13. Challenges in Feed Search
      • 2. Even a topical feed is topically diverse
      [Figure: entries of a Space Exploration feed plotted by time and topic: NASA, China’s plans for the moon, shuttle launch, My dog, Mars rover, Boeing]
    14. Challenges in Feed Search
      • 2. Even a topical feed is topically diverse
        • Can we favor entries close to the central topic of the feed?
      [Figure: entries of a Space Exploration feed plotted by time and topic]
    15. Challenges in Feed Search
      • 3. Feeds are noisy
        • Spam blogs, Spam & off topic comments
    16. Challenges in Feed Search
      • 4. General & Ongoing Information Needs
        • [Mac]: … post regularly about new products, features, or application software of Apple Mac computers.
        • [Music]: … describing songs, biographies of musicians, musical styles and their influences of music on people are discussed.
        • [Food]: … describing experiences eating cuisines, culinary delights, recipes, nutrition plans.
        • [Wine]: … such as tastings, reviews, food matching or pairing, and oenophile news and events.
    17. Our Approach
    18. Feeds:
      • Topically Diverse
      • Noisy
      • Collections
      Information Needs:
      • General & Ongoing
      [Diagram: Challenges mapped to Our Approach (Retrieval Models, Feedback Models)]
    19. Retrieval Models
      • Challenge: ranking topically diverse collections
      • Representation: feed vs. entry
      • Model topical relationship between entries
    20. Large Document (Feed) Model
      [Diagram: the query [Q] is run against the Feed Document Collection, in which each document is a whole feed (a <feed> with its <entry> elements), to produce Ranked Feeds]
      • Rank by Indri’s standard retrieval model [Metzler and Croft, 2004; 2005]
    21. Large Document (Feed) Model
      • Advantages:
      • A straightforward application of existing retrieval techniques
      • Potential Pitfalls:
      • Large entries dominate a feed’s language model
      • Ignores relationship among entries
      [Figure: a feed made up of several entries]
    22. Small Document (Entry) Model
      [Diagram: the query [Q] is run against the Entry Document Collection (document = entry) to produce Ranked Entries; a rank aggregation function is then applied to produce Ranked Feeds]
      • Rank by: [equation not preserved in the transcript]
    23. Small Document (Entry) Model
      • Query Likelihood
      • Entry Centrality
      • Feed Prior: favors longer feeds
      ReDDE Federated Search Algorithm [Si & Callan, 2003]
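
The three components on this slide suggest a feed score of the form P(F) · Σ_E P(Q|E) · P(E|F). The transcript does not preserve the slide's ranking formula, so the sketch below is only an assumption about how the pieces combine. The Dirichlet smoothing, the `mu` value, the log feed-length prior, and the names `query_likelihood` and `rank_feeds` are illustrative choices, not taken from the slides.

```python
import math
from collections import Counter

def query_likelihood(query, entry, collection_tf, collection_len, mu=2500.0):
    """P(Q|E) under a unigram model with Dirichlet smoothing (assumed smoothing)."""
    tf = Counter(entry)
    score = 1.0
    for term in query:
        p_coll = collection_tf.get(term, 0) / collection_len
        score *= (tf[term] + mu * p_coll) / (len(entry) + mu)
    return score

def rank_feeds(query, feeds, mu=2500.0):
    """Rank feeds by prior(F) * sum_E P(Q|E) * P(E|F).

    feeds: dict mapping feed id -> list of entries, each entry a list of tokens.
    Uses uniform entry centrality P(E|F) = 1/N_F and a log feed-length prior.
    """
    collection_tf = Counter(t for entries in feeds.values() for e in entries for t in e)
    collection_len = sum(collection_tf.values())
    scores = {}
    for feed_id, entries in feeds.items():
        if not entries:
            continue  # a feed with no entries cannot be scored
        prior = math.log(1 + len(entries))   # favors longer feeds
        centrality = 1.0 / len(entries)      # uniform P(E|F)
        total = sum(
            query_likelihood(query, e, collection_tf, collection_len, mu) * centrality
            for e in entries
        )
        scores[feed_id] = prior * total
    return sorted(scores, key=scores.get, reverse=True)

feeds = {
    "space": [["nasa", "shuttle", "launch"], ["mars", "rover", "nasa"]],
    "food": [["wine", "cheese", "recipe"], ["pasta", "recipe", "wine"]],
}
ranking = rank_feeds(["nasa"], feeds, mu=10.0)
```

Summing over entries, rather than concatenating them, keeps one very long entry from dominating the feed's score, which is the pitfall listed for the large document model.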
    24. Entry Centrality
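      • Uniform: [equation not preserved in the transcript]
      • Geometric Mean: [equation not preserved in the transcript]
      [Figure: entries plotted by time and topic]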
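
The formulas for the two centrality measures are not preserved in this transcript. The sketch below is one plausible reading, stated as an assumption: uniform centrality gives every entry the weight 1/N, while the geometric-mean variant scores an entry by the geometric mean of its terms' probabilities under a feed language model and then normalizes across the feed's entries. The feed language model used here (an average of per-entry maximum-likelihood models) and all function names are hypothetical.

```python
import math
from collections import Counter

def uniform_centrality(entries):
    """Every entry is equally central: P(E|F) = 1/N."""
    n = len(entries)
    return [1.0 / n] * n

def feed_language_model(entries):
    """P(t|F) as the average of per-entry maximum-likelihood models (assumed)."""
    model = Counter()
    for entry in entries:
        for term, count in Counter(entry).items():
            model[term] += count / len(entry) / len(entries)
    return model

def geometric_mean_centrality(entries):
    """Score each non-empty entry by the geometric mean of P(t|F) over its tokens."""
    p_feed = feed_language_model(entries)
    raw = []
    for entry in entries:
        # Dividing the log-likelihood by |E| takes the |E|-th root, so long
        # and short entries are compared on the same scale.
        log_gm = sum(math.log(p_feed[t]) for t in entry) / len(entry)
        raw.append(math.exp(log_gm))
    total = sum(raw)
    return [r / total for r in raw]

entries = [["nasa", "shuttle"], ["nasa", "rover"], ["my", "dog"]]
centrality = geometric_mean_centrality(entries)
```

In this toy feed the two entries that share the term "nasa" receive a higher weight than the off-topic "my dog" entry, which is the behavior the slide's question about favoring entries close to the feed's central topic asks for.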
    25. Small Document (Entry) Model
      • Advantages:
        • Controls for differing entry length
        • Models topical relationship among entries
      • Disadvantages:
        • Centrality computation is slow(er)
      Not only improves speed, but also performance
    26. Retrieval Model Results
    27. Retrieval Model Results
      • 45 Queries from the TREC 2007 Blog Distillation Task
      • BLOG06 test collection, XML feeds only
      • 5-Fold Cross Validation for all retrieval model smoothing parameters
    28. Retrieval Model Results
      [Chart: Mean Average Precision, Large Document (Feed) Model vs. Small Document (Entry) Models]
    29. Retrieval Model Results
      [Chart: Mean Average Precision; labels: Uniform, Log(Feed Length), Uniform, Log; annotation: Prior, MAP 0.188]
    30. Retrieval Model Results
      [Chart: Mean Average Precision; labels: Uniform, Log(Feed Length), Uniform, n/a]
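
The results on these slides are reported as Mean Average Precision (MAP). For reference, here is a minimal sketch of the standard definition; the function names and the toy data are illustrative and do not come from the BLOG06 runs.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    # Relevant items that are never retrieved contribute zero.
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ap = average_precision(["a", "b", "c", "d"], {"a", "c"})
map_score = mean_average_precision([
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
])
```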
    31. Feedback Models
      • Challenge: Noisy collection with general & ongoing information needs
      • Use a cleaner external collection for query expansion (Wikipedia)
      • With an expansion technique designed to identify multiple query facets
    32. Query Expansion (PRF)
      [Diagram: the query [Q] is run against the BLOG06 Collection; related terms are taken from the top K documents to form [Q + Terms]] [Lavrenko & Croft, 2001]
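
Pseudo-relevance feedback of this kind can be sketched as follows: each candidate term is weighted by how often it occurs in the top K documents, with each document's contribution scaled by its retrieval score. This is a simplified relevance-model style estimate written as an assumption about the general technique, not the exact configuration used in the talk; `prf_expansion_terms` and its parameters are hypothetical names.

```python
from collections import Counter

def prf_expansion_terms(top_docs, doc_scores, num_terms=5):
    """Weight terms by sum_D P(t|D) * P(D|Q) over the top K documents.

    top_docs:   list of token lists (the top K retrieved documents).
    doc_scores: positive retrieval scores for those documents; they are
                normalized here so that they sum to 1.
    """
    total = sum(doc_scores)
    weights = Counter()
    for doc, score in zip(top_docs, doc_scores):
        for term, count in Counter(doc).items():
            weights[term] += (count / len(doc)) * (score / total)
    return weights.most_common(num_terms)

top_docs = [["photography", "camera", "lens"], ["photography", "film"]]
best = prf_expansion_terms(top_docs, [0.6, 0.4], num_terms=1)
```

Because the weights come entirely from the top K documents, spam or off-topic text in those documents flows straight into the expanded query.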
    33. Query Expansion Example
      • Query: [Photography]
      • Ideal:
        • digital photography
        • depth of field
        • photographic film
        • photojournalism
        • cinematography
      • PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
    34. Feedback Model Results
      [Chart: Mean Average Precision for None and PRF]
    35. Query Expansion (Wikipedia PRF)
      [Diagram: the query [Q] is run against Wikipedia [Diaz & Metzler, 2006]; related terms are taken from the top K documents to form [Q + Terms] [Lavrenko & Croft, 2001], which is run against the BLOG06 Collection]
    36. Query Expansion Example
      • Query: [Photography]
      • Ideal:
        • digital photography
        • depth of field
        • photographic film
        • photojournalism
        • cinematography
      • PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
      • Wikipedia PRF: photography, director, special, film, art, camera, music, cinematographer, photographic
    37. Feedback Model Results
      [Chart: Mean Average Precision for None, PRF, and Wiki. PRF]
    38. Query Expansion (Wikipedia Link)
      [Diagram: the query [Q] is run against Wikipedia; related terms are taken from the link structure to form [Q + Terms], which is run against the BLOG06 Collection]
    39. Wikipedia Link-Based Query Expansion
    40. Wikipedia Link-Based Expansion
      [Diagram: the query Q is run against Wikipedia]
    41. Wikipedia Link-Based Expansion
      • Relevance Set: top R = 100
      • Working Set: top W = 1000
      [Diagram: ranked Wikipedia articles for Q, with the Relevance Set and Working Set marked]
    42. Wikipedia Link-Based Expansion
      • Relevance Set: top R = 100
      • Working Set: top W = 1000
      [Diagram: ranked Wikipedia articles for Q, with the Relevance Set and Working Set marked]
    43. Wikipedia Link-Based Expansion
      • Relevance Set: top R = 100
      • Working Set: top W = 1000
      • Extract anchor text from Working Set articles that link to the Relevance Set.
    44. Wikipedia Link-Based Expansion
      • Relevance Set: top R = 500
      • Working Set: top W = 1000
      • Extract anchor text from Working Set articles that link to the Relevance Set.
      • Combines relevance and popularity:
        • Relevance: an anchor phrase that links to a high-ranked article gets a high score
        • Popularity: an anchor phrase that links many times to mid-ranked articles also gets a high score
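
The link-based procedure can be sketched as below. The slides do not give the scoring formula, so the per-link credit used here (R minus the target's rank, summed over all links that share an anchor phrase) is an assumption chosen to reproduce the stated behavior: links to highly ranked articles score high, and many links to mid-ranked articles also add up. The function name, the data layout, and the lowercasing of anchor text are hypothetical.

```python
from collections import defaultdict

def link_based_expansion(ranked_articles, anchors, R=100, W=1000, num_terms=20):
    """Score anchor phrases found in the working set that point into the relevance set.

    ranked_articles: article ids ordered by retrieval score for the query.
    anchors: dict mapping source article id -> list of (anchor_text, target_id).
    """
    relevance_rank = {a: i for i, a in enumerate(ranked_articles[:R])}  # 0 = best
    working_set = ranked_articles[:W]
    scores = defaultdict(float)
    for source in working_set:
        for anchor_text, target in anchors.get(source, []):
            if target in relevance_rank:
                # Relevance: a higher-ranked target earns more credit per link.
                # Popularity: credit accumulates over every link with this phrase.
                scores[anchor_text.lower()] += R - relevance_rank[target]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:num_terms]

ranked = ["Photography", "Camera", "Film", "Cooking"]
anchors = {
    "Camera": [("photography", "Photography"), ("film", "Film")],
    "Film": [("Photography", "Photography")],
    "Cooking": [("recipe", "Recipe")],
}
expansion = link_based_expansion(ranked, anchors, R=2, W=4)
```

Since the candidates are anchor phrases rather than single words, multi-word expansions such as "depth of field" can surface directly.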
    45. Query Expansion Example
      • Query: [Photography]
      • Wikipedia Link-Based:
        • photography
        • photographer
        • digital photography
        • photographic
        • depth of field
        • feature photography
        • film
        • photographic film
        • photojournalism
      • PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
      • Ideal: digital photography, depth of field, photographic film, photojournalism, cinematography
    46. Feedback Model Results
      [Chart: Mean Average Precision for None, PRF, Wiki. PRF, and Wiki. Link]
    47. Conclusion
      • Feed Search Challenges:
        • Feeds are topically diverse, noisy collections
        • Ranked against ongoing & general information needs
      • Novel Retrieval Models:
        • Ranking collections, sensitive to topical relationship among entries
      • Novel Feedback Models:
        • Discover multiple query facets & robust to collection noise
    48. Thank You! Student Travel Grant funding from: ACM SIGIR, Amit Singhal, Microsoft Research
    49. Entry Centrality GM Derivation
      [Equation not preserved in the transcript; surviving labels: Entry Generation Likelihood, |E|]
    50. Query Expansion Examples
      • Query: [Music]
      • Wikipedia Expansion:
        • Music
        • Folk music
        • Electronic music
        • Folk
        • Music video
        • World music
        • Ambient
        • Electronic
        • Country music
      • PRF: Music, Country, Download, Free, MP3, Mp3andmore, Lyric, Listen, Song
    51. Query Expansion Examples
      • Query: [Scottish Independence]
      • Wikipedia Expansion:
        • scotland
        • scottish parliament
        • scottish
        • scottish national party
        • wars of scottish independence
        • scottish independence
        • william wallace
        • glasgow
        • scottish socialist party
      • PRF: scotland, independence, party, convention, politics, snp, national, people, scot
    52. Query Expansion Examples
      • Query: [Machine Learning]
      • Wikipedia Expansion:
        • machine learning
        • learning
        • artificial intelligence
        • turing machine
        • machine gun
        • neural network
        • support vector machine
        • supervised learning
        • artificial neural network
      • PRF: learn, machine, credit, card, karaoke, journal, sex, model, sew
    53. Query Generality Characteristics
      • Query Length:
        • BLOG: 1.9 words
        • TB04: 3.2 words
        • TB05: 3.0 words
      • ODP Depth
        • BLOG: 4.7 levels
        • TB04: 5.2 levels
        • TB05: 5.3 levels
    54. Relevance Set Cohesiveness
      • Relevance Set: top R = 100 Wikipedia articles
      • Cohesiveness = |L_in| / |L_in ∪ L_out|
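
The cohesiveness ratio can be computed as in the sketch below. The slide does not define L_in and L_out, so this assumes that L_in is the set of links from relevance-set articles to other relevance-set articles and L_out is the set of links from relevance-set articles to articles outside the set. The function name and the link representation are hypothetical.

```python
def relevance_set_cohesiveness(relevance_set, links):
    """Fraction of the relevance set's outgoing links that stay inside the set.

    relevance_set: iterable of article ids (the top R articles).
    links: iterable of (source_id, target_id) pairs.
    """
    members = set(relevance_set)
    l_in, l_out = set(), set()
    for source, target in links:
        if source not in members:
            continue  # only links that originate inside the relevance set count
        (l_in if target in members else l_out).add((source, target))
    union = l_in | l_out
    return len(l_in) / len(union) if union else 0.0

cohesiveness = relevance_set_cohesiveness(
    {"A", "B"},
    [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A")],
)
```

Under this reading, a value near 1 means the top-ranked articles mostly link to one another, and a value near 0 means their links mostly lead elsewhere.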
    55. Relevant Set Cohesiveness
    56. Is it the Queries?
      • Feed Search Queries
      • TB Adhoc Queries
      But none of these measures predicts whether Wikipedia expansion helps…

    SIGIR 2008 Presentation