Trec2009blog overview v9
Published in: Technology

  • The idea of the task is that groups identify features for ranking with respect to a facet inclination
  • All facets are assumed to have binary inclinations for operational simplicity
  • For each facet: 2 rankings, so 6 rankings across the 3 facets. 39 queries: each query has two ranking inclinations, so 78 AP values
  • Transcript

    • 1. Overview of the TREC 2009 Blog Track
      Iadh Ounis, Craig Macdonald, Ian
    • 2. Outline
      Blog Track: Background
      TREC Blog Track 2009 Overview
      • Blogs08 collection
      • Faceted blog distillation task
      • Top stories identification task
    • 5. Blog Track @ TREC
      Introduced in TREC 2006
      • Explores the information seeking behaviour in the blogosphere
      The Blog track adopted an incremental approach
      • From core and simple retrieval tasks to more complex search scenarios
      Thus far, two main search tasks have been addressed:
      • Opinion-finding task [2006-2008]
      “Find me posts about what people think of X”
      • Blog distillation task [2007-2008]
      “Find me blogs with a principal, recurring interest in X”
    • 6. Blog Track 2009
      In 2009, the Blog track has been markedly revamped
      • Addresses more refined and complex search scenarios using a larger sample of the blogosphere
      An up-to-date sample of the blogosphere: Blogs08
      • One order of magnitude larger than the older Blogs06 (28M posts, 1.3M feeds)
      • A much longer timespan: 13 months from Jan 08 to Feb 09
      Two new search tasks:
      • Faceted blog distillation
      Addresses the quality aspect of the retrieved blogs
      • Top stories identification task
      Addresses the news-related dimension of the blogosphere
    • 8. The New Blogs08 Collection
      Crawled from the blogosphere over a 13-month period from 14th Jan 08 to 10th Feb 09
      • Includes spam, non-English documents, and non-blogs
      Facilitates addressing the temporal/chronological aspect of the blogosphere
      • e.g. news and filtering tasks
      Follows a similar structure to the older Blogs06 collection:
      • 808GB feeds (>1.3M blogs)
      • 1445GB permalinks (28M documents)
      A single post and its comments
      • 56GB homepages
      Created by the Univ. of Glasgow and distributed since April 2009
    • 10. Outline
      Blog Track: Background
      TREC Blog Track 2009 Overview
      • Blogs08 collection
      • Faceted blog distillation task
      • Top stories identification task
    • 13. Blog Distillation Task
      Blog search users often wish to identify blogs about a given topic
      • Blogs that they can subscribe to and read on a regular basis
      Filtering: Subscribe to a repeated search in their RSS reader
      Distillation: add blog feeds with a recurring central interest to their RSS reader
      Blog distillation task [2007-2008]
      • “Find me a blog with a principal, recurring interest in X”
      The TREC 2007 and 2008 incarnations focused on topical relevance
      • The task did not address the “quality” aspect of the retrieved blogs
    • 14. Faceted Blog Search
      New task mimics an exploratory search task
      • “Find me a quality blog to follow/read about X”
      • Quality aspect is addressed through the use of facets in the search interface (Hearst et al., SSM 2008)
      Faceted search allows the users to explore the attributes of those blogs they might wish to follow and read:
      • In-depth/shallow analysis
      • Humorous/serious style
      • Expert/novice viewpoint
      • etc.
    • 19. Task Definition
      For operationalising at TREC
      • Each topic has a facet of interest attached to it
      • Blogs do not have facet attributes
      For TREC 2009, we used an initial set of 3 binary facets of varying difficulty:
      • Opinionated: ‘opinionated’ vs. ‘factual’ blogs
      • Personal: ‘personal’ vs. ‘official’ blogs
      • Indepth: ‘in-depth’ vs. ‘shallow’ blogs
      Using the Opinionated facet allowed the track to leverage past work on opinion-finding
    • 23. Topics
      One appropriate facet added to each topic
      <query> hugo chavez </query>
      <desc> I am looking for blogs that talk about Venezuelan
      president Hugo Chavez and his politics. </desc>
      <facet> indepth </facet>
      <narr>I want to follow blogs that talk about Hugo Chavez,
      the president of Venezuela. Blogs that follow his role in
      Venezuelan politics are relevant, as well as those that
      discuss non-political stories and activities. I am more
      interested in blogs about Chavez than blogs about
      Venezuelan politics generally.</narr>
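A minimal sketch of reading such a topic, assuming the fields appear as simple non-nested tag pairs as shown on the slide. The function name `parse_topic` and the `FACET_INCLINATIONS` table are illustrative names, not part of the track's tooling; the inclination pairs themselves are the ones listed in the task definition.

```python
import re

# The two inclinations of each TREC 2009 facet, as listed in the task definition.
FACET_INCLINATIONS = {
    "opinionated": ("opinionated", "factual"),
    "personal": ("personal", "official"),
    "indepth": ("in-depth", "shallow"),
}

TOPIC_TEXT = """
<query> hugo chavez </query>
<desc> I am looking for blogs that talk about Venezuelan
president Hugo Chavez and his politics. </desc>
<facet> indepth </facet>
"""

def parse_topic(text):
    """Map each field name to its text, with whitespace collapsed."""
    fields = {}
    # Assumption: fields are flat <name>...</name> pairs with no nesting.
    for name, body in re.findall(r"<(\w+)>(.*?)</\1>", text, flags=re.DOTALL):
        fields[name] = " ".join(body.split())
    return fields

topic = parse_topic(TOPIC_TEXT)
first_inclination, second_inclination = FACET_INCLINATIONS[topic["facet"]]
```

The facet field then selects which pair of inclinations the two faceted rankings for that topic should target.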
      50 new topics were created by TREC assessors:
    • 26. Runs
      Retrieval unit:
      • Blogs from the Feeds component of Blogs08
      For each topic, a run consists of three rankings of 100 blogs:
      • One with the 1st inclination of facet enabled
      • One with the 2nd inclination of facet enabled
      • One with no facet inclination enabled (akin to topic-relevance baseline)
      Example: For a topic with Personal facet
      • 1st ranking should have 100 ‘personal’ blogs
      • 2nd ranking should have 100 ‘official’ blogs
      • 3rd ranking should have 100 relevant blogs
    • 31. Assessment Procedure
      How does one assess a blog?
      • By reading some of its posts
      Assessment scale:
      • [0]: Not relevant
      • [1]: Relevant but not clearly inclined to a facet inclination
      • [2]: Relevant and clearly inclined towards the 1st facet inclination (opinionated, personal, indepth)
      • [3]: Relevant and clearly inclined towards the 2nd facet inclination (factual, official, shallow)
      Topic-relevance baseline runs
      • Measure using NR={0}, R={1,2,3}
      Faceted blog search runs
      • Measure using NR={0,1}, R={2|3}
      • Measure MAP for all facet inclination rankings (2 inclinations for each topic)
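The mapping from the four-point scale to binary relevance can be sketched as follows. This assumes that "R={2|3}" means grade 2 counts as relevant for the 1st-inclination ranking and grade 3 for the 2nd-inclination ranking; the blog identifiers, grades, and function names are made up for illustration, and the track's official evaluation tooling may differ in details.

```python
def relevant_set(judgements, grades):
    """Blogs whose assessed grade falls in the given set of grades."""
    return {blog for blog, grade in judgements.items() if grade in grades}

def average_precision(ranking, relevant):
    """Average precision of one ranking against a set of relevant blogs."""
    hits = 0
    precision_sum = 0.0
    for rank, blog in enumerate(ranking, start=1):
        if blog in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Hypothetical judgements for one topic: blog id -> grade in {0, 1, 2, 3}.
judgements = {"b1": 2, "b2": 0, "b3": 3, "b4": 1, "b5": 2}
ranking = ["b1", "b2", "b5", "b3"]

baseline_ap = average_precision(ranking, relevant_set(judgements, {1, 2, 3}))
first_inclination_ap = average_precision(ranking, relevant_set(judgements, {2}))
second_inclination_ap = average_precision(ranking, relevant_set(judgements, {3}))
```

MAP is then the mean of such AP values over the rankings being evaluated.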
    • 36. Runs and Pooling
      Each group permitted up to 4 runs
      • 9 groups took part in the faceted blog distillation task
      • 29 submitted runs, including 24 title-only runs
      • All runs pooled (and all 3 rankings in each run) to depth 30
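Pooling to a fixed depth can be sketched as taking the union of the top-ranked blogs over every ranking of every run for a topic. The data layout below (a run as a dict from ranking name to a list of blog ids) is an assumption made for the example.

```python
def pool(runs, depth=30):
    """Union of the top `depth` blogs from every ranking of every run."""
    pooled = set()
    for run in runs:
        for ranking in run.values():
            pooled.update(ranking[:depth])
    return pooled

# Two hypothetical runs for one topic, each with its three rankings.
runs = [
    {"first": ["a", "b", "c"], "second": ["d", "a", "e"], "none": ["b", "f", "g"]},
    {"first": ["h", "a", "i"], "second": ["c", "d"], "none": ["a"]},
]
judging_pool = pool(runs, depth=2)
```

Only blogs in the pool are shown to assessors; anything outside it is treated as unjudged.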
    • 39. Overview of Results
      Baseline retrieval performances are lower than expected
      • 96% of the pooled blogs were judged irrelevant
      Facet performances are low
      • Performance across facets differs
      • E.g. Indepth vs Opinionated
      Task complexity, early-stage techniques, or difficult topics?
    • 41. Baseline runs results: 39 topics; Top 5 Groups; Title-only (ranked by MAP)
      Topic relevance model and expansion using terms from <desc> and <narr> topic fields.
      Blog posts ranked using BM25, then scores aggregated to blogs
      Fuzzy aggregation methods to combine regularized blog posts scores into blog scores.
      • Most of the groups indexed only the Permalinks components of Blogs08
      • Almost all deployed retrieval techniques scored a blog based on the scores of its corresponding relevant posts
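The general pattern of scoring a blog from its posts can be sketched as below. The slides do not say which aggregation function each group used, so summing the (optionally top-k) post scores is an assumption chosen for simplicity, and the post scores stand in for whatever the post ranker (e.g. BM25) produced.

```python
from collections import defaultdict

def aggregate_post_scores(post_scores, post_to_blog, top_k=None):
    """Rank blogs by the sum of their posts' scores (optionally only the top_k posts)."""
    per_blog = defaultdict(list)
    for post, score in post_scores.items():
        per_blog[post_to_blog[post]].append(score)
    blog_scores = {}
    for blog, scores in per_blog.items():
        scores.sort(reverse=True)
        kept = scores if top_k is None else scores[:top_k]
        blog_scores[blog] = sum(kept)
    # Highest-scoring blog first.
    return sorted(blog_scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical post scores and post-to-blog mapping.
post_scores = {"p1": 3.0, "p2": 1.0, "p3": 2.5, "p4": 0.5}
post_to_blog = {"p1": "A", "p2": "A", "p3": "B", "p4": "A"}
blog_ranking = aggregate_post_scores(post_scores, post_to_blog)
```

Summing favours blogs with many matching posts, which fits the notion of a recurring interest; limiting to the top k posts reduces the advantage of very prolific blogs.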
    • 43. Faceted blog search runs results: 39 topics; Top 5 Groups; Ranked by ALL (MAP)
      Indepth facet: posts scored using Cross Entropy. For other facets: Mutual Information is used to weight terms in posts, using various lexicons.
      Did not attempt faceted search. Post scores are altered using temporal information before being aggregated into blog scores.
      Learned a classifier for the Indepth facet. For other facets, they used heuristics to score blog posts before aggregation.
      • Faceted search proved to be particularly challenging
      • For all groups, and in almost all cases: applying faceted search leads to a decrease in performance compared with the faceted performance of the baseline ranking
    • 45. Outline
      Blog Track: Background
      TREC Blog Track 2009 Overview
      • Blogs08 collection
      • Faceted blog distillation task
      • Top stories identification task
    • 48. Top Stories Identification Task
      Many blog search engine queries are news-related
      New task’s main research question: How well does the blogosphere respond to real-world events?
      Facilitated by the Blogs08 test collection – 54 weeks in length, including
    • 51. Task Definition
      Federal takeover of Fannie Mae and Freddie Mac
      For a given unit of time (“query date”), identify the top news stories on that date
      • And also identify some related blog posts to the headline, covering its various/diverse aspects
      News stories are represented by headlines broadcast by NY Times
      • For entire timespan of Blogs08
      • Distributed with kind permission of NYT

    • 53. Task Details
      Example query:
      <num> TS09-33 </num>
      <date> 2008-08-25 </date>
      Provide a ranking of news headlines in the range <date> ± 1 day
      • e.g. If a story happens early on day d in Europe, it will be reported by an American broadcaster (NYT) on day d-1
      For each ranked news headline, suggest relevant, diverse blog posts
      • Relevant blog posts may occur anytime after the date of the event
      The task is of Retrospective Event Detection (RED) type
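Selecting the candidate headlines for a query date can be sketched as a simple window filter. The representation of headlines as `(date, text)` pairs and the placeholder headline texts are assumptions for the example.

```python
from datetime import date, timedelta

def candidate_headlines(headlines, query_date, window_days=1):
    """Keep the headlines dated within query_date +/- window_days."""
    earliest = query_date - timedelta(days=window_days)
    latest = query_date + timedelta(days=window_days)
    return [h for h in headlines if earliest <= h[0] <= latest]

# Placeholder headlines around the example query date 2008-08-25.
headlines = [
    (date(2008, 8, 23), "headline A"),
    (date(2008, 8, 24), "headline B"),
    (date(2008, 8, 25), "headline C"),
    (date(2008, 8, 26), "headline D"),
    (date(2008, 8, 27), "headline E"),
]
candidates = candidate_headlines(headlines, date(2008, 8, 25))
```

The one-day tolerance absorbs the time-zone effect described above; ranking the surviving candidates is the actual retrieval problem.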
    • 54. Topic Development
      The organisers selected 55 dates as topics
      • Covering various global, political, economic, cultural, sports and technology events
      These included dates related to events such as:
    • 61. Runs and Assessments
      A run consists of a ranking of 100 headlines, each supported by up to 10 diverse blog posts
      • Runs use the SUPPORTing run format developed for the Enterprise track expert search task
      • 25 runs by 7 groups: pooled top 20 headlines from each run
      Two phases of participant community judging:
      • Top news story judging: Identify important news stories for each day
      • Blog post judging: Identify relevant and diverse blog posts for relevant headlines
    • 64. Phase 1: Top News Story Judging
      We asked assessors to take the role of a newspaper editor
      • What stories would they put on the front page of a newspaper or news website?
      • Assess whether the headline actually occurred on the query day, and judge each headline story as “Important” or “Not Important”
      • Could consider their own recollection of events, or refer to external Web resources
      Editorial factors to consider: Timing, Significance, Prominence, Human Interest, Proximity
      Interface provided pool of headlines to judge, headline and snippet of story, and link to actual NYT news article
    • 67. Phase 2: Blog Post Judging
      Once headlines were judged, a sample of the important ones was selected for blog post judging
      • 2-phase judging avoids judging blog posts at the same time as judging headlines
      • Assessors only have to read blog posts for headlines judged important
      Blog posts were judged “Relevant” or “Not Relevant” to the headline
      When judging, assessors defined “aspects” to group relevant blog posts
      • e.g. for a headline on the Oscars, the assessor defined aspects such as “liveblogs”, “factual”, “opinionated”, “accuracy of predictions”
      • Aspects are used during diversity evaluation
    • 70. Relevance Assessments
      Top news story identification was hard:
      Blog post judging, less so:
      Result reporting in two phases: Top news story identification, then diverse blog post retrieval
    • 71. Identifying Top News Stories
      • All 25 submitted runs were automatic
      • Task was fairly difficult: retrieval performances were rather low
    • 73. Identifying Top News Stories: Runs
      Voting Model: Number of blog posts mentioning a headline.
      Probabilistic: Combination of query generating headline probability and headline prior calculated from time- or term-based evidence
      Two probabilistic approaches: news to blogs or blogs to news.
      • All groups indexed only the Permalinks component of Blogs08 (exceptions are UAms & USI)
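The voting idea (count the blog posts that mention a headline) can be sketched as below. What counts as a "mention" is not specified on the slide, so the term-overlap threshold `min_overlap` is an assumption, as are the function names and the sample posts; a real system would restrict the posts to those published around the query date.

```python
import re

def tokenize(text):
    """Lower-cased set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def votes(headline, posts, min_overlap=0.5):
    """Number of posts sharing at least min_overlap of the headline's terms."""
    terms = tokenize(headline)
    if not terms:
        return 0
    count = 0
    for post in posts:
        shared = len(terms & tokenize(post))
        if shared / len(terms) >= min_overlap:
            count += 1
    return count

def rank_headlines(headlines, posts):
    """Headlines sorted by descending vote count."""
    scored = [(headline, votes(headline, posts)) for headline in headlines]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up posts; the first headline is the example from the task definition slide.
posts = [
    "The federal takeover of Fannie Mae and Freddie Mac worries me",
    "Fannie Mae and Freddie Mac news today",
    "I baked bread",
]
ranked = rank_headlines(
    ["Federal takeover of Fannie Mae and Freddie Mac", "Local team wins match"],
    posts,
)
```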
    • 74. Identifying Blog Posts
      • Runs with high top story recall have more chance to identify relevant blog posts
      • Moreover, systems found identifying blog posts for a headline easier
      • Evaluation measures are diversity-based, from the Web track:
      α-NDCG@10 (α=0.5)
      See Charlie’s talk for Web track
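A minimal sketch of α-NDCG, following the commonly used formulation in which a post's gain for each aspect is discounted by (1 − α) for every earlier post covering the same aspect, and the ideal ranking is approximated greedily. The official Web track evaluation code may differ in details; the aspect names come from the Oscars example above, while the post ids and function names are made up. The ranking is assumed to contain no duplicate posts.

```python
import math

def novelty_gain(doc, aspects_of, seen, alpha):
    """Gain of a post: each aspect is worth (1 - alpha) ** (times already seen)."""
    return sum((1 - alpha) ** seen.get(a, 0) for a in aspects_of.get(doc, ()))

def alpha_dcg(ranking, aspects_of, alpha=0.5, depth=10):
    seen = {}
    total = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        total += novelty_gain(doc, aspects_of, seen, alpha) / math.log2(rank + 1)
        for a in aspects_of.get(doc, ()):
            seen[a] = seen.get(a, 0) + 1
    return total

def greedy_ideal(aspects_of, alpha=0.5, depth=10):
    """Greedy approximation of the ideal ranking over the judged posts."""
    remaining = sorted(aspects_of)
    seen = {}
    ideal = []
    while remaining and len(ideal) < depth:
        best = max(remaining, key=lambda d: novelty_gain(d, aspects_of, seen, alpha))
        ideal.append(best)
        remaining.remove(best)
        for a in aspects_of[best]:
            seen[a] = seen.get(a, 0) + 1
    return ideal

def alpha_ndcg(ranking, aspects_of, alpha=0.5, depth=10):
    ideal = alpha_dcg(greedy_ideal(aspects_of, alpha, depth), aspects_of, alpha, depth)
    return alpha_dcg(ranking, aspects_of, alpha, depth) / ideal if ideal else 0.0

# Judged relevant posts for one headline and the aspects they cover.
aspects_of = {"p1": {"liveblogs"}, "p2": {"liveblogs"}, "p3": {"opinionated"}}
diverse = alpha_ndcg(["p1", "p3", "p2"], aspects_of)
redundant = alpha_ndcg(["p1", "p2", "p3"], aspects_of)
```

In this toy case the ranking that alternates aspects scores higher than the one that shows two posts on the same aspect first, which is the behaviour the measure is meant to reward.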
    • 77. Identifying Blog Posts: Runs
      Divergence From Randomness DPH ranking and MMR
      Latent Dirichlet Relevance Model, but applied no diversification
      • Means calculated over all 258 judged headlines
      • However, ranking of runs not identical to top story identification evaluation
      • Some swaps between groups, and between runs for a given group
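Since one of the listed approaches combines a relevance ranking with MMR (Maximal Marginal Relevance), a generic sketch of MMR re-ranking may help. This is the textbook greedy formulation, not the participating group's actual implementation; the trade-off parameter `lam`, the relevance scores, and the similarity values are all assumptions for the example.

```python
def mmr(candidates, relevance, similarity, lam=0.7, k=10):
    """Greedy MMR: trade off relevance against similarity to already selected posts."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(doc):
            # Redundancy is the highest similarity to any post picked so far.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical relevance scores; posts "a" and "b" are near-duplicates.
relevance = {"a": 1.0, "b": 0.9, "c": 0.5}
pair_similarity = {frozenset(("a", "b")): 1.0}

def similarity(x, y):
    return pair_similarity.get(frozenset((x, y)), 0.0)

diversified = mmr(["a", "b", "c"], relevance, similarity, lam=0.5)
relevance_only = mmr(["a", "b", "c"], relevance, similarity, lam=1.0)
```

With `lam=1.0` the method reduces to ranking by relevance alone; lower values push near-duplicate posts further down the list.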
    • 80. Conclusions
      In 2009, the Blog track has been markedly revamped
      • Two new pilot search tasks that go beyond topical relevance and simple adhoc retrieval
      The results on both tasks confirm the complexities of faceted blog search and top stories identification
      • There is a large scope for further research and improvements
      Blog track will run in 2010
      • Same tasks
      • … but with a few proposed refinements intended to facilitate research into considering the blogosphere as a time stream
      More at the Blog track workshop on Friday