Blog track


Presentation for Information Retrieval class

Published in: Education, Technology

  1. Blog Track Research at TREC (Craig Macdonald, Rodrygo L.T. Santos, Iadh Ounis and Ian Soboroff), presented by Anil Kumar Attuluri, 11/14/2011
  2. Outline
       • Motivation
       • Background
       • Tasks and Approaches
       • Conclusion
  3. Motivation
  4. User Generated Content
       • User-generated content (UGC) has become common as simple tools for creating and publishing content online have evolved.
       • Much of it takes the form of a blog: a chronologically arranged journal.
       • Blog sites are huge and hold large amounts of data.
  5. Information-seeking behavior
       • Opinion finding: gauging the positive or negative "buzz" about a product released on the market.
       • Blog distillation: finding blogs on a topic of interest, rather than just running relevance queries over individual blog posts.
       • Top news: deciding automatically which top stories to show, an important problem for online newspapers.
  6. Background
  7. Blog data
       • Feeds: a web feed (or news feed) is a data format used for providing users with frequently updated content.
       • Permalinks: the permanent URLs of individual weblog posts, as well as of categories and other lists of weblog postings.
       • Homepages: the main entry point to the blog.
  8. RSS and Atom XML
       • RSS (Really Simple Syndication)
  9. RSS and Atom XML
       • Atom (Atom Syndication Format) XML
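A minimal sketch of what the two feed formats look like and how post titles and permalinks might be read from them with Python's standard library. The feed contents (blog name, URLs, post titles) and the helper names `rss_posts` and `atom_posts` are made up for illustration; real feeds carry many more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; every name and URL here is invented.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.com/</link>
    <item>
      <title>First post</title>
      <link>http://blog.example.com/2011/11/first-post</link>
    </item>
  </channel>
</rss>"""

# The same hypothetical blog expressed as an Atom feed.
ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <entry>
    <title>First post</title>
    <link rel="alternate" href="http://blog.example.com/2011/11/first-post"/>
  </entry>
</feed>"""

# Atom elements live in a namespace, RSS 2.0 core elements do not.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def rss_posts(xml_text):
    """Return (title, permalink) pairs for every <item> in an RSS feed."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


def atom_posts(xml_text):
    """Return (title, permalink) pairs for every <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        # In Atom the URL is an attribute of <link>, not its text.
        posts.append((title, link.get("href") if link is not None else None))
    return posts


print(rss_posts(RSS_SAMPLE))
print(atom_posts(ATOM_SAMPLE))
```

The main practical difference visible here is that RSS stores the permalink as the text of `<link>`, while Atom stores it in the `href` attribute of a namespaced `<link>` element.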
  10. Tasks
  11. Opinion-Finding Tasks: Task Definition
       • Aims to identify blog posts that express an opinion about a given target.
       • Two aspects matter: (i) relevance and (ii) opinionatedness.
       • Relevance assessments are made at the blog post level and record whether the post is relevant.
       • They also record what opinion was expressed towards the target entity.
  12. Opinion-Finding Tasks: Polarity Sub-Task
       • Predict whether a blog post expresses a positive or a negative opinion about the target.
       • TREC 2007, classification task: for each retrieved document, participants predict the polarity of the document.
       • TREC 2008, ranking task: only blog posts that are relevant to the topic and express a positive opinion are retrieved.
  13. Opinion-Finding Tasks: Approaches
       • Classification-based Opinion-Finding
           - Train an SVM classifier on data obtained from two consumer review websites (one of them RateItAll), then use the trained SVM to find subjectivity in blog posts (Zhang et al.).
           - Use OpinionFinder, a subjectivity analysis system designed to support NLP applications, to provide information about the opinions expressed in a text and about who expressed them (He et al.).
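To make the classification-based idea concrete, here is a rough sketch of a subjectivity classifier: a linear SVM over bag-of-words features, trained with the Pegasos sub-gradient method in plain Python. Everything specific is an assumption made for illustration, not the setup reported by Zhang et al.: the four toy training sentences, the choice of Pegasos, the deterministic pass over the data (Pegasos normally samples at random), and the names `features`, `train_linear_svm` and `decision`.

```python
import re
from collections import Counter

# Toy training data, invented for illustration: +1 = subjective, -1 = objective.
TRAIN = [
    ("I love this phone, it is wonderful", +1),
    ("The store opens at nine on weekdays", -1),
    ("What a terrible, disappointing movie", +1),
    ("The package weighs two kilograms", -1),
]


def features(text):
    """Bag-of-words term counts for a lowercased text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def train_linear_svm(examples, lam=0.01, epochs=20):
    """Train a linear SVM (hinge loss, L2 penalty) with Pegasos-style updates."""
    data = [(features(text), label) for text, label in examples]
    w = {}  # sparse weight vector: term -> weight
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(w.get(term, 0.0) * c for term, c in x.items())
            scale = 1.0 - 1.0 / t  # same as 1 - eta * lam
            for term in w:
                w[term] *= scale  # shrink step from the L2 penalty
            if margin < 1.0:  # example violates the margin
                for term, c in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * c
    return w


def decision(w, text):
    """Signed score: positive suggests subjective, negative objective."""
    return sum(w.get(term, 0.0) * c for term, c in features(text).items())


weights = train_linear_svm(TRAIN)
for post in ["wonderful phone", "store opens weekdays"]:
    label = "subjective" if decision(weights, post) > 0 else "objective"
    print(post, "->", label)
```

In a retrieval setting the classifier score would typically be combined with a topical relevance score, since the task requires posts that are both relevant and opinionated.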
  14. Opinion-Finding Tasks: Approaches
       • Lexicon-based Opinion-Finding
           - Kullback-Leibler (KL) divergence
           - Subjective lexicon derived automatically from the target collection
           - Language modeling approach
           - Generative model of words from distinct resources
           - Single-stage opinion-finding methods
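One way a subjective lexicon could be derived from the collection itself is to weight each term by its contribution to the KL divergence between a language model of known opinionated documents and a model of the whole collection. The sketch below assumes that reading; the toy documents, the unsmoothed maximum-likelihood estimates and the function names `term_counts` and `kl_lexicon` are illustrative choices, not details taken from the slides.

```python
import math
import re
from collections import Counter

# Toy data, invented for illustration. The opinionated documents are
# assumed to be a subset of the collection.
OPINIONATED = ["i love this camera", "i hate this camera"]
COLLECTION = OPINIONATED + ["this camera has a zoom lens",
                            "the camera ships in may"]


def term_counts(docs):
    """Total term frequencies over a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts


def kl_lexicon(opinionated_docs, collection_docs, top_k=5):
    """Rank terms by p(t|O) * log2(p(t|O) / p(t|C)), their KL contribution."""
    op = term_counts(opinionated_docs)
    coll = term_counts(collection_docs)
    op_total = sum(op.values())
    coll_total = sum(coll.values())
    weights = {}
    for term, freq in op.items():
        p_op = freq / op_total
        p_coll = coll.get(term, 0) / coll_total
        if p_coll == 0:
            continue  # only possible if O is not a subset of C
        weights[term] = p_op * math.log2(p_op / p_coll)
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


for term, weight in kl_lexicon(OPINIONATED, COLLECTION):
    print(f"{term:8s} {weight:.3f}")
```

Terms that are much more frequent in opinionated text than in the collection at large (here "i", "love", "hate") rise to the top, while topical terms such as "camera" that occur everywhere receive low weights.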
  15. Blog Distillation: Task Definition
       • Suggest relevant blogs in response to a query, e.g. all blogs on American football.
       • Unlike opinion finding, whole blogs are retrieved rather than individual blog posts.
       • Binary relevance assessments were used in TREC 2007; three levels of relevance (not relevant, relevant, highly relevant) were used in TREC 2008.
  16. Blog Distillation: Faceted Task Definition
       • Aims to address the "quality" aspect of the retrieved blogs.
       • Retrieve a ranking of blogs that have a recurring and principal interest in a given topic and that also satisfy the active facet inclination(s).
       • Facets
           - Opinionated: "opinionated" vs. "factual" blogs
           - Personal: "personal" vs. "official" blogs
           - In-depth: "in-depth" vs. "shallow" blogs
  17. Blog Distillation: Approaches
       • Blog Distillation as a Resource Selection Problem
           - Large Document (LD) vs. Small Document (SD) models
           - Query expansion mechanism based on an external resource
           - Global Representation (GR) and Pseudo-Cluster Selection (PCS)
           - Global Evidence Model (GEM) vs. Local Evidence Model (LEM)
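The Large Document and Small Document models differ in what is treated as the retrieval unit: LD concatenates all posts of a blog into one document and scores that, while SD scores each post separately and then aggregates the post scores per blog. The sketch below illustrates the contrast under heavy simplifications that are assumptions of this example: a toy relevance score (fraction of tokens matching the query), mean aggregation for SD, and invented blog names and posts.

```python
import re

# Toy blogs, invented for illustration: blog name -> list of post texts.
BLOGS = {
    "gridiron_daily": ["football scores from sunday",
                       "football draft rumours"],
    "mixed_bag": ["football",
                  "my holiday photos from a long trip around the coast of spain"],
}


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def score(query, text):
    """Toy relevance score: fraction of tokens in text matching a query term."""
    toks = tokens(text)
    if not toks:
        return 0.0
    query_terms = set(tokens(query))
    return sum(1 for tok in toks if tok in query_terms) / len(toks)


def large_document_score(query, posts):
    """LD: treat the whole blog as one concatenated document."""
    return score(query, " ".join(posts))


def small_document_score(query, posts):
    """SD: score each post on its own, then average the post scores."""
    if not posts:
        return 0.0
    return sum(score(query, post) for post in posts) / len(posts)


def rank_blogs(query, blogs, blog_score):
    """Order blog names by the given blog-level scoring function."""
    return sorted(blogs, key=lambda name: blog_score(query, blogs[name]),
                  reverse=True)


print("LD:", rank_blogs("football", BLOGS, large_document_score))
print("SD:", rank_blogs("football", BLOGS, small_document_score))
```

With this toy data the two models disagree: LD prefers the blog whose overall text is mostly about football, while the mean-based SD variant is swayed by one very short, fully matching post in the other blog. Which behaviour is preferable depends on how posts are weighted and smoothed, which this sketch leaves out.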
  18. Blog Distillation: Approaches
       • Blog Distillation as an Expert Search Problem
           - Voting Model (VM): based on the notion of blogger profiles
           - Blogger Model (BM) and Posting Model (PM)
           - Ordered Weighted Average (OWA) operator
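Two of the ideas above are easy to illustrate in a few lines. In a voting approach, each post retrieved for the query counts as a vote for the blog it belongs to. An OWA operator aggregates a list of scores by sorting them in descending order and applying position-based weights, so the weights attach to ranks rather than to particular posts. The sketch below is a simplified illustration with invented scores; the function names `vote_count` and `owa` and the specific weights are assumptions, not the configurations used in the cited work.

```python
from collections import defaultdict


def vote_count(ranked_posts):
    """Count one vote per retrieved post for the blog that published it.

    ranked_posts: list of (blog_id, post_score) pairs retrieved for a query.
    """
    votes = defaultdict(int)
    for blog_id, _score in ranked_posts:
        votes[blog_id] += 1
    return dict(votes)


def owa(values, weights):
    """Ordered weighted average: weights apply to sorted positions."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)  # largest score gets weights[0]
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical retrieval result: (blog, post score) pairs.
retrieved = [("blog_a", 0.9), ("blog_b", 0.8), ("blog_a", 0.4)]
print(vote_count(retrieved))

# Hypothetical post scores of one blog, aggregated with top-heavy weights.
print(owa([0.2, 0.9, 0.5], [0.5, 0.3, 0.2]))
```

Choosing the OWA weights moves the aggregate between the maximum (all weight on the first position) and the minimum (all weight on the last), with the plain mean in between when the weights are uniform.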
  19. Top News: Task Definition
       • Suggest a ranking of news articles in which the most important, "news-worthy" articles are ranked first.
       • Compare the ranked articles with the news articles that were deemed editorially important on each given day.
       • Perform the evaluation in two stages: traditional IR approaches first, followed by an assessment of topical relevance.
  20. Top News: Approaches
       • Voting Model (McCreadie et al.)
           - Blog posts are ranked for each headline.
           - The importance of each headline on a given day is inferred from the number of posts retrieved for it.
       • Language Modeling Approach (Lee et al.)
           - Use clustering to create multiple topic models for a day.
           - Compare these with a headline model generated from the top retrieved blog posts for that headline.
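The voting idea for top news can be sketched as follows: retrieve the day's blog posts for each headline, then rank headlines by how many posts were retrieved. The retrieval step here is a deliberately crude stand-in (a post is "retrieved" if it shares at least two terms with the headline), and the headlines, posts and function names are invented for illustration; this is not the system described by McCreadie et al.

```python
import re

# Hypothetical headlines and blog posts for a single day.
HEADLINES = ["Local team wins cup final", "Senate passes budget bill"]
POSTS = [
    "cannot believe the senate passed that budget",
    "the budget bill is a mess says senate watcher",
    "what a cup final",
]


def tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieved_posts(headline, posts, min_overlap=2):
    """Toy retrieval: keep posts sharing at least min_overlap terms."""
    headline_terms = tokens(headline)
    return [post for post in posts
            if len(headline_terms & tokens(post)) >= min_overlap]


def rank_headlines(headlines, posts, min_overlap=2):
    """Rank headlines by the number of posts retrieved for each (their votes)."""
    votes = {h: len(retrieved_posts(h, posts, min_overlap)) for h in headlines}
    ranking = sorted(headlines, key=lambda h: votes[h], reverse=True)
    return ranking, votes


ranking, votes = rank_headlines(HEADLINES, POSTS)
for headline in ranking:
    print(votes[headline], headline)
```

A real system would use a proper retrieval model and restrict posts to those published around the headline's date, but the ranking step stays the same: more retrieved posts means a more important headline.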
  21. Conclusion
  22. Conclusion
       • Research on social search and blog search is becoming increasingly important.
       • The Blog track has played an important role in initiating research, creating resources and facilitating the formation of a community of researchers to tackle multi-disciplinary search tasks.
       • The Blog track influenced research on blog search and created a sustainable platform for it.
  23. Thank You!