Blog track
Presentation for Information Retrieval class

Blog track Presentation Transcript

  • Blog Track Research at TREC (Craig Macdonald, Rodrygo L.T. Santos, Iadh Ounis and Ian Soboroff), presented by Anil Kumar Attuluri, 11/14/2011
  • Outline
      • Motivation
      • Background
      • Tasks and Approaches
      • Conclusion
  • Motivation
  • User Generated Content
      • User Generated Content (UGC) has become common as simple tools for creating and publishing content online have evolved.
      • It often takes the form of a blog: a chronologically arranged journal.
      • Blog sites are huge and hold large amounts of data.
  • Information seeking behavior
      • Positive or negative 'buzz' about a product released in the market - Opinion finding
      • Finding blogs on a topic of interest rather than just running relevance queries on blog posts - Blog distillation
      • Deciding automatically which top stories to show is an important problem for online newspapers - Top news
  • Background
  • Blog data
    • Feeds
      • A web feed (or news feed) is a data format used for providing users with frequently updated content.
    • Permalinks
      • The permanent URLs to individual weblog posts, as well as to categories and other lists of weblog postings.
    • Homepages
      • The main entry point to the blog.
  • RSS and Atom XML
    • RSS (Really Simple Syndication)
  • RSS and Atom XML
    • Atom (Atom Syndication Format) XML
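The transcript does not preserve the sample XML from these two slides. As a rough illustration of how feeds, permalinks and homepages relate, the sketch below parses a small RSS 2.0 document with Python's standard library. The feed content, the `blog.example.com` URLs and the `parse_rss` helper are all made up for this example and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed for a made-up blog; not taken from the slides.
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.com/</link>
    <item>
      <title>First post</title>
      <link>http://blog.example.com/2011/11/first-post</link>
      <pubDate>Mon, 14 Nov 2011 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.com/2011/11/second-post</link>
      <pubDate>Tue, 15 Nov 2011 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return the blog homepage URL and a list of (title, permalink) pairs."""
    channel = ET.fromstring(xml_text).find("channel")
    homepage = channel.findtext("link")  # channel-level link: the homepage
    posts = [(item.findtext("title"), item.findtext("link"))  # item link: the permalink
             for item in channel.findall("item")]
    return homepage, posts


homepage, posts = parse_rss(RSS_SAMPLE)
```

Here the channel-level `link` plays the role of the homepage and each item-level `link` plays the role of a permalink. Atom feeds carry the same kind of information but use namespaced `feed` and `entry` elements instead.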
  • Tasks
  • Opinion-Finding Tasks
    • Task Definition
      • Aims to identify blog posts expressing an opinion about a given target.
      • Two important aspects: i) relevance ii) opinionatedness
      • Relevance assessments are made at the blog post level and record whether the post is relevant.
      • Determine what opinion was expressed towards the target entity.
  • Opinion-Finding Tasks
    • Polarity Sub-Task
      • Predict whether a blog post expresses a positive or a negative opinion about the target.
      • TREC 2007 - Classification task: participants predict the polarity of each retrieved document.
      • TREC 2008 - Ranking task: only blog posts that are relevant to the topic and express a positive opinion are retrieved.
  • Opinion-Finding Tasks
    • Approaches
      • Classification-based Opinion-Finding
        - Train an SVM classifier on data obtained from two consumer review websites, RateItAll and Epinions.com, then use the trained SVM to detect subjectivity in blog posts. (Zhang et al.)
        - Use OpinionFinder, a subjectivity analysis system designed to support NLP applications, to provide information about the opinions expressed in text and about who expressed them. (He et al.)
  • Opinion-Finding Tasks
    • Approaches
      • Lexicon-based Opinion-Finding
        - Kullback-Leibler (KL) divergence
        - Subjective lexicon is automatically derived from the target collection.
        - Language modeling approach
        - Generative model of words from distinct resources
        - Single-stage opinion-finding methods
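One way to read the KL divergence item is that each term is weighted by how much more likely it is in opinionated text than in the collection as a whole, and the top-weighted terms form the lexicon. The sketch below follows that reading. It is not the exact weighting used by any TREC participant: the `kl_term_weights` function, the whitespace tokenisation and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def kl_term_weights(opinionated_docs, collection_docs):
    """Weight each term by its contribution to KL(opinionated || collection)."""
    opin = Counter(t for doc in opinionated_docs for t in doc.lower().split())
    coll = Counter(t for doc in collection_docs for t in doc.lower().split())
    n_opin, n_coll = sum(opin.values()), sum(coll.values())
    weights = {}
    for term, count in opin.items():
        p_opin = count / n_opin
        # Assumes the opinionated documents are a subset of the collection,
        # so every opinionated term also occurs in the collection counts.
        p_coll = coll[term] / n_coll
        weights[term] = p_opin * math.log2(p_opin / p_coll)
    return weights


# Toy data (hypothetical): two opinionated posts inside a five-post collection.
opinionated = ["love this phone", "love the camera"]
collection = opinionated + [
    "this phone has a camera",
    "the camera has a lens",
    "the phone is new",
]

weights = kl_term_weights(opinionated, collection)
top_term = max(weights, key=weights.get)
```

On this toy data the term that is concentrated in the opinionated posts ("love") receives the largest weight, while terms that are just as common elsewhere ("phone", "camera") receive weights close to zero.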
  • Blog Distillation
    • Task Definition
      • Suggest relevant blogs in response to a query. Ex: all blogs on American football.
      • Unlike opinion finding, whole blogs are retrieved rather than individual blog posts.
      • Binary relevance assessment was used in TREC 2007; three levels of relevance (not relevant, relevant, highly relevant) were used in TREC 2008.
  • Blog Distillation
    • Faceted Task Definition
      • Aims at addressing the 'quality' aspect of the retrieved blogs.
      • Retrieve a ranking of blogs that have a recurring and principal interest in a given topic and that also satisfy the active facet inclination(s).
      • Facets
        - Opinionated: 'opinionated' vs. 'factual' blogs
        - Personal: 'personal' vs. 'official' blogs
        - In-depth: 'in-depth' vs. 'shallow' blogs
  • Blog Distillation
    • Approaches
      • Blog Distillation as a Resource Selection Problem
        - Large Document (LD) vs. Small Document (SD) models.
        - Query expansion mechanism based on an external resource.
        - Global Representation (GR) and Pseudo Cluster Selection (PCS)
        - Global Evidence Model (GEM) vs. Local Evidence Model (LEM)
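The LD versus SD distinction can be illustrated with a toy scorer. In this sketch the Large Document model concatenates all posts of a blog and scores the result once, while the Small Document model scores each post and combines the per-post scores. The raw term-count scoring, the averaging step, the function names and the two example blogs are assumptions made for illustration; the systems evaluated at TREC used proper retrieval models.

```python
from collections import Counter


def tf_score(query, text):
    """Toy relevance score: total frequency of the query terms in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in query.lower().split())


def large_document_score(query, posts):
    """LD: treat the whole blog as one document built from all its posts."""
    return tf_score(query, " ".join(posts))


def small_document_score(query, posts):
    """SD: score each post on its own, then average over the blog's posts."""
    return sum(tf_score(query, p) for p in posts) / len(posts)


# Hypothetical blogs: one stays on topic, one mentions the topic in a single post.
blogs = {
    "nfl_daily": ["football scores today", "football draft news", "football injuries"],
    "mixed": ["football football football recap", "my cat", "cooking pasta", "holiday photos"],
}

ld_scores = {b: large_document_score("football", p) for b, p in blogs.items()}
sd_scores = {b: small_document_score("football", p) for b, p in blogs.items()}
```

With these toy scores the LD view cannot tell the two blogs apart, because both contain the query term three times overall. The SD view favours the blog whose posts mention the topic consistently, which is closer to the "recurring interest" notion in the task definition.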
  • Blog Distillation
    • Approaches
      • Blog Distillation as an Expert Search Problem
        - Voting Model (VM): based on the notion of blogger profiles
        - Blogger Model (BM) and Posting Model (PM)
        - Ordered Weighted Average (OWA) operator
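A minimal sketch of two ideas on this slide follows. In a voting view, every retrieved post counts as a vote for the blog that published it. An ordered weighted average combines a blog's post scores after sorting them, so the weights attach to rank positions rather than to particular posts. The function names, the example ranking and the weight vector are hypothetical, and this is not the exact formulation from any of the cited systems.

```python
def count_votes(ranked_posts):
    """Votes: each retrieved post is one vote for the blog it belongs to."""
    votes = {}
    for blog_id, _score in ranked_posts:
        votes[blog_id] = votes.get(blog_id, 0) + 1
    return votes


def owa(scores, weights):
    """Ordered weighted average: weights apply to scores sorted high to low.

    Assumes len(weights) == len(scores) and that the weights sum to 1.
    """
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


# Hypothetical post ranking for one query: (blog id, post retrieval score).
ranked_posts = [
    ("blogA", 0.9), ("blogB", 0.8), ("blogA", 0.7), ("blogA", 0.2), ("blogB", 0.1),
]

votes = count_votes(ranked_posts)

# Group the post scores by blog, then aggregate blogA's scores with OWA.
scores_by_blog = {}
for blog_id, score in ranked_posts:
    scores_by_blog.setdefault(blog_id, []).append(score)

blog_a_owa = owa(scores_by_blog["blogA"], [0.5, 0.3, 0.2])
```

Front-loaded weights such as `[0.5, 0.3, 0.2]` emphasise a blog's best posts, while uniform weights reduce the operator to a plain mean.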
  • Top News
    • Task Definition
      • Suggest a ranking of news articles in which the most important, "news-worthy" articles are ranked first.
      • Compare the ranked articles to the news articles that were deemed editorially important on each given day.
      • Evaluation is performed in two stages: traditional IR approaches, followed by assessing topical relevance.
  • Top News
    • Approaches
      • Voting Model (McCreadie et al.)
        - Blog posts are ranked for each headline.
        - The importance of each headline on that day is inferred from the number of posts retrieved for it.
      • Language Modeling Approach (Lee et al.)
        - Use clustering to create multiple topic models for a day.
        - Compare these to a headline model generated from the top retrieved blog posts for that headline.
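The voting idea for headlines can be sketched in a few lines: given the blog posts retrieved for each headline on a given day, rank the headlines by how many posts each one attracted. The `rank_headlines` function, the headlines and the post identifiers below are made up for illustration, and the retrieval step that produces the per-headline post lists is assumed to have run already.

```python
def rank_headlines(retrieved_posts):
    """Rank headlines by the number of same-day blog posts retrieved for each."""
    return sorted(retrieved_posts,
                  key=lambda headline: len(retrieved_posts[headline]),
                  reverse=True)


# Hypothetical retrieval output for one day: headline -> ids of retrieved posts.
retrieved = {
    "Local team wins cup": ["p5", "p6"],
    "Election results announced": ["p1", "p2", "p3", "p4"],
    "New phone released": ["p7"],
}

ranking = rank_headlines(retrieved)
```

Counting posts is the simplest possible vote. A variant could sum the retrieval scores of the posts instead, so that strongly matching posts count for more.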
  • Conclusion
  • Conclusion
      • Research in the areas of social search and blog search is becoming increasingly important.
      • The Blog track has played an important role in initiating research, creating resources and facilitating the formation of a community of researchers for tackling multi-disciplinary search tasks.
      • The Blog track influenced research related to blog search and created a sustainable platform for it.
  • Thank You!