Detecting Topic Drift with Compound Topic Models

A poster presented at ICWSM 2009 (International AAAI Conference on Weblogs and Social Media).

Authors: Dan Knights, Michael C. Mozer (University of Colorado at Boulder), and Nicolas Nicolov (J.D. Power and Associates, McGraw-Hill).

The actual paper is here: http://dan.knights.googlepages.com/knights-icwsm09.pdf

  1. Detecting Topic Drift with Compound Topic Models
     Dan Knights, Mike Mozer, Nicolas Nicolov
     J.D. Power and Associates, McGraw-Hill, U.S.A., Boulder, CO 80303
     Goals:
       • Track topics over time
       • Detect topic drift
       • Identify emerging topics
       • Visualize topic trends
     Dan Knights (JDPA), Detecting Topic Drift, May 19, 2009
  2. Topic tracking challenge: emerging topics
     Dataset 1 → LDA:
       • 0: energy hybrid gas prius fuel
       • 1: million billion economy stock
       • ...
     Dataset 2 → LDA:
       • 0: money stock dow economy
       • 1: hybrid gas prius alternative
       • 2: obama mccain election race
       • ...
     [Figure: probability vs. topic index for each dataset]
     Topic correspondence not guaranteed
  3. Compound topic models guarantee correspondence
     CTM: Dataset 1 + Dataset 2 → LDA
     Topics shared by both datasets:
       • 0: money stock dow economy
       • 1: hybrid gas prius alternative
       • 2: obama mccain election race
       • ...
     [Figure: probability vs. topic index (0, 1, 2) for each dataset]
     Topic correspondence guaranteed
  4. Potential indicators of drift
     3 kinds of indicator:
       • Kullback-Leibler divergence (KLD)
       • Relative perplexity (RP)
       • Chi-square test (not shown)
     2 kinds of model:
       • Topic model
       • Unigram model
     3 x 2 = 6 potential indicators
  5. Case study: synthetic topic drift
     Gradual topic drift, days 150-179
     [Figure: three panels for days 1-149, days 150-179, and days 180-300]
     Drift indicators: all indicators detect drift
  6. Case study: Toyota
     All blogs mentioning “Toyota”
     6 months (January – June 2008)
     [Figure: drift indicators, annotated “Highest drift?”]
  7. Emerging topics, Toyota (Mar-Jun 2008)
       • Emerging “energy” topic
       • Chapman auto accident topic
       • “Energy” topic tracks gas price
  8. Case study: iPhone
     Public blogs mentioning “iPhone” and “platform”
     12 months (April 2007 – March 2008)
     Most variable topics for Aug-Nov 2007:
       • “Apple opens window: iPhone platform”
       • “Google launches Android platform”
  9. Summary
     Compound topic models help with:
       • Tracking topics between distinct data sets
       • Detecting drift related to news events
       • Avoiding the topic/vocabulary matching problem
       • Visualizing topic trends
     Open questions:
       • How to interpret drift indicators?
       • Are unigram models sufficient for detecting topic drift?
           • Fast and frugal compared to topic models
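
Slide 4 names Kullback-Leibler divergence and relative perplexity as drift indicators that can be computed on a unigram model. The sketch below shows one way these two quantities might be computed for two time windows of text. It is an illustration only, not the authors' implementation: the slides do not define relative perplexity, so the ratio used here (perplexity of a window under an earlier window's model, divided by its perplexity under its own model) is an assumption, as are the add-one smoothing, the function names, and the toy token lists.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Assumes every token in `tokens` is contained in `vocab`.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def perplexity(tokens, model):
    """exp of the average negative log-likelihood per token."""
    nll = -sum(math.log(model[w]) for w in tokens)
    return math.exp(nll / len(tokens))


def relative_perplexity(tokens, reference_model, own_model):
    """Assumed definition: perplexity under the reference model divided by
    perplexity under the window's own model."""
    return perplexity(tokens, reference_model) / perplexity(tokens, own_model)


# Toy windows (made up for illustration).
before = "stock economy money stock dow economy".split()
after = "gas hybrid prius gas fuel economy".split()
vocab = sorted(set(before) | set(after))

p_before = unigram_model(before, vocab)
p_after = unigram_model(after, vocab)

kld = kl_divergence(p_after, p_before)
rp = relative_perplexity(after, p_before, p_after)
print(f"KLD = {kld:.3f}, RP = {rp:.3f}")
```

Under these assumptions, both indicators sit at their baseline (KLD of 0, RP of 1) when the two windows have identical word distributions and grow as the vocabulary shifts. The same two functions could be applied to topic distributions instead of word distributions to obtain the topic-model variants.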

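Slides 3, 7, and 8 rely on topic indices meaning the same thing in every dataset, so per-topic probabilities can be compared directly across periods. As a rough sketch of that comparison, the code below takes per-document topic proportions for two periods and ranks topics by how much their average proportion changed. The input matrices are hypothetical stand-ins for the output of a single topic model fitted on the combined data. The function names and the ranking by absolute change are assumptions for illustration, not the method used in the poster.

```python
def mean_topic_proportions(doc_topics):
    """Average each topic's proportion over all documents in one period."""
    n_topics = len(doc_topics[0])
    n_docs = len(doc_topics)
    return [sum(doc[k] for doc in doc_topics) / n_docs for k in range(n_topics)]


def topic_changes(period_a, period_b):
    """Return (topic index, change in mean proportion) pairs,
    sorted by largest absolute change first."""
    mean_a = mean_topic_proportions(period_a)
    mean_b = mean_topic_proportions(period_b)
    changes = [(k, mean_b[k] - mean_a[k]) for k in range(len(mean_a))]
    return sorted(changes, key=lambda kc: abs(kc[1]), reverse=True)


# Hypothetical doc-topic proportions (rows: documents, columns: topics 0..2).
# Both periods share one topic indexing, which is what makes the
# column-by-column comparison meaningful.
period_a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
period_b = [[0.4, 0.1, 0.5], [0.3, 0.2, 0.5]]

ranked = topic_changes(period_a, period_b)
most_changed_topic, change = ranked[0]
print(most_changed_topic, round(change, 2))
```

With separately trained models this column-wise subtraction would not be valid, because topic 2 in one model need not correspond to topic 2 in the other.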