Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Slide 2
  • Aim
    • Find trends in document collections
      • academic papers, patents, blog entries…
  • Idea
    • Construct timestamp arrays as new observed data
  • Method
    • Modify latent Dirichlet allocation (LDA)
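
Slide 3
  • Timestamp array for each document
  [Figure: the word tokens of a document ("test", "group", "effect", "space", several of them repeated) shown alongside its timestamp array, which contains copies of t and of the neighboring timestamps t−1 and t+1]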
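
The slide does not spell out how a timestamp array is filled, so the following is only a sketch under loudly stated assumptions: timestamps are discretized into integer indices 0..num_stamps−1, and the hypothetical parameters own_copies, neighbor_copies, and window control how often the document's own timestamp and its neighbors (like t−1 and t+1 in the figure) are repeated. None of these names come from the original method.

```python
def build_timestamp_array(doc_stamp, num_stamps, own_copies, neighbor_copies, window):
    """Hypothetical timestamp-array construction (not confirmed by the slides).

    doc_stamp: the document's own timestamp as an integer index in [0, num_stamps).
    own_copies: how many times the document's own timestamp is repeated.
    neighbor_copies: how many times each neighboring timestamp is repeated.
    window: how many steps before and after doc_stamp count as neighbors.
    """
    array = [doc_stamp] * own_copies
    for offset in range(1, window + 1):
        for neighbor in (doc_stamp - offset, doc_stamp + offset):
            # Skip neighbors that fall outside the observed time range.
            if 0 <= neighbor < num_stamps:
                array.extend([neighbor] * neighbor_copies)
    return array


# A document stamped at index 5 out of 10 time slots, with t-1 and t+1 as neighbors.
print(build_timestamp_array(5, 10, own_copies=4, neighbor_copies=2, window=1))
# [5, 5, 5, 5, 4, 4, 6, 6]
```

In this sketch the array length is own_copies + 2 × window × neighbor_copies (shorter near the ends of the time range); that length is the kind of extra input parameter the comparison with Topics over Time lists for Bag of Timestamps.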

Slide 4
  • Modify LDA
    • Draw a topic multinomial Multi(θ_d) from a Dirichlet prior with parameter α
    • For each word token
      • Draw a topic z from Multi(θ_d)
      • Draw a word from the multinomial Multi(φ_z)
    • For each timestamp token
      • Draw a topic z from Multi(θ_d)
      • Draw a timestamp from the multinomial Multi(ψ_z)
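
The generative steps above can be simulated directly. This is a minimal sketch using only Python's standard library; it assumes symmetric Dirichlet priors α on θ_d, β on the per-topic word multinomials φ_z, and γ on the per-topic timestamp multinomials ψ_z, with words and timestamps encoded as integer indices. The fixed per-document token counts and all function names are illustrative choices, not part of the original method.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Symmetric Dirichlet draw: normalize independent Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def draw_index(probs, rng):
    """Draw one index from the categorical (one-trial multinomial) distribution probs."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def generate_corpus(num_docs, num_topics, vocab_size, num_stamps,
                    words_per_doc, stamps_per_doc,
                    alpha=0.5, beta=0.1, gamma=0.1, seed=0):
    """Sample a toy corpus from the Bag of Timestamps generative process."""
    rng = random.Random(seed)
    # Per-topic word multinomials phi_z ~ Dir(beta) and timestamp multinomials psi_z ~ Dir(gamma).
    phi = [sample_dirichlet(beta, vocab_size, rng) for _ in range(num_topics)]
    psi = [sample_dirichlet(gamma, num_stamps, rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # One topic multinomial theta_d ~ Dir(alpha) per document, shared by words and timestamps.
        theta = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(words_per_doc):
            z = draw_index(theta, rng)              # topic for this word token
            words.append(draw_index(phi[z], rng))   # word from Multi(phi_z)
        stamps = []
        for _ in range(stamps_per_doc):
            z = draw_index(theta, rng)              # topic for this timestamp token
            stamps.append(draw_index(psi[z], rng))  # timestamp from Multi(psi_z)
        docs.append((words, stamps))
    return phi, psi, docs


phi, psi, docs = generate_corpus(num_docs=3, num_topics=2, vocab_size=6,
                                 num_stamps=4, words_per_doc=8, stamps_per_doc=5)
for words, stamps in docs:
    print("words:", words, "timestamps:", stamps)
```

Because every timestamp token draws its topic from the same θ_d as the words, timestamps shape a document's topic proportions just as extra tokens would, while still being generated from their own multinomials ψ_z.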
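
Slide 5
  [Figure: graphical model of Bag of Timestamps, with α → θ; topic assignments z for word tokens w (word multinomials φ, prior β) and for timestamp tokens t (timestamp multinomials ψ, prior γ)]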

Slide 6
  • Different Dirichlet priors for the word and timestamp multinomials (β for φ, γ for ψ)
    • The Bayesian approach is taken for timestamps as well
    • Timestamps are not simply added to the vocabulary as extra word types
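
Because words and timestamps share θ_d but have separate priors, integrating out θ, φ, and ψ yields one collapsed Gibbs conditional per token type. The slides do not show these formulas; the following is a sketch derived from the generative process of slide 4, assuming symmetric priors α, β, γ. Here z_{di} is the topic of the i-th word w_{di} in document d, y_{dj} (notation introduced here) is the topic of the j-th timestamp t_{dj}, n_{dk} counts both word and timestamp tokens of d assigned to topic k, n_{kw} and n_k count words, m_{kt} and m_k count timestamps, V is the vocabulary size, S is the number of distinct timestamps, and ¬di or ¬dj marks counts that exclude the token being resampled.

```latex
\begin{align*}
p(z_{di} = k \mid \text{rest}) &\propto
  \bigl(n_{dk}^{\neg di} + \alpha\bigr)\,
  \frac{n_{k w_{di}}^{\neg di} + \beta}{n_{k}^{\neg di} + V\beta} \\
p(y_{dj} = k \mid \text{rest}) &\propto
  \bigl(n_{dk}^{\neg dj} + \alpha\bigr)\,
  \frac{m_{k t_{dj}}^{\neg dj} + \gamma}{m_{k}^{\neg dj} + S\gamma}
\end{align*}
```

The two conditionals differ only in which count table and prior they use, so no point estimate of a timestamp distribution enters the update.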

Slide 7
  |          | Topics over Time | Bag of Timestamps |
  |----------|------------------|-------------------|
  | Model    | Modification of LDA (Beta distribution for continuous timestamps) | Modification of LDA (Dirichlet-multinomial for discrete timestamps) |
  | Cost     | O(NK) time, O(N) space (N: number of word tokens) | O((N+L)K) time, O(N+L) space (L: sum of timestamp array lengths) |
  | Drawback | Non-Bayesian term in the updating formula for Gibbs sampling | Additional input parameter for timestamp array lengths |
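
The Bag of Timestamps cost can be read off a collapsed Gibbs sweep: each of the N word tokens and L timestamp tokens is resampled once, and each resampling computes K topic weights, giving O((N+L)K) time per sweep, while the per-token topic assignments take O(N+L) space. The plain-Python sketch below illustrates this under assumptions (symmetric priors, integer token ids, hypothetical function and variable names); it is not the authors' implementation. The weight for topic k is (document-topic count + α) × (topic-token count + prior) / (topic total + size × prior), where the document-topic count includes both words and timestamps.

```python
import random


def gibbs_bag_of_timestamps(docs, num_topics, vocab_size, num_stamps,
                            alpha, beta, gamma, sweeps, seed=0):
    """Collapsed Gibbs sampling sketch for Bag of Timestamps.

    docs: list of (word_ids, stamp_ids) pairs with integer token indices.
    Returns topic-word counts, topic-timestamp counts, and doc-topic counts.
    """
    rng = random.Random(seed)
    K = num_topics
    doc_topic = [[0] * K for _ in docs]  # words AND timestamps per doc/topic
    topic_word = [[0] * vocab_size for _ in range(K)]
    topic_word_total = [0] * K
    topic_stamp = [[0] * num_stamps for _ in range(K)]
    topic_stamp_total = [0] * K

    # Random initial topic for every word token and every timestamp token.
    word_z = [[rng.randrange(K) for _ in words] for words, _ in docs]
    stamp_z = [[rng.randrange(K) for _ in stamps] for _, stamps in docs]
    for d, (words, stamps) in enumerate(docs):
        for w, k in zip(words, word_z[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_word_total[k] += 1
        for s, k in zip(stamps, stamp_z[d]):
            doc_topic[d][k] += 1
            topic_stamp[k][s] += 1
            topic_stamp_total[k] += 1

    def resample(d, token, z, i, counts, totals, size, prior):
        # Remove the token's current assignment from all counts.
        k = z[i]
        doc_topic[d][k] -= 1
        counts[k][token] -= 1
        totals[k] -= 1
        # O(K) work per token: one unnormalized weight per topic.
        weights = [(doc_topic[d][j] + alpha)
                   * (counts[j][token] + prior) / (totals[j] + size * prior)
                   for j in range(K)]
        k = rng.choices(range(K), weights=weights)[0]
        z[i] = k
        doc_topic[d][k] += 1
        counts[k][token] += 1
        totals[k] += 1

    for _ in range(sweeps):
        for d, (words, stamps) in enumerate(docs):
            for i, w in enumerate(words):   # N word tokens over the corpus
                resample(d, w, word_z[d], i, topic_word, topic_word_total,
                         vocab_size, beta)
            for i, s in enumerate(stamps):  # L timestamp tokens over the corpus
                resample(d, s, stamp_z[d], i, topic_stamp, topic_stamp_total,
                         num_stamps, gamma)
    return topic_word, topic_stamp, doc_topic


docs = [([0, 1, 1, 2], [3, 3, 2, 4]), ([4, 5, 5, 3], [0, 0, 1, 1])]
topic_word, topic_stamp, doc_topic = gibbs_bag_of_timestamps(
    docs, num_topics=2, vocab_size=6, num_stamps=5,
    alpha=0.5, beta=0.1, gamma=0.1, sweeps=50)
print(doc_topic)
```

Timestamp tokens go through exactly the same update as word tokens, which is why the slides can describe the updating computations as simple.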
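
Slide 8
  [Figure: graphical model of Topics over Time, with variables α, θ, z, w, t, β, φ and Beta parameters ψ1, ψ2 for the timestamps in place of ψ and its prior γ]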
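
For contrast, the Beta timestamp term of Topics over Time can be sketched as follows. This is background on Topics over Time, not part of Bag of Timestamps, and it assumes the commonly described setup in which timestamps are rescaled into (0, 1) and each topic's Beta parameters (ψ1, ψ2, written here as a, b in the standard Beta(a, b) parameterization) are refit by the method of moments between Gibbs sweeps. Such point estimates, rather than a prior that is integrated out, are presumably what the comparison slide calls a non-Bayesian term. The function names are hypothetical.

```python
import math


def fit_beta_moments(stamps):
    """Method-of-moments Beta(a, b) fit for timestamps rescaled into (0, 1)."""
    n = len(stamps)
    mean = sum(stamps) / n
    var = sum((t - mean) ** 2 for t in stamps) / n
    if var == 0.0:
        raise ValueError("method of moments needs timestamps that are not all equal")
    # For data strictly inside (0, 1), var < mean * (1 - mean), so a and b are positive.
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def beta_density(t, a, b):
    """Standard Beta(a, b) density t**(a-1) * (1-t)**(b-1) / B(a, b), for 0 < t < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t))


# Rescaled timestamps of the tokens currently assigned to one topic.
a, b = fit_beta_moments([0.2, 0.4, 0.6, 0.8])
print(a, b)                     # roughly 2.0 2.0
print(beta_density(0.5, a, b))  # roughly 1.5
```

In a sampler of this kind, the weight of topic k for a token with rescaled timestamp t is multiplied by beta_density(t, a_k, b_k), so the timestamp enters through a fitted density rather than through counts and a Dirichlet prior.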

Slide 16
  • Pros
    • Bayesian treatment of timestamps as well as words
    • Simple update computations
  • Cons
    • No principled way to determine timestamp array lengths
    • Weak for fine-grained timestamps

Slide 17
  • Determining timestamp array lengths
    • Controlling the strength of timestamp data
  • Parallelization
    • OpenMP
    • CUDA
    • MPICH2
