Steering Time-Dependent Estimation of Posteriors with HYperparameter Indexing in Bayesian Topic Models. Tomonari MASADA (正田备也), Nagasaki University, [email_address]
OUTLINE (1/3) Aim: Improve LDA [Blei et al. 03] in terms of perplexity by using document timestamps. E.g., SNS documents (Facebook, Twitter, Weibo, ...) are timestamped.
OUTLINE (2/3) Our approach: Prepare a word multinomial for each timestamp. LDA: K word multinomials. Ours: T x K word multinomials.
Topic distributions vary along time (increase # basis coefficient vectors) vs. word distributions vary along time (increase # basis vectors). Here, topic = word multinomial.
OUTLINE (3/3) Problem: Overfitting (T x K x W word multinomial params). Proposal: Hyperparameter indexing.
LDA: K word multinomials Multi(φ_1), Multi(φ_2), ..., Multi(φ_K), where φ_k = (φ_{k1}, φ_{k2}, ..., φ_{kW})
LDA: φ_1, ..., φ_K ~ Di(β), where β = (β_1, β_2, ..., β_W)
Ours: T x K word multinomials φ_{11}, ..., φ_{1K}, ..., φ_{T1}, ..., φ_{TK} (K topics for each of T timestamps)
Option 0: one Dirichlet prior Di(β) shared by all φ_{tk}, where β = (β_1, β_2, ..., β_W)
Option 1: one prior per topic, Di(β_1), ..., Di(β_K), where β_k = (β_{k1}, β_{k2}, ..., β_{kW})
Option 2: one prior per timestamp, Di(β_1), ..., Di(β_T), where β_t = (β_{t1}, β_{t2}, ..., β_{tW})
Option 3: one prior per timestamp-topic pair, Di(β_{11}), ..., Di(β_{TK}), where β_{tk} = (β_{tk1}, β_{tk2}, ..., β_{tkW})
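The four indexing options differ only in which hyperparameter vector governs each φ_{tk}. The sketch below makes that mapping explicit and counts the resulting hyperparameters. The function names are illustrative and do not come from the slides; T and W are taken from the NIPS row of the DATA SPECS slide, while K = 50 is an assumed value chosen only for the example.

```python
def beta_index(option, t, k):
    """Return the key of the Dirichlet hyperparameter vector governing phi[t][k]."""
    if option == 0:
        return ()        # a single beta shared by every (t, k)
    if option == 1:
        return (k,)      # one beta per topic
    if option == 2:
        return (t,)      # one beta per timestamp
    if option == 3:
        return (t, k)    # one beta per timestamp-topic pair
    raise ValueError(f"unknown option: {option}")


def num_hyperparams(option, T, K, W):
    """Count scalar hyperparameters: (# distinct beta vectors) x W."""
    keys = {beta_index(option, t, k) for t in range(T) for k in range(K)}
    return len(keys) * W


# T and W follow the NIPS row of the data table; K = 50 is an assumption.
T, K, W = 13, 50, 11998
for option in range(4):
    print(f"Option {option}: {num_hyperparams(option, T, K, W):,} hyperparameters")
```

Under these assumed sizes, Option 3 carries T x K times as many hyperparameters as Option 0, which is one way to see why the later slides call it the heavily time-dependent model.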
PROPOSAL: LDA -> Option 1 -> Option 3
STEPHY: Steering Time-dependent Estimation of Posteriors with HYperparameter indexing in Bayesian topic models
STEPHY
Stage 1: VB for time-independent model (LDA) x 50 iters
Stage 2: VB for slightly time-dependent model (Option 1) x 140 iters
Stage 3: VB for heavily time-dependent model (Option 3) x 10 iters
[Figure: LDA, Option 1, Option 3]
STEPHY: Conduct multistage inference over different topic models having compatible parameters.
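One possible reading of "compatible parameters" is that the K x W word parameters estimated by the time-independent stage can be replicated across the T timestamps to initialize the time-dependent stages. The sketch below illustrates only that hand-off. The slides do not spell out the actual scheme, so `expand_over_time`, `run_stages`, and the no-op sweep are hypothetical placeholders rather than the author's implementation; the iteration counts are the ones shown on the STEPHY slide.

```python
import copy


def expand_over_time(per_topic, T):
    """Replicate a K x W table across T timestamps, giving a T x K x W table."""
    return [copy.deepcopy(per_topic) for _ in range(T)]


def no_op_sweep(params):
    """Placeholder for one VB sweep; a real sweep would update params."""
    return params


def run_stages(init_kw, T, schedule, sweep=no_op_sweep):
    """Run the stages in order and return (final params, sweeps run per stage)."""
    params, expanded, done = init_kw, False, []
    for name, n_iters, time_dependent in schedule:
        if time_dependent and not expanded:
            # Assumed hand-off: copy the K x W table to every timestamp.
            params = expand_over_time(params, T)
            expanded = True
        for _ in range(n_iters):
            params = sweep(params)
        done.append((name, n_iters))
    return params, done


# Iteration counts from the STEPHY slide; K = 2, W = 3, T = 4 are toy sizes.
schedule = [("LDA", 50, False), ("Option 1", 140, True), ("Option 3", 10, True)]
init_kw = [[0.1, 0.2, 0.7], [0.5, 0.25, 0.25]]
params, done = run_stages(init_kw, T=4, schedule=schedule)
print(len(params), len(params[0]), len(params[0][0]))  # prints: 4 2 3
```

Deep-copying each timestamp's table keeps the T copies independent, so a later stage can move them apart without touching the others.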
DATA SPECS
Dataset          J        W    T           P
NIPS         1,740   11,998   13     919,916
DBLP     1,235,988  273,173   20   7,814,175
DONGA       24,093   71,621   53   7,949,288
TDT         96,256   51,849  123  11,460,231
NSF        128,181   25,325   13  10,388,976
YOMI       367,910   84,060   52  32,762,456
COMPLEXITY
Time: O(PK), where P = #(different doc-word pairs)
Space: O(QK), where Q = #(different timestamp-word pairs)
No malloc for ...; malloc for ...
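The quantities P and Q in the complexity bounds can be counted directly from a corpus. The sketch below does so for a toy corpus; the input layout (a list of `(timestamp, tokens)` pairs, one per document) and the function name `count_pairs` are assumptions made for illustration.

```python
def count_pairs(docs):
    """Count distinct doc-word pairs (P) and distinct timestamp-word pairs (Q).

    docs: list of (timestamp, list_of_word_tokens), one entry per document.
    """
    doc_word = set()
    time_word = set()
    for j, (t, words) in enumerate(docs):
        for w in words:
            doc_word.add((j, w))   # repeated tokens in a document count once
            time_word.add((t, w))  # words shared by same-timestamp documents count once
    return len(doc_word), len(time_word)


# Toy corpus: two documents at timestamp 0, one at timestamp 1.
docs = [
    (0, ["topic", "model", "topic"]),
    (0, ["model", "time"]),
    (1, ["time", "topic"]),
]
P, Q = count_pairs(docs)
print(P, Q)  # prints: 6 5
```

When every document has exactly one timestamp, each timestamp-word pair comes from at least one doc-word pair, so Q never exceeds P.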
IMPLEMENTATION: VB is in the realm of embarrassing parallelism. Parallelized with OpenMP.
[Wang et al. 06]
 
 
 
 
 
 
NEW RESULTS
CGS for LDA x 1000 iters
VB for time-independent model (LDA) x 50 iters
VB for slightly time-dependent model (Option 1) x 5 iters
VB for heavily time-dependent model (Option 3) x 50 iters
 
 
 
 
 
CONCLUSION: STEPHY conducts multistage inference over different topic models having compatible parameters. It can efficiently improve LDA in terms of test set perplexity.
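The slides do not define test set perplexity. The sketch below uses the standard definition, the exponential of the negative mean log-likelihood per test token, so lower is better. The function name and the input (a list of per-token predictive probabilities produced by some fitted model) are assumptions for illustration.

```python
import math


def perplexity(token_probs):
    """exp(-(1/N) * sum_n log p(w_n)) over N held-out tokens."""
    if not token_probs:
        raise ValueError("need at least one test token")
    log_lik = sum(math.log(p) for p in token_probs)
    return math.exp(-log_lik / len(token_probs))


# A model that assigns probability 1/4 to every test token is as uncertain
# as a uniform choice among 4 words, so its perplexity is about 4.
print(perplexity([0.25] * 8))
```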
FUTURE WORK: Other types of mixture models (topic = Gaussian). Bayesian nonparametrics. Topic distributions are left intact. Practical evaluation, e.g., classification, clustering, topic detection, IR, ...

Steering Time-Dependent Estimation of Posteriors with Hyperparameter Indexing in Bayesian Topic Models
