Graphical Models for the Internet (Part2)
Transcript

  • 1. Graphical Models for the Internet. Amr Ahmed and Alexander Smola, Yahoo Research, Santa Clara, CA
  • 2. Thus far ... • Motivation • Basic tools – Clustering – Topic Models • Distributed batch inference – Local and global states – Star synchronization
  • 3. Up next • Inference – Online Distributed Sampling – Single machine multi-threaded inference – Online EM and Submodular Selection • Applications – User tracking for behavioral targeting – Content understanding – User modeling for content recommendation
  • 4. 4. Online Model
  • 5. Scenarios • Batch Large-Scale – Covered in part 1 • Mini-batches – We already have a model – Data arrives in batches – We would like to keep the model up to date • Time-sensitive – Data arrives one item at a time – Model should be up to date
  • 6. 4.1 Dynamic Clustering
  • 7. The Chinese Restaurant Process • Allows the number of mixtures to grow with the data • They are called non-parametric models – Means the number of effective parameters grows with the data – Still have hyper-parameters that control the rate of growth • α: how fast is a new cluster/mixture born? • G0: prior over mixture component parameters
  • 8. The Chinese Restaurant Process [Figure: tables with parameters φ1, φ2, φ3] Generative Process - For data point xi - Choose table j ∝ mj and sample xi ~ f(φj) - Choose a new table K+1 ∝ α - Sample φK+1 ~ G0 and sample xi ~ f(φK+1) The rich-get-richer effect. CANNOT handle sequential data
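The seating rule on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `sample_crp_assignments` and the `seed` argument are hypothetical, and the sketch only draws table assignments. Sampling φ from G0 and xi from f(φ) is left out because the slides do not fix G0 or f.

```python
import random


def sample_crp_assignments(num_points, alpha, seed=0):
    """Seat num_points customers by the Chinese Restaurant Process.

    Returns (assignments, counts) where counts[j] is m_j, the number of
    customers at table j. alpha must be positive.
    """
    rng = random.Random(seed)
    counts = []       # m_j for every existing table
    assignments = []  # table index chosen for each data point
    for _ in range(num_points):
        # Existing table j has weight m_j; a new table has weight alpha.
        weights = counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)   # a new table K+1 is opened
        else:
            counts[table] += 1  # rich-get-richer: popular tables grow
        assignments.append(table)
    return assignments, counts


assignments, counts = sample_crp_assignments(20, alpha=1.0)
```

Larger values of `alpha` open new tables more often, which is the growth-rate role the slides assign to α.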
  • 9. Recurrent CRP (RCRP) [Ahmed and Xing 2008] • Adapts the number of mixture components over time – Mixture components can die out – New mixture components are born at any time – Retained mixture components' parameters evolve according to Markovian dynamics
  • 10. The Recurrent Chinese Restaurant Process [Figure: tables φ1,1, φ2,1, φ3,1 at T=1; φ3,1 is the dish eaten at table 3 at time epoch 1, or the parameters of cluster 3 at time epoch 1] Generative Process - Customers at time T=1 are seated as before: - Choose table j ∝ mj,1 and sample xi ~ f(φj,1) - Choose a new table K+1 ∝ α - Sample φK+1,1 ~ G0 and sample xi ~ f(φK+1,1)
  • 11. The Recurrent Chinese Restaurant Process [Figure: T=1 tables φ1,1, φ2,1, φ3,1 with counts m1,1=2, m2,1=3, m3,1=1; epoch T=2 begins]
  • 12. [Figure: T=1 tables φ1,1, φ2,1, φ3,1 with counts m1,1=2, m2,1=3, m3,1=1; at T=2 the tables φ1,1, φ2,1, φ3,1 are carried over]
  • 15. [Figure: T=1 tables φ1,1, φ2,1, φ3,1 with counts m1,1=2, m2,1=3, m3,1=1; at T=2 the tables are φ1,2, φ2,1, φ3,1] Sample φ1,2 ~ P(.| φ1,1)
  • 16. [Figure: T=1 tables φ1,1, φ2,1, φ3,1 with counts m1,1=2, m2,1=3, m3,1=1; at T=2 the tables are φ1,2, φ2,1, φ3,1] And so on ...
  • 17. [Figure: T=1 tables φ1,1, φ2,1, φ3,1 with counts m1,1=2, m2,1=3, m3,1=1; T=2 tables φ1,2, φ2,2, φ3,1, φ4,2] At the end of epoch 2: φ3,1 is a died-out cluster, φ4,2 is a newly born cluster
  • 18. [Figure: T=1 tables φ1,1, φ2,1, φ3,1 with counts m1,1=2, m2,1=3, m3,1=1; T=2 tables φ1,2, φ2,2, φ3,1, φ4,2 with counts m1,2=2, m2,2=2, m4,2=1; epoch T=3 begins]
  • 19. Recurrent Chinese Restaurant Process • Can be extended to model higher-order dependencies • Can decay dependencies over time – Pseudo-count for table k at time t is $\sum_{h=1}^{H} e^{-h/\rho}\, m_{k,t-h}$, where H is the history size, ρ is the decay factor, and $m_{k,t-h}$ is the number of customers sitting at table k at time epoch t-h
  • 20. [Figure: epochs T=1 and T=2 as before; T=3 starts with tables φ1,2, φ2,2, φ4,2] Pseudo-count for table 2 at T=3: $m_{2,3} = \sum_{h=1}^{H} e^{-h/\rho}\, m_{k,t-h}$ with k=2, t=3
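The decayed pseudo-count from slides 19 and 20 is easy to evaluate directly. The sketch below assumes the counts are passed most recent first; the function name `decayed_pseudo_count` and the argument layout are illustrative choices, not part of the slides.

```python
import math


def decayed_pseudo_count(history, rho, H):
    """Return sum_{h=1..H} exp(-h / rho) * m_{k,t-h}.

    history[h - 1] holds m_{k,t-h}, the number of customers at table k
    h epochs ago (most recent epoch first). Only the first H entries count.
    """
    return sum(
        math.exp(-h / rho) * m
        for h, m in enumerate(history[:H], start=1)
    )


# Table 2 at T=3 in the running example: m_{2,2}=2 and m_{2,1}=3.
table_2 = decayed_pseudo_count([2, 3], rho=1.0, H=2)

# Table 3 had m_{3,2}=0 and m_{3,1}=1: with H=2 it keeps a small positive
# pseudo-count, with H=1 it has none.
table_3_long = decayed_pseudo_count([0, 1], rho=1.0, H=2)
table_3_short = decayed_pseudo_count([0, 1], rho=1.0, H=1)
```

A longer history H or a larger ρ lets a quiet cluster keep some prior mass for longer, while H=1 recovers the first-order dependence shown in the earlier figures.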
  • 21. 4.2 Online Distributed Inference: Tracking User Interests
  • 22. Characterizing User Interests • Short term vs long-term [Figure: timeline Jan, April, July, Oct with interests Music, Housing, Buying a car, Furniture, Travel plans]
  • 23. Characterizing User Interests • Short term vs long-term • Latent [Figure: timeline Jan, April, July, Oct with observed words Gaga, mortgage, mileage, Barcelona, fast, seafood, used]
  • 24. Problem formulation Input • Queries issued by the user or tags of watched content • Snippet of page examined by user • Time stamp of each action (day resolution) Output • Users' daily distribution over interests • Dynamic interest representation • Online and scalable inference • Language independent [Figure: user words Flight, classes, School, London, registration, Supplies, Hotel, housing, Loan, weather, rent, semester]
  • 25. Problem formulation Input • Queries issued by the user or tags of watched content • Snippet of page examined by user • Time stamp of each action (day resolution) Output • Users' daily distribution over interests • Dynamic interest representation • Online and scalable inference • Language independent [Figure: the user words Flight, classes, School, London, registration, Supplies, Hotel, housing, Loan, weather, rent, semester grouped into the interests Back to school, Finance, Travel]
  • 26. Problem formulation: When to show a financing ad? [Figure: the user words grouped into the interests Back to school, Finance, Travel]
  • 29. Problem formulation: When to show a hotel ad? [Figure: the user words grouped into the interests Back to school, Finance, Travel]
  • 31. Problem formulation Input • Queries issued by the user or tags of watched content • Snippet of page examined by user • Time stamp of each action (day resolution) Output • Users' daily distribution over interests • Dynamic interest representation • Online and scalable inference • Language independent [Figure: the user words grouped into the interests Back to school, Finance, Travel]
  • 32. Mixed-Membership Formulation [Figure: objects (bags of words such as job, hiring, speed, price, card, diet, calories, part-time, Camry, loan, recipe, milk, career, opening, weight, lb, kg, bonus, package) have a degree of membership in each mixture. Mixtures: Diet (recipe, chocolate, pizza, food, chicken, milk, butter, powder); Cars (car, blue, book, Kelley, prices, small, speed, large); Job (job, career, business, assistant, hiring, part-time, receptionist); Finance (bank, online, credit, card, debt, portfolio, finance, Chase)]
  • 33. In Graphical Notation
  • 34. In Polya-Urn Representation • Collapse multinomial variables [Figure: graphical model with the collapsed nodes crossed out] • Fixed-dimensional Hierarchical Polya-Urn representation – Chinese restaurant franchise
  • 35. [Figure: three levels. Global topic trends; topic word-distributions for Recipe (chocolate, pizza, food, chicken, milk, butter, powder), Car (blue, book, Kelley, prices, small, speed, large), Job (career, business, assistant, hiring, part-time, receptionist), Bank (online, credit, card, debt, portfolio, finance, Chase); user-specific topic trends (mixing vector); user interactions (queries, keywords from pages viewed), e.g. "Food Chicken" and "Car speed offer camry accord career"]
  • 36. [Figure: the Recipe, Car, Job, and Bank topic word lists; user words "Food Chicken ..." and "Car speed offer camry accord career"] Generative Process • For each user interaction • Choose an intent from local distribution • Sample word from the topic's word-distribution • Choose a new intent ∝ λ • Sample a new intent from the global distribution • Sample word from the new topic's word-distribution
  • 37. [Figure: the Recipe, Car, Job, and Bank topic word lists; user words "Food Chicken pizza ..." and "Car speed offer camry accord career"] Generative Process • For each user interaction • Choose an intent from local distribution • Sample word from the topic's word-distribution • Choose a new intent ∝ λ • Sample a new intent from the global distribution • Sample word from the new topic's word-distribution
  • 39. [Figure: the Recipe, Car, Job, and Bank topic word lists; user words "Food Chicken pizza hiring ..." and "Car speed offer camry accord career"] Generative Process • For each user interaction • Choose an intent from local distribution • Sample word from the topic's word-distribution • Choose a new intent ∝ λ • Sample a new intent from the global distribution • Sample word from the new topic's word-distribution
  • 40. [Figure: the Recipe, Car, Job, and Bank topic word lists; user words "Food Chicken pizza mileage" and "Car speed offer camry accord career"] Generative Process • For each user interaction • Choose an intent from local distribution • Sample word from the topic's word-distribution • Choose a new intent ∝ λ • Sample a new intent from the global distribution • Sample word from the new topic's word-distribution
  • 41. [Figure: the Recipe, Car, Job, and Bank topic word lists; user words "Food Chicken pizza mileage" and "Car speed offer camry accord career"] Problems - Static model - Does not evolve user's interests - Does not evolve the global trend of interests - Does not evolve interest's distribution over terms
  • 42. At time t / At time t+1 [Figure: the Recipe, Car, Job, and Bank topic word lists; user words "Food Chicken pizza mileage" and "Car speed offer camry accord career"] Build a dynamic model: connect each level using an RCRP
  • 43. At time t / At time t+1 / At time t+2 / At time t+3 [Figure: a global process (counts m) and per-user processes for User 1 (counts n), User 2, and User 3, each unrolled over the four time steps] Which time kernel to use at each level?
  • 44. At time t / At time t+1 [Figure: the Recipe, Car, Job, and Bank topic word lists; an equation relating pseudo counts to a decay factor; user words "Food Chicken pizza mileage" and "Car speed offer camry accord career"] Observation 1 - Popular topics at time t are likely to be popular at time t+1 - φk,t+1 is likely to smoothly evolve from φk,t
  • 45. At time t / At time t+1 [Figure: the Car topic at time t (Car, Blue, Book, Kelley, Prices, Small, Speed, large) and at time t+1 (Car, Altima, Accord, Book, Kelley, Prices, Small, Speed), next to the Recipe, Job, and Bank topics] Intuition: captures the current trend of the car industry (a new release, for example). $\phi_{k,t+1} \sim \mathrm{Dir}(\tilde{\beta}_{k,t+1})$ Observation 1 - Popular topics at time t are likely to be popular at time t+1 - φk,t+1 is likely to smoothly evolve from φk,t
  • 46. At time t / At time t+1 [Figure: the Car topic at time t+1 (Car, Altima, Accord, Blue, Book, Kelley, Prices, Small, Speed) next to the Recipe, Job, and Bank topics; user words "Food Chicken pizza mileage" and "Car speed offer camry accord career"] How do we get a prior that captures both long- and short-term interest? Observation 2 - User prior at time t+1 is a mixture of the user's short- and long-term interests
  • 47. [Figure: the prior for user actions at time t mixes counts over three windows: week (weight μ, short-term), month (weight μ2), and all history (weight μ3, long-term). Example user words over time t, t+1: food, recipe, chicken, pizza, cuisine; part-time, job, hiring, opening, salary; Kelly, mileage. Topics: Diet (recipe, chocolate, pizza, food, chicken, milk, butter, powder); Cars (car, blue, book, Kelley, prices, small, speed, large); Job (job, career, business, assistant, hiring, part-time, receptionist); Finance (bank, online, credit, card, debt, portfolio, finance, Chase)]
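One way to read the week/month/all mixture on slide 47 is as a weighted sum of a user's past topic counts over nested windows. The sketch below is an assumption-laden illustration: the 7-day and 30-day window lengths, the nesting of the windows, and the names `user_prior`, `mu_week`, `mu_month`, `mu_all` are not given in the slides.

```python
from collections import defaultdict


def user_prior(daily_counts, t, mu_week, mu_month, mu_all,
               week_days=7, month_days=30):
    """Combine a user's past topic counts into a prior for day t.

    daily_counts[d] maps topic -> count for day d. Days before t are used.
    Windows are nested (assumption): a count from the last week also falls
    in the month window and in the all-history window.
    """
    prior = defaultdict(float)
    for day in range(t):
        age = t - day
        weight = mu_all                 # long-term: whole history
        if age <= month_days:
            weight += mu_month          # medium-term window
        if age <= week_days:
            weight += mu_week           # short-term window
        for topic, n in daily_counts[day].items():
            prior[topic] += weight * n
    return dict(prior)


# A user who looked at travel long ago and at cars five days ago.
history = [{} for _ in range(40)]
history[0] = {"travel": 2}
history[35] = {"cars": 1}
prior = user_prior(history, t=40, mu_week=0.5, mu_month=0.3, mu_all=0.2)
```

With these weights the recent cars interest dominates the prior while the old travel interest still contributes a small long-term share.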
  • 48. At time t / At time t+1 [Figure: the Car topic at time t+1 (Car, Altima, Accord, Blue, Book, Kelley, Prices, Small, Speed) next to the Recipe, Job, and Bank topics; short-term priors feed the user words "Food Chicken Pizza mileage" and "Car speed offer camry accord career"] Generative Process • For each user interaction • Choose an intent from local distribution • Sample word from the topic's word-distribution • Choose a new intent ∝ λ • Sample a new intent from the global distribution • Sample word from the new topic's word-distribution
  • 49. Polya-Urn RCRF Process ?
  • 50. Simplified Graphical Model [Figure: graphical model at time t and at time t+1; tildes mark the prior quantities carried over from earlier time steps]
  • 51. Simplified Graphical Model [Figure: the graphical model with the Car topic at time t (Car, Blue, Book, Kelley, Prices, Small, Speed, large) and at time t+1 (Car, Altima, Accord, Book, Kelley, Prices, Small, Speed)]
  • 52. Simplified Graphical Model [Figure: the graphical model with the user words "Food Chicken Pizza mileage"]
  • 53. Simplified Graphical Model [Figure: the graphical model at time t and at time t+1]
  • 54. Simplified Graphical Model • Topics evolve over time? • User's intent evolves over time? • Capture long- and short-term interests of users?
  • 55. At time t / At time t+1 / At time t+2 / At time t+3 [Figure: a global process (counts m) and per-user processes for User 1 (counts n), User 2, and User 3, each unrolled over the four time steps]
  • 56. 4.2 Online Distributed Inference: Work Flow
  • 57. Work Flow [Figure: today's user interactions and the current users' models (the system state) feed a daily update (inference) that produces the new users' models] Hundreds of millions
  • 58. Online Scalable Inference • Online algorithm – Greedy 1-particle filtering algorithm – Works well in practice – Collapse all multinomials except Ωt • This makes distributed inference easier – At each time t: [Figure: graphical model for one time step] • Distributed scalable implementation – Used first part architecture as a subroutine – Added synchronous sampling capabilities
  • 59. Distributed Inference (at time t) [Figure: graphical model with Ω, θ, z, w, φ]
  • 60. Distributed Inference (at time t): collapse all multinomials [Figure: Ω and φ are shared; each client holds its own θ, z, w]
  • 61. After collapsing [Figure: the Job, Recipe, Bank, and Car topic word lists are shared; each client holds user words such as "Car speed offer camry accord career" and "Food Chicken Pizza mileage"] Use Star-Synchronization
  • 62. Fully Collapsed [Figure: shared memory holds the Job, Recipe, Bank, and Car topic word lists; each client holds user words such as "Car speed offer camry accord career" and "Food Chicken Pizza mileage"]
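  • 63. Distributed Inference (at time t) $P(z_{ij}^t = k \mid w_{ij}^t = w, \Omega^t, \tilde{n}_i^t) \propto \left( n_{ik}^{t,-j} + \tilde{n}_{ik}^t + \lambda \frac{m_k^t + \tilde{m}_k^t + \alpha/K}{\sum_l \left( m_l^t + \tilde{m}_l^t \right) + \alpha} \right) \frac{n_{kw}^{t,-j} + \tilde{\beta}_{kw}^t}{\sum_l \left( n_{kl}^{t,-j} + \tilde{\beta}_{kl}^t \right)}$ The terms are labeled local trend (the counts n and ñ for user i), global trend (the fraction in m), and topic factor (the fraction in n and β̃). [Figure: clients holding user words such as "Car speed offer camry accord career" and "Food Chicken Pizza mileage"]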
  • 64. Semi-Collapsed [Figure: shared memory holds the Job, Recipe, Bank, and Car topic word lists; each client holds user words such as "Car speed offer camry accord career" and "Food Chicken Pizza mileage"]
  • 65. Semi-Collapsed [Figure: shared memory holds the Job, Recipe, Bank, and Car topic word lists together with Ωt; each client holds user words such as "Car speed offer camry accord career" and "Food Chicken Pizza mileage"]
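  • 66. Semi-Collapsed $P(z_{ij}^t = k \mid w_{ij}^t = w, \Omega^t, \tilde{n}_i^t) \propto \left( n_{ik}^{t,-j} + \tilde{n}_{ik}^t + \lambda \Omega_k^t \right) \frac{n_{kw}^{t,-j} + \tilde{\beta}_{kw}^t}{\sum_l \left( n_{kl}^{t,-j} + \tilde{\beta}_{kl}^t \right)}$ [Figure: each client holds a copy of Ωt and user words such as "Car speed offer camry accord career" and "Food Chicken Pizza mileage"]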
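The semi-collapsed conditional on slide 66 can be evaluated per topic once the counts are at hand. The sketch below reads the update as (local counts + prior counts + λ·Ω_k) times a smoothed word factor; that reading, the function name `semi_collapsed_topic_probs`, and the flat list arguments are assumptions made for illustration.

```python
def semi_collapsed_topic_probs(n_ik, n_ik_prior, omega, lam,
                               n_kw, beta_kw, n_k, beta_k):
    """Normalized P(z = k | w) for one word of one user, for every topic k.

    n_ik[k]       current count of topic k for this user (word j excluded)
    n_ik_prior[k] decayed prior count of topic k for this user
    omega[k]      global topic proportion at time t
    lam           weight lambda of the global distribution
    n_kw[k]       count of word w in topic k (word j excluded)
    beta_kw[k]    prior pseudo-count of word w in topic k
    n_k[k]        total word count of topic k
    beta_k[k]     total prior pseudo-count of topic k
    """
    weights = []
    for k in range(len(omega)):
        local = n_ik[k] + n_ik_prior[k] + lam * omega[k]
        word = (n_kw[k] + beta_kw[k]) / (n_k[k] + beta_k[k])
        weights.append(local * word)
    total = sum(weights)
    return [w / total for w in weights]


probs = semi_collapsed_topic_probs(
    n_ik=[1, 0], n_ik_prior=[0, 0], omega=[0.5, 0.5], lam=1.0,
    n_kw=[1, 1], beta_kw=[1, 1], n_k=[2, 2], beta_k=[2, 2],
)
```

Because Ω enters only as a fixed vector, every client can evaluate this expression from its own users' counts plus the shared word counts, which is what makes the semi-collapsed form convenient to distribute.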
  • 67. Distributed Sampling Cycle (every machine runs the same cycle) • Sample Z for users • Write counts to memcached • Barrier • Sample Ωt: one machine collects counts and samples Ω (reduction step), the others do nothing • Barrier (sampling Ω requires a barrier) • Read Ω from memcached
  • 68. Distributed Sampling Cycle (every machine runs the same cycle) • Sample Z for users • Write counts • Barrier • One machine collects counts and samples Ω, the others do nothing • Barrier • Read Ω
  • 69. Sampling Ω • Introduce auxiliary variable $m_k^t$ – How many times the global distribution was visited – $m_k^t$ ~ Antoniak [Figure: auxiliary variable m attached to Ω] $P(z_{ij}^t = k \mid w_{ij}^t = w, \Omega^t, \tilde{n}_i^t) \propto \left( n_{ik}^{t,-j} + \tilde{n}_{ik}^t + \lambda \Omega_k^t \right) \frac{n_{kw}^{t,-j} + \tilde{\beta}_{kw}^t}{\sum_l \left( n_{kl}^{t,-j} + \tilde{\beta}_{kl}^t \right)}$
  • 70. Distributed Sampling Cycle (every machine runs the same cycle) • Sample Z for users • Write counts • Barrier • One machine collects counts and samples Ω, the others do nothing • Barrier • Read Ω
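The sample / write / barrier / reduce / barrier / read cycle above can be mimicked on one machine with threads. This is a toy sketch under loud assumptions: a Python dict stands in for memcached, threads stand in for machines, "sampling Z" is replaced by 100 draws from Ω per worker, and the Ω update is a plain Dirichlet draw from the pooled counts rather than the Antoniak-based step from slide 69. The name `run_cycle` is hypothetical.

```python
import random
import threading


def run_cycle(num_workers=4, num_topics=3, num_iterations=2, seed=0):
    """Run the barrier-synchronized sampling cycle with threads as machines."""
    store = {  # stand-in for the shared memcached store
        "counts": [[0] * num_topics for _ in range(num_workers)],
        "omega": [1.0 / num_topics] * num_topics,
    }
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        rng = random.Random(seed + rank)
        for _ in range(num_iterations):
            omega = list(store["omega"])            # read Omega
            # Stand-in for "sample Z for users": 100 topic draws per worker.
            draws = rng.choices(range(num_topics), weights=omega, k=100)
            store["counts"][rank] = [draws.count(k) for k in range(num_topics)]
            barrier.wait()                          # all counts are written
            if rank == 0:                           # reduction step
                totals = [sum(c[k] for c in store["counts"])
                          for k in range(num_topics)]
                gammas = [rng.gammavariate(n + 1.0, 1.0) for n in totals]
                norm = sum(gammas)
                store["omega"] = [g / norm for g in gammas]
            barrier.wait()                          # new Omega is visible

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return store


final_store = run_cycle()
```

The two barriers are what make the step synchronous: no worker reads Ω before the reducer has written it, and the reducer never pools counts from a worker that is still sampling.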
  • 71. 4.2 Online Distributed Inference: Behavioral Targeting
  • 72. Experimental Results • Task is predicting conversions in display advertising • Use two datasets – 6 weeks of user history – Last week's responses to ads are used for testing • Baseline: – User raw data as features – Static topic model
  • 73. Interpretability [Figure: inferred interests for User-1 and User-2]
  • 74. Performance in Display Advertising [Figure: ROC against number of conversions]
  • 75. Performance in Display Advertising [Figure: weighted ROC measure; effect of number of topics; comparison with static and batch models]
  • 76. How Does It Scale? 2 billion instances with a 5M vocabulary using 1000 machines: one iteration took ~3.8 minutes
  • 77. Distributed Inference Revisited
  • 78. To collapse or not to collapse? • Not collapsing – Keeps conditional independence • Good for parallelization • Requires synchronous sampling – Might mix slowly • Collapsing – Mixes faster – Hinders parallelism – Use star-synchronization • Works well if siblings depend on each other via aggregates • Requires asynchronous communication
  • 79. Inference Primitives • Collapse a variable – Star synchronization for the sufficient statistics • Sampling a variable – Local • Sample it locally (possibly using the synchronized statistics) – Shared • Synchronous sampling using a barrier • Optimizing a variable – Same as in the shared variable case – E.g. conditional topic models
  • 80. Online Models • Batch Large-Scale – Covered in part 1 • Mini-batches – We already have a model – Data arrives in batches – We would like to keep the model up to date • Time-sensitive – Data arrives one item at a time – Model should be up to date
  • 81. What Is Coming? • Inference – Online Distributed Sampling – Single machine multi-threaded inference – Online EM and Submodular Selection • Applications – User tracking for behavioral targeting – Content understanding – User modeling for content recommendation
  • 82. 4.2 Scalable SMC Inference: Storylines
  • 83. News Stream
  • 84. News Stream • Realtime news stream – Multiple sources (Reuters, AP, CNN, ...) – Same story from multiple sources – Stories are related • Goals – Aggregate articles into a storyline – Analyze the storyline (topics, entities) • How does the story develop over time? • Who are the main entities? • What topics are addressed?
  • 85. A Unified Model • Jointly solves the three main tasks – Clustering – Classification – Analysis • Building blocks – A topic model • High-level concepts (unsupervised classification) – Dynamic clustering (RCRP) • Discover tightly-focused concepts – Named entities – Story developments
  • 86. Infinite Dynamic Cluster-Topic Hybrid [Figure: topics Sports (games, won, team, final, season, league, held), Politics (government, minister, authorities, opposition, officials, leaders, group), Unrest (police, attack, run, man, group, arrested, move); storylines UEFA-soccer (Champions, Goal, Coach, Striker, Midfield, penalty; Juventus, AC Milan, Lazio, Ronaldo, Lyon) and Tax-Bill (Tax, Billion, Cut, Plan, Budget, Economy; Bush, Senate, Fleischer, White House, Republican)]
  • 87. Infinite Dynamic Cluster-Topic Hybrid [Figure: the Sports, Politics, and Unrest topics and the UEFA-soccer and Tax-Bill storylines, plus a new storyline Border-Tension (Nuclear, Border, Dialogue, Diplomatic, militant, Insurgency, missile; Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee)]
  • 88. The Graphical Model • Topic model • Topics per cluster • RCRP for cluster • Hierarchical DP for article • Separate model for named entities • Story-specific correction
  • 89. 4.2 Fast SMC Inference: Inference via SMC
  • 90. Online Inference Algorithm • A particle filtering algorithm • Each particle maintains a hypothesis – What are the stories – Document-story associations – Topic-word distributions • Collapsed sampling – Sample (zd, sd) only for each document
  • 91. Particle Filter Representation
  • 92. Fold the document into the structure of each filter - s and z are tightly coupled - Alternatives - Sample s then sample z (high variance) [Figure: document td with entities, words w, topic indicators z, and story indicator s]
  • 93. Fold the document into the structure of each filter - s and z are tightly coupled - Alternatives - Sample s then sample z (high variance) - Sample z then sample s (doesn't make sense) [Figure: document td with entities, words w, topic indicators z, and story indicator s]
  • 94. Fold the document into the structure of each filter - s and z are tightly coupled - Alternatives - Sample s then sample z (high variance) - Sample z then sample s (doesn't make sense) - Idea - Run a few iterations of MCMC over s and z - Take the last sample as the proposed value
  • 95. How good does each filter look now?
  • 96. Get rid of the bad filters, replicate the good ones
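Dropping bad filters and replicating good ones is the resampling step of a particle filter. The slides do not say which resampling scheme is used; the sketch below uses plain multinomial resampling as one common choice, and the function name `resample` is illustrative.

```python
import random


def resample(particles, weights, rng=None):
    """Multinomial resampling: draw len(particles) particles with
    probability proportional to their weights, then reset the weights.

    Surviving particles are returned by reference, so a particle picked
    twice appears twice in the output without being copied.
    """
    rng = rng or random.Random(0)
    n = len(particles)
    chosen = rng.choices(range(n), weights=weights, k=n)
    new_particles = [particles[i] for i in chosen]
    new_weights = [1.0 / n] * n
    return new_particles, new_weights


# A degenerate case: only particle "b" explains the new document.
survivors, uniform = resample(["a", "b", "c"], [0.0, 1.0, 0.0])
```

Since replicated particles initially share all of their state, copying that state for every replica would be wasteful, which motivates the storage scheme on the following slides.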
  • 98. Efficient Computation and Storage • Particles get replicated [Figure: replicated particle states over successive steps: state P1 for particle 1; states P1, P2 for particles 1, 2; states P3, P1, P2 for particles 3, 1, 2]
  • 99. Efficient Computation and Storage • Particles get replicated – Use thread-safe inheritance tree [Figure: replicated particle states over successive steps, as on the previous slide]
  • 100. Efficient Computation and Storage • Particles get replicated – Use thread-safe inheritance tree [extends Canini et al. 2009] [Figure: a single state node for particle 1]
  • 101. Efficient Computation and Storage • Particles get replicated – Use thread-safe inheritance tree [extends Canini et al. 2009] [Figure: a root state with child states for particles 1 and 2]
  • 102. Efficient Computation and Storage • Particles get replicated – Use thread-safe inheritance tree [extends Canini et al. 2009] [Figure: a root state with nested child states for particles 1, 2, and 3]
  • 103. Efficient Computation and Storage • Particles get replicated – Use thread-safe inheritance tree [extends Canini et al. 2009] – Inverted representation for fast lookup [Figure: the inheritance tree for particles 1, 2, and 3, whose nodes store entity-to-story entries: "India: story 5"; "Pakistan: story 1"; "India: story 1, US: story 2, story 3"; "Bush: story 2"; "India: story 3"; "(empty)"; "Congress: story 2"]
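The inheritance-tree idea can be illustrated with nodes that store only their own changes and defer everything else to a parent. This is a single-threaded sketch: the class name `StateNode` and its methods are hypothetical, the entity-to-story entries are made up for the example, and the thread-safety that the slides mention is not implemented here.

```python
class StateNode:
    """A particle state that stores only its differences from its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.local = {}  # inverted index: entity -> set of story ids

    def get(self, entity, default=None):
        """Look up an entity, walking toward the root until it is found."""
        node = self
        while node is not None:
            if entity in node.local:
                return node.local[entity]
            node = node.parent
        return default

    def set(self, entity, stories):
        """Record a value for this particle only; ancestors are untouched."""
        self.local[entity] = stories

    def branch(self):
        """Replicate the particle: the child shares all state by reference."""
        return StateNode(parent=self)


root = StateNode()
root.set("India", {1})
root.set("US", {2, 3})

particle_1 = root.branch()
particle_2 = root.branch()
particle_1.set("India", {1, 3})  # only particle 1 sees the new association
```

Replication costs one small node instead of a full copy of the state, and the entity-keyed index lets a sampler skip every story that shares no entity with the incoming document.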
  • 104. Efficient  Computa>on  and  Storage  •  Why  this  is  useful?  •  Only  focus  on  stories  that  men>on  at  least  one   en>ty   –  Otherwise  pre-­‐compute  and  reuse  •  We  can  use  fast  samplers  for  z  as  well  [Yao  et.   Al.  KDD09]  
  • 105. Experiments  •  Yahoo!  News  datasets  over  two  months   –  Three  sub-­‐sampled  sets  with  different  characteris>cs  •  Editorially-­‐labeled    documents   –  Cannot-­‐like  and  must-­‐link  pairs  •  Performance  measures  using  clustering  accuracy  •  Baseline   –  A  strong  offline  Correla>on  clustering  algorithm  [WSDM  11]   •  Scaled  with  LSH  to  compute  neighborhood  graph  (similar  to  Petrovic   2010)  
  • 106. Structured  Browsing   Sports   Poli6cs   Unrest   games   Government   Police   Won   Minister   Aoach   Team   Authori>es   run   Final   Opposi>on   man   Season   Officials   group   League   Leaders   arrested   held   group   move   UEFA-soccer Tax-bills Border-TensionChampions   Juventus     Tax   Bush   Nuclear   Pakistan  Goal   AC  Milan     Billion   Senate   Border   India  Leg   Real  Madrid   Cut   US   Dialogue   Kashmir  Coach   Milan     Plan   Congress   Diploma>c   New  Delhi  Striker   Lazio   Budget   Fleischer   militant   Islamabad  Midfield   Ronaldo   Economy   White  House   Insurgency   Musharraf  penalty   Lyon         lawmakers   Republican   missile   Vajpayee  
  • 107. Structured  Browsing   Border-Tension Nuclear   Pakistan   Border   India   Dialogue   Kashmir   Diploma>c   New  Delhi   More    Like  India-­‐Pakistan  story   militant   Islamabad   Insurgency   Musharraf   missile   Vajpayee  Based    on  topics   Nuclear+  topics  [poli>cs]   Middle-east-conflict Nuclear programsPeace   Israel     Nuclear   North  Korea  Roadmap   Pales>nian   summit   South  Korea  Suicide   West  bank   warning   U.S  Violence   Sharon   policy   Bush  Seolements   Hamas   missile   Pyongyang  bombing   Arafat   program  
  • 108. Structured Browsing (same topic and story tables as slide 106) • More on personalization later in the talk
  • 109. Quantitative Evaluation • Number of topics = 100 • Effect of number of topics
  • 110. Scalability  
  • 111. Model Contribution • Named entities are very important • Removing the time component increases processing time, up to 2 seconds per document
  • 112. Putting Things Together
  • 113. Time vs. Machines • Data arrives dynamically • How to keep models up to date? • Single machine: batch = Gibbs, variational; mini-batches = Online-LDA; truly online = SMC • Multi-machine: batch = star synchronization; mini-batches = star synchronization + synchronous step; truly online = ?
  • 114. 4.3 User Preference: Online EM and Submodularity
  • 115. Storyline Summarization [Figure: storyline over time with phases Earthquake & Tsunami, Rescue & Relief, Nuclear Power, Economy] • How to summarize a storyline with few articles?
  • 116. Storyline Summarization [Figure: storyline over time with phases Earthquake & Tsunami, Rescue & Relief, Nuclear Power, Economy] • How to summarize a storyline with few articles? • How to personalize the summary?
  • 118. User Interaction • Passive – We observe user-generated content – Model the user based on that content using unsupervised techniques • Explicit – We present users with content – Users give explicit feedback • Like/dislike – Learn user preferences using supervised techniques • Implicit – Mixture of the two – Present the user with items – Observe which items the user interacts with – Learn user preferences using semi-supervised models
  • 119. User Satisfaction • Modular – Present the user with items she prefers • Regardless of the context – Targets relevance – Ex: vector space models • Submodular – More of the same thing is not always better • Diminishing returns – Targets diversity – Ex: TDN [El-Arini et al., KDD 09]
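  • 120. Sequential Click-View Model • Modeling views based on position: $p(v_i = 1 \mid v_{i-1} = 1, c_{i-1} = 1) = \frac{1}{1 + \exp(-\alpha_i)}$ and $p(v_i = 1 \mid v_{i-1} = 1, c_{i-1} = 0) = \frac{1}{1 + \exp(-\beta_i)}$ (the parameter symbol in the second formula did not survive extraction; $\beta_i$ is assumed here)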
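  • 121. Sequential Click-View Model • Modeling clicks using position and information gain: $p(c_i = 1 \mid v_i = 1, c_1, \dots, c_{i-1}) = \frac{1}{1 + \exp\left(-\gamma_i - \delta_{\sum_{j=1}^{i-1} c_j} - \rho(A_i) + \rho(A_{i-1})\right)}$, where $\gamma_i$ is the threshold, the $\delta$ term depends on the number of clicks so far, and $\rho(A_i) - \rho(A_{i-1})$ is the information gain (the symbols $\gamma$ and $\delta$ did not survive extraction and are assumed here) • The coverage function's weights are learnt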
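A minimal sketch of the view and click probabilities on these two slides, assuming both are plain logistic functions of additive terms. The argument names (`alpha_i`, `beta_i`, `threshold_i`, `click_term`, `gain`) and the sign conventions are assumptions made for illustration, not the authors' notation.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def p_view(prev_clicked, alpha_i, beta_i):
    """P(v_i = 1 | v_{i-1} = 1, c_{i-1}).

    Uses alpha_i if the previous item was clicked, beta_i otherwise.
    """
    return sigmoid(alpha_i if prev_clicked else beta_i)


def p_click(threshold_i, click_term, gain):
    """P(c_i = 1 | v_i = 1, earlier clicks).

    Assumed form: a logistic of position threshold + click-count term
    + information gain rho(A_i) - rho(A_{i-1}).
    """
    return sigmoid(threshold_i + click_term + gain)


# Same position and click history, different information gain.
low_gain = p_click(threshold_i=-1.0, click_term=0.0, gain=0.5)
high_gain = p_click(threshold_i=-1.0, click_term=0.0, gain=2.0)
```

Under these assumptions an item that adds more new information is more likely to be clicked, all else being equal.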
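  • 122. Sequential Click-View Model • Coverage of the selected summary $D$ with respect to the story $S$: $\rho(D \mid S) := \sum_{s \in S} \sum_{j} [s]_j \left( a_j \sum_{d \in D} [d]_j + b_j \rho_j(D) \right)$, where $j$ indexes features, $a_j \sum_{d \in D} [d]_j$ is the modular part, and $b_j \rho_j(D)$ is the submodular part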
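A minimal sketch of a coverage score with a modular and a submodular part per feature, as on this slide. The per-feature function `feature_coverage` (one minus the product of misses) is an assumption chosen because it is a standard submodular coverage function; the slide does not specify it, and all function names are hypothetical.

```python
def feature_coverage(docs, j):
    """Assumed rho_j(D) = 1 - prod_d (1 - d[j]); features must lie in [0, 1]."""
    miss = 1.0
    for d in docs:
        miss *= 1.0 - d[j]
    return 1.0 - miss


def rho(docs, stories, a, b):
    """Sum over story vectors s and features j of s[j] * (modular + submodular)."""
    total = 0.0
    for s in stories:
        for j in range(len(a)):
            modular = a[j] * sum(d[j] for d in docs)
            submodular = b[j] * feature_coverage(docs, j)
            total += s[j] * (modular + submodular)
    return total


stories = [[1.0, 1.0]]           # one story vector over two features
a, b = [0.0, 0.0], [1.0, 1.0]    # purely submodular weights for the demo
d1, d2, d3 = [0.8, 0.0], [0.8, 0.0], [0.0, 0.8]

base = rho([d1], stories, a, b)
gain_redundant = rho([d1, d2], stories, a, b) - base  # same feature again
gain_diverse = rho([d1, d3], stories, a, b) - base    # new feature
```

With these weights the redundant article adds far less than the diverse one, which is the diminishing-returns behaviour that favours diversity; setting `b` to zero and `a` to one makes every article's gain independent of what was already selected.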
  • 123. Online Inference • Treat missing views as hidden variables – Realistic interaction model • Use the online EM algorithm – Infer the value of hidden variables • Optimize parameters using SGD – Use additive weights • Background + story + category + user
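  • 124. Online Inference • $\theta^{*} = \arg\min_{\theta} \sum_{(c,d)} -\log p(c \mid \theta, d) + \Omega(\theta)$ with additive weights $\theta = \theta_0 + \theta_u + \theta_s + \theta_c$ (the parameter symbol did not survive extraction; $\theta$ is assumed here)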
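A minimal sketch of the SGD step with additive weights (background + user + story + category), assuming a logistic click model, fully observed views, and an L2 regularizer. The online EM step for hidden views is not shown, and the class name `AdditiveLogistic` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


class AdditiveLogistic:
    def __init__(self, dim, lr=0.1, l2=0.01):
        self.dim, self.lr, self.l2 = dim, lr, l2
        # One weight vector per (level, key), created lazily at zero.
        self.w = defaultdict(lambda: [0.0] * dim)

    def _keys(self, user, story, category):
        return [("background", None), ("user", user),
                ("story", story), ("category", category)]

    def predict(self, x, user, story, category):
        # The effective weight vector is the sum of the four components.
        keys = self._keys(user, story, category)
        score = sum(self.w[k][j] * x[j] for k in keys for j in range(self.dim))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, clicked, user, story, category):
        p = self.predict(x, user, story, category)
        err = p - (1.0 if clicked else 0.0)  # d(-log p(c)) / d(score)
        # Every component receives the same data gradient plus its own decay.
        for k in self._keys(user, story, category):
            wk = self.w[k]
            for j in range(self.dim):
                wk[j] -= self.lr * (err * x[j] + self.l2 * wk[j])
        return p


model = AdditiveLogistic(dim=2)
x = [1.0, 0.5]
before = model.predict(x, "u1", "s1", "c1")
for _ in range(50):
    model.update(x, True, "u1", "s1", "c1")
after = model.predict(x, "u1", "s1", "c1")
other_user = model.predict(x, "u2", "s1", "c1")
```

Because the background, story, and category components are shared, a user who has never been seen (`"u2"`) still inherits part of what was learnt from `"u1"`.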
  • 125. How  Does  it  Work?  
  • 126. How  Does  It  Work?  
  • 127. 5. Summary and Future Directions
  • 128. Summary • Tools – Load distribution, balancing and synchronization – Clustering, Topic Models • Models – Dynamic non-parametric models – Sequential latent variable models • Inference Algorithms – Distributed batch – Sequential Monte Carlo • Applications – User profiling – News content analysis & recommendation
  • 129. Future Directions • Theoretical bounds and guarantees • Network data – Graph partitioning • Non-parametric models – Learning structure from data • Working under communication constraints • Data distribution for particle filters