Digging into the Dirichlet Distribution by Max Sklar

When it comes to recommendation systems and natural language processing, data that can be modeled as a multinomial or as a vector of counts is ubiquitous. For example, if there are 2 possible user-generated ratings (like and dislike), then each item is represented as a vector of 2 counts. In a higher-dimensional case, each document may be expressed as a count of words, and the vector size is large enough to encompass all the important words in that corpus of documents. The Dirichlet distribution is one of the basic probability distributions for describing this type of data.

The Dirichlet distribution is surprisingly expressive on its own, but it can also be used as a building block for even more powerful and deep models such as mixtures and topic models.
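For concreteness, a minimal sketch of that count-vector representation in Python, using the like/dislike counts that appear later in the talk:

```python
# Each item is a vector of counts over K categories.
# K = 2 for like/dislike; for documents, K would be the vocabulary size.
ratings = {
    "item_1": [231, 23],  # 231 likes, 23 dislikes
    "item_2": [81, 40],
    "item_3": [67, 9],
}
```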

  1. Digging into the Dirichlet. Max Sklar, @maxsklar. New York Machine Learning Meetup, December 19th, 2013
  2. Dedication: Meyer Marks, 1925 - 2013
  3. The Dirichlet Distribution
  4. Let’s start with something simpler
  5. Let’s start with something simpler: A Pie Chart!
  6. Let’s start with something simpler: A Pie Chart! AKA Discrete Distribution, Multinomial Distribution
  7. Let’s start with something simpler: A Pie Chart! K = the number of categories. K = 5
  8. Examples of Multinomial Distributions
  9. Examples of Multinomial Distributions
  10. Examples of Multinomial Distributions
  11. Examples of Multinomial Distributions
  12. Examples of Multinomial Distributions
  13. What does the raw data look like?
  14. What does the raw data look like? Counts! Table of (id, # likes, # dislikes): (1: 231, 23), (2: 81, 40), (3: 67, 9), (4: 121, 14), (5: 9, 31), (6: 18, 0), (7: 1, 1)
  15. What does the raw data look like? More specifically: K columns of counts, N rows of data. Same table of (id, # likes, # dislikes): (1: 231, 23), (2: 81, 40), (3: 67, 9), (4: 121, 14), (5: 9, 31), (6: 18, 0), (7: 1, 1)
  16. BUT... Counts != Multinomial Distribution
  17. BUT... We can estimate the multinomial distribution from the counts, using the maximum likelihood estimate. Counts: 366, 181, 203
  18. BUT... We can estimate the multinomial distribution from the counts, using the maximum likelihood estimate. Sum = 366 + 181 + 203 = 750
  19. BUT... We can estimate the multinomial distribution from the counts, using the maximum likelihood estimate. Estimates: 366 / 750, 181 / 750, 203 / 750
  20. BUT... We can estimate the multinomial distribution from the counts, using the maximum likelihood estimate. Estimates: 48.8%, 24.1%, 27.1%
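A quick sketch of the maximum likelihood estimate from slides 17-20, in Python:

```python
import numpy as np

counts = np.array([366, 181, 203])
mle = counts / counts.sum()   # sum = 750
print(mle)                    # [0.488, 0.241..., 0.270...] = 48.8%, 24.1%, 27.1%
```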
  21. BUT... Uh Oh. Rows of counts: (366, 181, 203); then a new row: (1, 2, 1)
  22. BUT... This column will be all Yellow, right? Rows of counts: (366, 181, 203), (1, 2, 1), (0, 1, 0)
  23. BUT... Panic!!!! Rows of counts: (366, 181, 203), (1, 2, 1), (0, 1, 0), (0, 0, 0)
  24. Bayesian Statistics to the Rescue
  25. Bayesian Statistics to the Rescue: Still assume each row was generated by a multinomial distribution
  26. Bayesian Statistics to the Rescue: Still assume each row was generated by a multinomial distribution. We just don’t know which one!
  27. The Dirichlet Distribution is a probability distribution over all possible multinomial distributions, p.
  28. The Dirichlet Distribution represents our uncertainty over the actual distribution that created the row.
  29. The Dirichlet Distribution. p: a multinomial distribution; alpha: the parameters of the Dirichlet; K: the number of categories
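In code terms, each draw from a Dirichlet is itself a multinomial distribution over the K categories. A minimal sketch with NumPy (the alpha values are the ones used later on slide 56):

```python
import numpy as np

alpha = np.array([1.0, 3.0, 2.0])  # Dirichlet parameters, K = 3
p = np.random.dirichlet(alpha)     # one sampled multinomial distribution
print(p, p.sum())                  # components are non-negative and sum to 1
```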
  30. Bayesian Updates
  31. Bayesian Updates: Also a Dirichlet!
  32. Bayesian Updates: Also a Dirichlet! (Conjugate Prior)
  33. Bayesian Updates
  34. Bayesian Updates
  35. Bayesian Updates: +1
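Because the Dirichlet is the conjugate prior of the multinomial, the Bayesian update is just addition: each observation in category k adds 1 to alpha_k. A sketch, using a row of counts from slide 21:

```python
import numpy as np

prior = np.array([1.0, 3.0, 2.0])  # Dirichlet prior over multinomials
counts = np.array([1, 2, 1])       # observed counts for one row
posterior = prior + counts         # the posterior is also a Dirichlet
```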
  36. Why Does this Work? Let’s look at it again.
  37. Entropy
  38. Entropy, Information Content
  39. Entropy, Information Content, Energy
  40. Entropy, Information Content, Energy, Log Likelihood
  41. Entropy, Information Content, Energy, Log Likelihood
  42. The Dirichlet Distribution
  43. The Dirichlet Distribution: Normalizing Constant
  44. The Dirichlet Distribution: Normalizing Constant
  45. The Dirichlet Distribution
  46. The Dirichlet Distribution
  47. The Dirichlet Distribution
  48. The Dirichlet Distribution: Linear
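The density formulas on slides 42-48 did not survive extraction; for reference, the standard Dirichlet density they depict, with its normalizing constant written in terms of the gamma function, and the log form whose linearity in log p_k is what the “Linear” slide refers to:

```latex
\mathrm{Dir}(p \mid \alpha)
  = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} p_k^{\alpha_k - 1},
\qquad
\log \mathrm{Dir}(p \mid \alpha)
  = \mathrm{const} + \sum_{k=1}^{K} (\alpha_k - 1) \log p_k .
```

Multiplying this density by a multinomial likelihood just adds the observed counts to the exponents, which is why the conjugate update is plain count addition.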
  49. The Dirichlet MACHINE. Prior: 1.2, 3.0, 0.3
  50. The Dirichlet MACHINE. Prior: 1.2, 3.0, 0.3
  51. The Dirichlet MACHINE. Update: 2.2, 3.0, 0.3
  52. The Dirichlet MACHINE. Prior: 2.2, 3.0, 0.3
  53. The Dirichlet MACHINE. Update: 2.2, 3.0, 1.3
  54. The Dirichlet MACHINE. Prior: 2.2, 3.0, 1.3
  55. The Dirichlet MACHINE. Update: 2.2, 3.0, 2.3
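The machine on slides 49-55 is the +1 update applied repeatedly. Replaying the exact sequence shown:

```python
alpha = [1.2, 3.0, 0.3]   # prior from slide 49

def observe(alpha, k):
    alpha[k] += 1.0       # one observation in category k

observe(alpha, 0)         # -> [2.2, 3.0, 0.3]
observe(alpha, 2)         # -> [2.2, 3.0, 1.3]
observe(alpha, 2)         # -> [2.2, 3.0, 2.3]
```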
  56. Interpreting the Parameters: alpha = (1, 3, 2). What does this alpha vector really mean?
  57. Interpreting the Parameters: alpha = (1, 3, 2). Sum: 6. Normalized: 1/6, 3/6, 2/6
  58. Interpreting the Parameters: alpha = (1, 3, 2). Normalized by the sum 6, it gives the Expected Value
  59. Interpreting the Parameters: alpha = (1, 3, 2). The normalized vector is the Expected Value; the sum 6 is the Weight
  60. ANALOGY: Normal Distribution. Precision = 1 / variance
  61. ANALOGY: Normal Distribution. High precision: data is close to the mean. Low precision: far away from the mean
  62. Interpreting the Parameters: alpha = (1, 3, 2). The normalized vector is the Expected Value; the sum 6 is the Precision
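In code, the two readings of the alpha vector from these slides: its normalized direction is the expected multinomial, and its sum is the weight (precision):

```python
import numpy as np

alpha = np.array([1.0, 3.0, 2.0])
mean = alpha / alpha.sum()   # expected value: [1/6, 3/6, 2/6]
weight = alpha.sum()         # 6; plays the role precision plays for a Gaussian
```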
  63. High Weight Dirichlet: 0.4, 0.4, 0.2
  64. Low Weight Dirichlet: 0.4, 0.4, 0.2
  65. Urn Model: At each step, pick a ball from the urn. Replace it, and add another ball of that color into the urn
  66. Urn Model: At each step, pick a ball from the urn. Replace it, and add another ball of that color into the urn
  67. Urn Model: At each step, pick a ball from the urn. Replace it, and add another ball of that color into the urn
  68. Urn Model: At each step, pick a ball from the urn. Replace it, and add another ball of that color into the urn
  69. Urn Model: At each step, pick a ball from the urn. Replace it, and add another ball of that color into the urn
  70. Urn Model: Rich get richer...
  71. Urn Model: Rich get richer...
  72. Urn Model: Finally yellow catches a break
  73. Urn Model: Finally yellow catches a break
  74. Urn Model: But it’s too late...
  75. Urn Model: As the urn gets more populated, the distribution gets “stuck” in place.
  76. Urn Model: Once lots of data has been collected, or the Dirichlet has high precision, it’s hard to overturn that with new data
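A minimal simulation of the urn (a Polya urn), assuming we start with one ball of each of three colors; early luck compounds, and the proportions freeze as the urn fills:

```python
import random

counts = [1, 1, 1]   # one ball of each color to start (an illustrative choice)
for _ in range(10000):
    # Pick a ball with probability proportional to its color's count,
    # replace it, and add another ball of the same color.
    color = random.choices(range(len(counts)), weights=counts)[0]
    counts[color] += 1

total = sum(counts)
print([c / total for c in counts])   # rich get richer: one color usually dominates
```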
  77. Chinese Restaurant Process: When you find the white ball, throw a new color into the mix.
  78. Chinese Restaurant Process: When you find the white ball, throw a new color into the mix.
  79. Chinese Restaurant Process: When you find the white ball, throw a new color into the mix.
  80. Chinese Restaurant Process: When you find the white ball, throw a new color into the mix.
  81. Chinese Restaurant Process: When you find the white ball, throw a new color into the mix.
  82. Chinese Restaurant Process: When you find the white ball, throw a new color into the mix.
  83. Chinese Restaurant Process: When you find the white ball, throw a new color into the mix.
  84. Chinese Restaurant Process: When you find the white ball, throw a new color into the mix.
  85. Chinese Restaurant Process: The expected infinite distribution (mean) is exponential. The # of white balls controls the exponent
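A sketch of this white-ball variant (a Hoppe urn, which generates the Chinese Restaurant Process): drawing a white ball adds a brand-new color, and the number of white balls controls how quickly new colors appear:

```python
import random

white = 2.0   # weight of the white balls; controls the exponent of the mean distribution
counts = []   # balls per color; no colors exist yet
for _ in range(10000):
    total = white + sum(counts)
    if random.random() < white / total:
        counts.append(1)   # white ball drawn: introduce a new color
    else:
        # otherwise, the usual rich-get-richer urn step
        color = random.choices(range(len(counts)), weights=counts)[0]
        counts[color] += 1

print(sorted(counts, reverse=True)[:10])   # a few big colors, a long tail of small ones
```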
  86. So coming back to the count data: What Dirichlet parameters explain the data? Rows of counts: (20, 0, 2), (0, 1, 17), (14, 6, 0), (15, 5, 0), (0, 20, 0), (0, 14, 6)
  87. So coming back to the count data: Newton’s Method. Requires Gradient + Hessian
  88. So coming back to the count data: Reads all of the data...
  89. So coming back to the count data: https://github.com/maxsklar/BayesPy/tree/master/ConjugatePriorTools
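The talk fits alpha by Newton’s method (gradient plus Hessian); see the linked ConjugatePriorTools code. As a lighter-weight sketch of the same maximum likelihood problem, here is Minka’s fixed-point iteration, a different standard technique, applied to the count rows from slide 86:

```python
import numpy as np
from scipy.special import digamma

def fit_dirichlet(counts, iterations=1000):
    """Maximum likelihood alpha for rows of counts, via Minka's fixed point."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1)
    alpha = np.ones(counts.shape[1])   # a simple uniform starting point
    for _ in range(iterations):
        total = alpha.sum()
        numer = (digamma(counts + alpha) - digamma(alpha)).sum(axis=0)
        denom = (digamma(row_totals + total) - digamma(total)).sum()
        alpha = alpha * numer / denom
    return alpha

rows = [[20, 0, 2], [0, 1, 17], [14, 6, 0],
        [15, 5, 0], [0, 20, 0], [0, 14, 6]]
print(fit_dirichlet(rows))
```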
  90. So coming back to the count data: Compress the data into a Matrix and a Vector. Works for lots of sparsely populated rows. (Same rows of counts as slide 86.)
  91. The Compression MACHINE. Matrix, all zeros: [0 0 0 0 0 0 / 0 0 0 0 0 0 / 0 0 0 0 0 0 / 0 0 0 0 0 0]
  92. The Compression MACHINE. K = 3 (the 4th row is a special, total row). M = 6, the maximum # of samples per input
  93. The Compression MACHINE. Input row: 1, 0, 0
  94. The Compression MACHINE. Matrix: [1 0 0 0 0 0 / 0 0 0 0 0 0 / 0 0 0 0 0 0 / 1 0 0 0 0 0]
  95. The Compression MACHINE. Input row: 1, 3, 2
  96. The Compression MACHINE. Matrix: [2 0 0 0 0 0 / 1 1 1 0 0 0 / 1 1 0 0 0 0 / 2 1 1 1 1 1]
  97. The Compression MACHINE. Input row: 0, 6, 0
  98. The Compression MACHINE. Matrix: [2 0 0 0 0 0 / 2 2 2 1 1 1 / 1 1 0 0 0 0 / 3 2 2 2 2 2]
  99. The Compression MACHINE. Input row: 2, 1, 1
  100. The Compression MACHINE. Matrix: [3 1 0 0 0 0 / 3 2 2 1 1 1 / 2 1 0 0 0 0 / 4 3 3 3 2 2]
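A minimal sketch of the compression step in Python. Entry [k][j] of the matrix counts how many input rows had more than j samples in category k, with the last row doing the same for row totals; the names here are illustrative, not taken from BayesPy:

```python
import numpy as np

def compress(rows, K, M):
    matrix = np.zeros((K + 1, M), dtype=int)
    for row in rows:
        for k, n in enumerate(row):
            matrix[k, :n] += 1      # this row had more than j samples for all j < n
        matrix[K, :sum(row)] += 1   # the special total row
    return matrix

rows = [(1, 0, 0), (1, 3, 2), (0, 6, 0), (2, 1, 1)]
print(compress(rows, K=3, M=6))
# Reproduces the final matrix from slide 100:
# [[3 1 0 0 0 0]
#  [3 2 2 1 1 1]
#  [2 1 0 0 0 0]
#  [4 3 3 3 2 2]]
```

The payoff is that the Dirichlet log-likelihood gradient only needs sums of 1/(alpha_k + j) weighted by these tallies, so many sparse rows collapse into one small matrix.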
  101. DEMO
  102. Our Popularity Prior
  103. Dirichlet Mixture Models: Anything you can do with a Gaussian, you can also do with a Dirichlet
  104. Dirichlet Mixture Models. Example: Mixture of Gaussians using Expectation-Maximization
  105. Dirichlet Mixture Models: Assume each row is a mixture of multinomials, and the parameters of that mixture are pulled from a Dirichlet.
  106. Dirichlet Mixture Models: Latent Dirichlet Allocation Topic Model
  107. Questions
