
When it comes to recommendation systems and natural language processing, data that can be modeled as a multinomial or as a vector of counts is ubiquitous. For example if there are 2 possible user-generated ratings (like and dislike), then each item is represented as a vector of 2 counts. In a higher dimensional case, each document may be expressed as a count of words, and the vector size is large enough to encompass all the important words in that corpus of documents. The Dirichlet distribution is one of the basic probability distributions for describing this type of data.

The Dirichlet distribution is surprisingly expressive on its own, but it can also be used as a building block for even more powerful and deep models such as mixtures and topic models.
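As a minimal sketch of the count-vector representation described above (the toy corpus and vocabulary here are invented for illustration), each document can be turned into a bag-of-words count vector:

```python
from collections import Counter

# Toy corpus: each document becomes a vector of word counts
# over a shared vocabulary (bag-of-words).
docs = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "the cat chased the dog",
]
vocab = sorted({w for d in docs for w in d.split()})
counts = [[Counter(d.split())[w] for w in vocab] for d in docs]
print(vocab)
for row in counts:
    print(row)
```

The two-category like/dislike case is the same idea with a vocabulary of size 2.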


- 1. Digging into the Dirichlet Max Sklar @maxsklar New York Machine Learning Meetup December 19th, 2013
- 2. Dedication Meyer Marks 1925 - 2013
- 3. The Dirichlet Distribution
- 4. Let’s start with something simpler
- 5. Let’s start with something simpler A Pie Chart!
- 6. Let’s start with something simpler A Pie Chart! AKA Discrete Distribution Multinomial Distribution
- 7. Let’s start with something simpler A Pie Chart! K = The number of categories. K=5
- 8. Examples of Multinomial Distributions
- 9. Examples of Multinomial Distributions
- 10. Examples of Multinomial Distributions
- 11. Examples of Multinomial Distributions
- 12. Examples of Multinomial Distributions
- 13. What does the raw data look like?
- 14. What does the raw data look like? Counts! (id, # likes, # dislikes): 1: 231, 23; 2: 81, 40; 3: 67, 9; 4: 121, 14; 5: 9, 31; 6: 18, 0; 7: 1, 1
- 15. What does the raw data look like? More specifically: K columns of counts, N rows of data. (id, # likes, # dislikes): 1: 231, 23; 2: 81, 40; 3: 67, 9; 4: 121, 14; 5: 9, 31; 6: 18, 0; 7: 1, 1
- 16. BUT... Counts != Multinomial Distribution
- 17. BUT... We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate (counts: 366, 181, 203)
- 18. BUT... We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate: Sum = 366 + 181 + 203 = 750
- 19. BUT... We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate: 366/750, 181/750, 203/750
- 20. BUT... We can estimate the multinomial distribution with the counts, using the maximum likelihood estimate: 48.8%, 24.1%, 27.1%
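The maximum-likelihood step on slides 17–20 is just count normalization; a quick sketch with the deck's numbers:

```python
# Maximum likelihood estimate of a multinomial from raw counts:
# normalize the counts by their total.
counts = [366, 181, 203]
total = sum(counts)                      # 750
mle = [c / total for c in counts]
print([round(p * 100, 1) for p in mle])  # → [48.8, 24.1, 27.1]
```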
- 21. BUT... Uh Oh (a new row of counts arrives: 1, 2, 1)
- 22. BUT... This column will be all Yellow, right? (rows: 1, 2, 1 and 0, 1, 0)
- 23. BUT... Panic!!!! (rows: 1, 2, 1; 0, 1, 0; 0, 0, 0)
- 24. Bayesian Statistics to the Rescue
- 25. Bayesian Statistics to the Rescue Still assume each row was generated by a multinomial distribution
- 26. Bayesian Statistics to the Rescue Still assume each row was generated by a multinomial distribution We just don’t know which one!
- 27. The Dirichlet Distribution Is a probability distribution over all possible multinomial distributions, p.
- 28. The Dirichlet Distribution ? Represents our uncertainty over the actual distribution that created the row. ?
- 29. The Dirichlet Distribution p: represents a multinomial distribution; alpha: the parameters of the Dirichlet; K: the number of categories
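Slides 27–29 define the Dirichlet as a distribution over multinomials p with parameters alpha. Its density can be evaluated directly from the standard formula, log Γ(Σαₖ) − Σ log Γ(αₖ) + Σ (αₖ−1) log pₖ; a sketch using only the standard library:

```python
import math

def dirichlet_logpdf(p, alpha):
    """Log density of Dir(alpha) at a point p on the simplex:
    lgamma(sum(alpha)) - sum(lgamma(a_k)) + sum((a_k - 1) * log(p_k))."""
    assert abs(sum(p) - 1.0) < 1e-9, "p must lie on the probability simplex"
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))

# alpha = (1, 1, 1) is the uniform Dirichlet: constant density Gamma(3) = 2.
print(round(math.exp(dirichlet_logpdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])), 6))  # → 2.0
```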
- 30. Bayesian Updates
- 31. Bayesian Updates Also a Dirichlet!
- 32. Bayesian Updates Also a Dirichlet! (Conjugate Prior)
- 33. Bayesian Updates
- 34. Bayesian Updates
- 35. Bayesian Updates +1
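The conjugate update on slides 30–35 (and the Dirichlet MACHINE on slides 49–55) amounts to adding one to the alpha of each observed category. A sketch reproducing the deck's example prior (1.2, 3.0, 0.3):

```python
# Conjugacy: Dirichlet prior + multinomial observations -> Dirichlet posterior.
# Each observed outcome adds exactly 1 to the matching alpha (the "+1" slide).
prior = [1.2, 3.0, 0.3]       # the deck's example prior
observations = [0, 2, 2]      # category indices of three observed samples
posterior = list(prior)
for k in observations:
    posterior[k] += 1.0
print(posterior)  # → [2.2, 3.0, 2.3], matching slides 51-55
```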
- 36. Why Does this Work? Let’s look at it again.
- 37. Entropy
- 38. Entropy Information Content
- 39. Entropy Information Content Energy
- 40. Entropy Information Content Energy Log Likelihood
- 41. Entropy Information Content Energy Log Likelihood
- 42. The Dirichlet Distribution
- 43. The Dirichlet Distribution Normalizing Constant
- 44. The Dirichlet Distribution Normalizing Constant
- 45. The Dirichlet Distribution
- 46. The Dirichlet Distribution
- 47. The Dirichlet Distribution
- 48. The Dirichlet Distribution Linear
- 49. The Dirichlet MACHINE Prior 1.2 3.0 0.3
- 50. The Dirichlet MACHINE Prior 1.2 3.0 0.3
- 51. The Dirichlet MACHINE Update 2.2 3.0 0.3
- 52. The Dirichlet MACHINE Prior 2.2 3.0 0.3
- 53. The Dirichlet MACHINE Update 2.2 3.0 1.3
- 54. The Dirichlet MACHINE Prior 2.2 3.0 1.3
- 55. The Dirichlet MACHINE Update 2.2 3.0 2.3
- 56. Interpreting the Parameters 1 3 2 What does this alpha vector really mean?
- 57. Interpreting the Parameters 1 normalized 1/6 3/6 3 2 sum 2/6 6
- 58. Interpreting the Parameters 1 Expected Value 3 2 6
- 59. Interpreting the Parameters 1 Expected Value 3 2 6 Weight
- 60. ANALOGY: Normal Distribution Precision = 1 / variance
- 61. ANALOGY: Normal Distribution High precision: data is close to the mean Low precision: far away from the mean
- 62. Interpreting the Parameters 1 Expected Value 3 2 6 Precision
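The decomposition on slides 56–62 is a one-liner: the alpha vector factors into a mean (its normalized direction) and a weight/precision (its sum). A sketch with the deck's example alpha = (1, 3, 2):

```python
# An alpha vector encodes a mean (normalized direction) and a
# weight/precision (its sum), as in the deck's example alpha = (1, 3, 2).
alpha = [1, 3, 2]
weight = sum(alpha)                   # 6: how concentrated the Dirichlet is
mean = [a / weight for a in alpha]    # expected multinomial: (1/6, 3/6, 2/6)
print(weight, mean)
```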
- 63. High Weight Dirichlet 0.4 0.4 0.2
- 64. Low Weight Dirichlet 0.4 0.4 0.2
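The high- vs. low-weight pictures on slides 63–64 can be reproduced by sampling. A Dirichlet draw can be built by normalizing independent Gamma draws (a standard construction; the standard library's `random.gammavariate` suffices). A sketch, with the 2000-sample comparison being my own illustration:

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one multinomial from Dir(alpha) by normalizing Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
mean = [0.4, 0.4, 0.2]           # same mean as the slides' two pictures
spreads = {}
for weight in (100.0, 2.0):      # high precision vs. low precision
    alpha = [weight * m for m in mean]
    draws = [sample_dirichlet(alpha, rng) for _ in range(2000)]
    spreads[weight] = sum(abs(d[0] - mean[0]) for d in draws) / len(draws)
    print(f"weight={weight}: average |p[0] - 0.4| = {spreads[weight]:.3f}")
```

With the same mean, the high-weight Dirichlet hugs it tightly while the low-weight one scatters across the simplex.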
- 65. Urn Model At each step, pick a ball from the urn; replace it, and add another ball of that color into the urn
- 66. Urn Model At each step, pick a ball from the urn; replace it, and add another ball of that color into the urn
- 67. Urn Model At each step, pick a ball from the urn; replace it, and add another ball of that color into the urn
- 68. Urn Model At each step, pick a ball from the urn; replace it, and add another ball of that color into the urn
- 69. Urn Model At each step, pick a ball from the urn; replace it, and add another ball of that color into the urn
- 70. Urn Model Rich get richer...
- 71. Urn Model Rich get richer...
- 72. Urn Model Finally yellow catches a break
- 73. Urn Model Finally yellow catches a break
- 74. Urn Model But it’s too late...
- 75. Urn Model As the urn gets more populated, the distribution gets “stuck” in place.
- 76. Urn Model Once lots of data has been collected, or the Dirichlet has high precision, it’s hard to overturn that with new data
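The urn dynamic on slides 65–76 (a Pólya urn) simulates in a few lines; the colors and step count below are my own choices:

```python
import random
from collections import Counter

def polya_urn(initial_balls, steps, rng):
    """Polya urn: draw a ball uniformly, put it back, and add one more
    ball of the same color -- the 'rich get richer' dynamic."""
    urn = list(initial_balls)
    for _ in range(steps):
        ball = rng.choice(urn)
        urn.append(ball)
    return urn

rng = random.Random(0)
urn = polya_urn(["red", "blue", "yellow"], steps=1000, rng=rng)
# Proportions drift early, then get "stuck" as the urn fills up.
print(Counter(urn))
```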
- 77. Chinese Restaurant Process When you find the white ball, throw a new color into the mix.
- 78. Chinese Restaurant Process When you find the white ball, throw a new color into the mix.
- 79. Chinese Restaurant Process When you find the white ball, throw a new color into the mix.
- 80. Chinese Restaurant Process When you find the white ball, throw a new color into the mix.
- 81. Chinese Restaurant Process When you find the white ball, throw a new color into the mix.
- 82. Chinese Restaurant Process When you find the white ball, throw a new color into the mix.
- 83. Chinese Restaurant Process When you find the white ball, throw a new color into the mix.
- 84. Chinese Restaurant Process When you find the white ball, throw a new color into the mix.
- 85. Chinese Restaurant Process The expected infinite distribution (mean) is exponential. # of white balls controls the exponent
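The white-ball variant on slides 77–85 is Hoppe's urn, one standard way to present the Chinese Restaurant Process. A sketch (the parameter name and values are mine):

```python
import random
from collections import Counter

def hoppe_urn(new_color_weight, steps, rng):
    """Hoppe's urn (the Chinese Restaurant Process): alongside the colored
    balls sits a special 'white' ball of fixed weight; drawing it adds a
    brand-new color, otherwise the urn behaves like a Polya urn."""
    colors = []       # one entry per colored ball drawn so far
    next_color = 0
    for _ in range(steps):
        if rng.uniform(0.0, new_color_weight + len(colors)) < new_color_weight:
            colors.append(next_color)          # white ball: invent a new color
            next_color += 1
        else:
            colors.append(rng.choice(colors))  # rich-get-richer among old colors
    return Counter(colors)

counts = hoppe_urn(new_color_weight=1.0, steps=500, rng=random.Random(42))
print(counts.most_common(5))  # a few big clusters, then a long tail of small ones
```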
- 86. So coming back to the count data: What Dirichlet parameters explain the data? (counts: 20, 0, 2; 0, 1, 17; 14, 6, 0; 15, 5, 0; 0, 20, 0; 0, 14, 6)
- 87. So coming back to the count data: Newton’s Method: Requires Gradient + Hessian
- 88. So coming back to the count data: Reads all of the data...
- 89. So coming back to the count data: https://github.com/maxsklar/BayesPy/tree/master/ConjugatePriorTools
- 90. So coming back to the count data: Compress the data into a Matrix and a Vector: works for lots of sparsely populated rows
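Slides 87–89 fit alpha by Newton's method (gradient + Hessian), as in the linked BayesPy repo. As an illustrative alternative, here is a sketch of Minka's simpler fixed-point iteration for the same maximum-likelihood problem, rewritten with the identity ψ(n+a) − ψ(a) = Σ_{m<n} 1/(a+m) so only elementary arithmetic is needed; this is not the deck's implementation:

```python
def fit_dirichlet(rows, iters=500):
    """Fixed-point iteration (Minka) for the MLE of a Dirichlet-multinomial.
    Uses psi(n + a) - psi(a) = sum_{m < n} 1 / (a + m) to avoid digamma."""
    K = len(rows[0])
    alpha = [1.0] * K
    for _ in range(iters):
        a0 = sum(alpha)
        denom = sum(sum(1.0 / (a0 + m) for m in range(sum(row))) for row in rows)
        alpha = [
            a * sum(sum(1.0 / (a + m) for m in range(row[k])) for row in rows) / denom
            for k, a in enumerate(alpha)
        ]
    return alpha

# The count data from slide 86.
rows = [[20, 0, 2], [0, 1, 17], [14, 6, 0], [15, 5, 0], [0, 20, 0], [0, 14, 6]]
print([round(a, 3) for a in fit_dirichlet(rows)])
```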
- 91. The Compression MACHINE (a table of counters, initially all zeros)
- 92. The Compression MACHINE: K=3 (the 4th row is a special, total row); M=6, the maximum # samples per input
- 93. The Compression MACHINE: input (1, 0, 0)
- 94. The Compression MACHINE: table after input: 1 0 0 0 0 0 / 0 0 0 0 0 0 / 0 0 0 0 0 0 / 1 0 0 0 0 0
- 95. The Compression MACHINE: input (1, 3, 2)
- 96. The Compression MACHINE: table after input: 2 0 0 0 0 0 / 1 1 1 0 0 0 / 1 1 0 0 0 0 / 2 1 1 1 1 1
- 97. The Compression MACHINE: input (0, 6, 0)
- 98. The Compression MACHINE: table after input: 2 0 0 0 0 0 / 2 2 2 1 1 1 / 1 1 0 0 0 0 / 3 2 2 2 2 2
- 99. The Compression MACHINE: input (2, 1, 1)
- 100. The Compression MACHINE: table after input: 3 1 0 0 0 0 / 3 2 2 1 1 1 / 2 1 0 0 0 0 / 4 3 3 3 2 2
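The compression machine on slides 91–100 can be read as building a cumulative-count table: entry (k, j) counts how many input rows have more than j samples in category k, with a final extra row doing the same for row totals. A sketch that reproduces the deck's worked example:

```python
def compress(rows, M):
    """Build a (K+1) x M table U where U[k][j] = number of rows whose count
    in category k exceeds j; the extra last row tracks row totals. Sums like
    sum_i sum_{m < n_ik} 1/(alpha_k + m) then become sum_j U[k][j]/(alpha_k + j),
    whose cost no longer depends on the number of rows."""
    K = len(rows[0])
    U = [[0] * M for _ in range(K + 1)]
    for row in rows:
        for k, n in enumerate(row):
            for j in range(min(n, M)):
                U[k][j] += 1
        for j in range(min(sum(row), M)):
            U[K][j] += 1
    return U

# The deck's inputs in order: (1,0,0), (1,3,2), (0,6,0), (2,1,1).
for line in compress([[1, 0, 0], [1, 3, 2], [0, 6, 0], [2, 1, 1]], M=6):
    print(line)  # final state matches slide 100
```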
- 101. DEMO
- 102. Our Popularity Prior
- 103. Dirichlet Mixture Models Anything you can do with a Gaussian, you can also do with a Dirichlet
- 104. Dirichlet Mixture Models Example: Mixture of Gaussians using Expectation-Maximization
- 105. Dirichlet Mixture Models Assume each row is a mixture of multinomials. And the parameters of that mixture are pulled from a Dirichlet. ?
- 106. Dirichlet Mixture Models Latent Dirichlet Allocation Topic Model
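The generative story behind the topic model on slides 105–106 (Latent Dirichlet Allocation) is: per document, draw topic weights theta ~ Dir(alpha); per word, draw a topic z ~ Mult(theta), then a word from that topic's word distribution. A toy sketch of that forward process only (no inference); the two topics and their word lists are invented for illustration:

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one multinomial from Dir(alpha) by normalizing Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def generate_lda_docs(alpha, topics, doc_length, n_docs, rng):
    """Toy LDA generative process: theta ~ Dir(alpha) per document, then
    each word picks a topic z ~ Mult(theta) and a word from topic z."""
    docs = []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, rng)
        doc = []
        for _ in range(doc_length):
            z = rng.choices(range(len(topics)), weights=theta)[0]
            words, probs = topics[z]
            doc.append(rng.choices(words, weights=probs)[0])
        docs.append(doc)
    return docs

# Two invented topics: "sports" words and "politics" words.
topics = [
    (["ball", "game", "team"], [0.5, 0.3, 0.2]),
    (["vote", "law", "senate"], [0.4, 0.4, 0.2]),
]
docs = generate_lda_docs([0.5, 0.5], topics, doc_length=8, n_docs=3,
                         rng=random.Random(7))
for doc in docs:
    print(" ".join(doc))
```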
- 107. Questions
