Nested Chinese Restaurant Process Latent Dirichlet Allocation (Hierarchical CRP LDA), Minsu Ko, OwlNest

Published in: Technology

- Hierarchical Topic Models and the Nested Chinese Restaurant Process. In Advances in Neural Information Processing Systems (2004). David M. Blei, Thomas L. Griffiths, Michael I. Jordan, Joshua B. Tenenbaum. Presented by Minsu Ko (ryan0802@owl-nest.com, http://owl-nest.com/lab/1/1), OwlNest Corp.
- Contents: 1. Introduction. 2. Chinese restaurant process (2.1 The Chinese restaurant process; 2.2 Extending the CRP to hierarchies). 3. A hierarchical topic model. 4. Approximate inference by Gibbs sampling. 5. Examples and empirical results. 6. Summary.
- 1. Introduction. Research abstract: the problem of learning topic hierarchies from data. A Bayesian approach that generates an appropriate prior via a distribution on partitions, referred to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. A hierarchical topic model is obtained by combining the prior with a likelihood based on a hierarchical variant of latent Dirichlet allocation. The approach is illustrated on simulated data and with an application to the modeling of NIPS abstracts.
- 1. Introduction. The problem of document classification. One-class approach: one topic per document, with words generated according to that topic (for example, a naive Bayes model). It is more realistic to assume more than one topic per document. Generative model: pick a mixture distribution over K topics and generate words from it. Even more realistic: topics may be organized in a hierarchy (they are not independent). Pick a path from root to leaf in a tree; each node is a topic; sample words from the mixture of topics along the path.
- Parametrics vs. nonparametrics. Parametric models: parametric models can capture only a bounded amount of information from data, since they have bounded complexity. Real data is often complex, and the parametric assumption is often wrong. Nonparametric models: nonparametric models relax the parametric assumption, bringing significant flexibility to our models of the world. They can also often yield model selection/averaging behaviour without the cost of actually doing model selection/averaging. Nonparametric models are gaining popularity, spurred by growth in computational resources and inference algorithms.
- Parametrics vs. nonparametrics. Parametric topic models: the representative model is LDA (latent Dirichlet allocation) (Blei et al., 2003). The number of topics must be fixed for the corpus: as with K-means, the user must select an appropriate number of topics K before training the model. A limitation of parametric topic models: it is difficult to determine the number of topics for a corpus; computing the optimal number of topics is very time-consuming; and the optimal number varies from corpus to corpus. Nonparametric topic models assume an infinite number of topics; the model automatically captures the appropriate number of topics.
- Parametrics vs. nonparametrics. Question: are nonparametric models truly nonparametric? Nonparametric just means not parametric: the model cannot be described by a fixed set of parameters. Nonparametric models still have parameters; they just have an infinite number of them. No free lunch: you cannot learn from data unless you make assumptions. Nonparametric models still make modelling assumptions; they are just less constrained than typical parametric models. Models can be nonparametric in one sense and parametric in another: these are semiparametric models.
- 2. Chinese restaurant process. What is the Chinese restaurant process? The Chinese restaurant process (CRP) gives us a distribution over partitions. The CRP has been used to represent uncertainty over the number of components in a mixture model. Suppose we have a collection of observations and want to cluster/partition them into groups. Every possible group corresponds to a table in an infinitely large Chinese restaurant, and each observation corresponds to a customer entering the restaurant and sitting at a table.
- 2. Chinese restaurant process. Generating from the CRP: the first customer sits at the first table. Customer n sits at:
    table k with probability n_k / (λ + n − 1), where n_k is the number of previous customers at table k;
    a new table with probability λ / (λ + n − 1).
  Customers ⇔ integers, tables ⇔ clusters. The CRP exhibits the clustering property of the Dirichlet process (DP):
    p(occupied table k | previous customers) = n_k / (λ + n − 1)
    p(next unoccupied table | previous customers) = λ / (λ + n − 1)
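The seating rule above is easy to simulate. A minimal sketch in Python (the function name, the seeding, and the zero-based table indices are conventions of this sketch, not from the paper):

```python
import random

def crp_seating(n_customers, lam, seed=0):
    """Simulate table assignments under a Chinese restaurant process
    with concentration parameter lam (the slides' lambda)."""
    rng = random.Random(seed)
    counts = []       # counts[k] = number of customers at table k
    assignments = []
    for n in range(1, n_customers + 1):
        # P(existing table k) = counts[k] / (lam + n - 1)
        # P(new table)        = lam       / (lam + n - 1)
        r = rng.uniform(0, lam + n - 1)
        cum = 0.0
        for k, c in enumerate(counts):
            cum += c
            if r < cum:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            counts.append(1)               # open a new table
            assignments.append(len(counts) - 1)
    return assignments, counts

tables, counts = crp_seating(10, lam=1.0)
# counts is a random partition of 10 customers; sum(counts) == 10
```

Larger λ makes new tables more likely, so the expected number of occupied tables grows with λ (and logarithmically with the number of customers).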
- 2. Chinese restaurant process. To see how this works, let z_i be an indicator variable telling us which table the ith customer is sitting at, N the number of customers in the restaurant, and z = (z_1, z_2, ..., z_N) the vector of table assignments. For example, z = (1, 1, 2, 1, 3, 4):
    table 1: customers 1, 2, 4; table 2: customer 3; table 3: customer 5; table 4: customer 6
    ⇒ the counts vector n = (n_1, n_2, ..., n_K) = (3, 1, 1, 1).
  The actual table numbers don't mean anything: they are just convenient indexing variables, so z = (1, 1, 2, 1, 3, 4) and z = (2, 2, 1, 2, 4, 3) are effectively equivalent. The probability of a particular set of assignments z (with corresponding count vector n) for a CRP with concentration parameter α is:
    P(z | α, N) = α^K (Γ(α) / Γ(N + α)) ∏_{k=1}^{K} Γ(n_k)   (1)
- 2. Chinese restaurant process. CRP to topic model: each restaurant has an infinite number of tables; N customers sequentially sit down at the tables; each table serves one dish.
- 2. Chinese restaurant process. CRP to topic model (from Dongwoo Kim's presentation slides): consider a restaurant as a document, a customer as a word, and a table as a topic.
- 2. Chinese restaurant process. CRP to topic model: the first customer, the word "spam", arrives at the document1 restaurant and sits at table T1-1.
- 2. Chinese restaurant process. CRP to topic model: the second customer, the word "email", arrives at the document1 restaurant and considers where to sit. The probability of "email" sitting at an occupied table is proportional to the number of customers already at that table; the probability of sitting at a new table is proportional to the constant λ.
- 2. Chinese restaurant process. CRP to topic model: the resulting figure shows the configuration after N customers have sat down at the tables. This process illustrates how the CRP works.
- 2.2 Extending the CRP to hierarchies. A nested Chinese restaurant process (figure-only slides).
- 2.2 Extending the CRP to hierarchies. To generate a document given a tree with L levels: choose a path from the root of the tree to a leaf; draw a vector θ of topic mixing proportions from an L-dimensional Dirichlet; generate the words in the document from a mixture of the topics along the path, with mixing proportions θ. The role of the nested CRP in hLDA: it governs the branching factor of the nodes, determining how many topics are generated at each level.
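The path-choosing step can be sketched as running one CRP at every level of the tree: each restaurant's tables point to restaurants one level down. A hypothetical helper (the function name, the tuple encoding of nodes, and the use of γ as the per-level concentration parameter are assumptions of this sketch):

```python
import random

def ncrp_path(tree, L, gamma, rng):
    """Draw one root-to-leaf path of length L from a nested CRP.
    `tree` maps a node to its list of (child, count) tables; nodes are
    tuples encoding the path of table choices from the root."""
    path = [()]                  # the root restaurant
    for _ in range(1, L):
        node = path[-1]
        tables = tree.setdefault(node, [])
        total = sum(c for _, c in tables)
        r = rng.uniform(0, total + gamma)
        cum = 0.0
        for i, (child, c) in enumerate(tables):
            cum += c             # existing child: probability c / (total + gamma)
            if r < cum:
                tables[i] = (child, c + 1)
                path.append(child)
                break
        else:                    # new child restaurant: probability gamma / (total + gamma)
            child = node + (len(tables),)
            tables.append((child, 1))
            path.append(child)
    return path
```

Drawing one path per document grows the tree: popular subtrees attract more documents, while γ controls how often new branches open.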
- 3. A hierarchical topic model. The hierarchical LDA (hLDA): a generative model of multiple-topic documents. Generate a mixture distribution over topics using a nested CRP prior; the topics are joined together in a hierarchy by the nested Chinese restaurant process. Pick a topic according to this distribution and generate words according to the word distribution for that topic.
- 3. A hierarchical topic model. Observation: topics are not independent. Example: the topic of CS consists of AI, Systems, Theory, etc.; AI consists of NLP, Machine Learning, Robotics, Vision, etc. Question: how do we encode dependencies between topics?
- Reviews on topic models. Notation and terminology. Word: the basic unit from a vocabulary of size V (V distinct words). Document: a sequence of N words, W = [w_1, w_2, ..., w_N]. Corpus: a collection of M documents, D = [W_1, W_2, ..., W_M]. α, β: hyperparameters specifying the nature of the priors on θ and φ. Basic assumption: the words in a document are generated according to a mixture model whose mixing proportions are random and document-specific.
- Reviews on topic models. The Dirichlet distribution is a distribution over distributions: an exponential-family distribution over the simplex, i.e., over positive vectors that sum to one. In other words, a draw from a Dirichlet distribution is a vector of positive real numbers that sum to one:
    p(θ | α) = (Γ(Σ_i α_i) / ∏_i Γ(α_i)) ∏_i θ_i^{α_i − 1}   (2)
  It is conjugate to the multinomial: given a multinomial observation, the posterior distribution of θ is a Dirichlet. The parameter α controls the mean, shape, and sparsity of θ. The topic proportions are a K-dimensional Dirichlet; the topics are a V-dimensional Dirichlet.
- Reviews on topic models. The Dirichlet distribution is parameterised by a set of concentration parameters α = (α_1, ..., α_k), α_i ≥ 0, defined over the k-simplex. A draw from a Dirichlet distribution is written φ ∼ Dirichlet_k(α), where φ is a multinomial distribution over k outcomes. Each point on the k-simplex is a multinomial probability distribution: Σ_i φ_i = 1.
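NumPy can draw from a Dirichlet directly, which makes the effect of the concentration parameter easy to see (this snippet is illustrative and not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small alpha concentrates mass near the corners of the simplex (sparse draws);
# large alpha concentrates draws near the uniform point (1/3, 1/3, 1/3).
for alpha in (0.1, 1.0, 10.0):
    phi = rng.dirichlet([alpha] * 3)   # a point on the 3-simplex
    print(alpha, phi)                  # phi is nonnegative and sums to 1
```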
- Reviews on topic models. Example draws from a Dirichlet distribution over the 3-simplex (figure).
- Reviews on topic models. Latent Dirichlet allocation (LDA). M, N, V, k: fixed known parameters. α, β: fixed unknown parameters. θ, z, w: random variables (w is observable); θ is a document-level variable, z and w are word-level variables. Generative process for each document W in a corpus D:
    1. Choose θ ∼ Dirichlet(α); θ and α are k-dimensional.
    2. For each of the N words w_n in the document W:
       (a) choose a topic z_n ∼ Multinomial(θ);
       (b) choose a word w_n ∼ Multinomial(β_{z_n}), where β is a k × V matrix with β_ij = p(w^j = 1 | z^i = 1).
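The generative process above can be written out as a toy corpus sampler. A sketch with symmetric Dirichlet priors (sampling β from a Dirichlet with parameter η is a convenience of this sketch; in the slides β is a fixed parameter):

```python
import numpy as np

def generate_lda_corpus(M, N, k, V, alpha, eta, seed=0):
    """Sample a toy corpus from the LDA generative process.
    alpha: symmetric Dirichlet parameter for per-document topic
           proportions theta (k-dimensional).
    eta:   symmetric Dirichlet parameter used to sample the k x V
           topic-word matrix beta (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet([eta] * V, size=k)    # k topics, each a distribution over V words
    docs = []
    for _ in range(M):
        theta = rng.dirichlet([alpha] * k)     # document's topic proportions
        z = rng.choice(k, size=N, p=theta)     # one topic per word position
        words = [rng.choice(V, p=beta[zn]) for zn in z]
        docs.append(words)
    return docs, beta

docs, beta = generate_lda_corpus(M=5, N=20, k=3, V=10, alpha=0.5, eta=0.1)
```

Small η makes each topic put its mass on a few words; small α makes each document use only a few topics.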
- Hierarchical latent Dirichlet allocation (hLDA). Key differences from LDA: the topics β are organized as an L-level tree structure instead of a flat k × V matrix, and L is prespecified manually. Generative process for each document W in a corpus D:
    1. Choose a path from the root of the topic tree to a leaf; the path contains L topics.
    2. Choose θ ∼ Dirichlet(α); θ and α are L-dimensional.
    3. For each of the N words w_n in the document W:
       (a) choose a topic z_n ∼ Multinomial(θ);
       (b) choose a word w_n ∼ Multinomial(β_{z_n}), where β_{z_n} is the V-dimensional multinomial parameter for the z_n-th topic along the path chosen in step 1.
- 3. A hierarchical topic model. Simple representation of hLDA:
    1. Let c_1 be the root restaurant.
    2. For each level l ∈ {2, ..., L}: draw a table from restaurant c_{l−1} and set c_l to the restaurant referred to by that table.
    3. Draw an L-dimensional topic proportion vector θ from Dir(α).
    4. For each word n ∈ {1, ..., N}: draw z_n ∈ {1, ..., L} from Mult(θ), then draw w_n from the topic associated with restaurant c_{z_n}.
- 3. A hierarchical topic model. hLDA: a document is generated by sampling words from the topics along a single root-to-leaf path of a topic tree. The tree depth L is fixed; the number of topics is inferred using a nested CRP. Symmetric Dirichlet distribution: all elements of the parameter vector α have the same value, so the distribution can alternatively be parametrized by a single scalar α, called the concentration parameter. The density function then simplifies to
    f(x_1, ..., x_{K−1}; α) = (Γ(αK) / Γ(α)^K) ∏_{i=1}^{K} x_i^{α−1}   (3)
- 4. Approximate inference by Gibbs sampling. Gibbs sampling is commonly used in statistical inference to estimate the value of a parameter (e.g., θ in LDA), for example when determining the number of people likely to shop at a particular store on a given day, or the candidate a voter will most likely vote for. The standard setup for Gibbs sampling over a space of variables a, b, c: draw a conditioned on b, c; draw b conditioned on a, c; draw c conditioned on a, b.
- 4. Approximate inference by Gibbs sampling. The Gibbs sampler in the hLDA model provides a method for simultaneously exploring the topics of the corpus and the L-level trees. The variables needed by the sampling algorithm: w_{m,n}, the nth word in the mth document (the only observed variables in the model); c_{m,l}, the restaurant corresponding to the lth topic in document m; and z_{m,n}, the assignment of the nth word in the mth document to one of the L available topics. All other variables in the model (θ and β) are integrated out, so the Gibbs sampler assesses only the values of z_{m,n} and c_{m,l}.
- 4. Approximate inference by Gibbs sampling. The sampler has two parts: (1) given the current state of the CRP, sample the z_{m,n} variables of the underlying LDA model following the algorithm of Griffiths and Steyvers (2002); (2) given the values of the LDA hidden variables, sample the c_{m,l} variables associated with the CRP prior.
- 4. Approximate inference by Gibbs sampling. The conditional posterior distribution for z_i (Griffiths and Steyvers, 2002). Let z_{−i} be the assignments of all z_k with k ≠ i; n^{(w_i)}_{−i,j} the number of words assigned to topic j that are the same as w_i; n^{(·)}_{−i,j} the total number of words assigned to topic j; n^{(d_i)}_{−i,j} the number of words from document d_i assigned to topic j; and n^{(d_i)}_{−i} the total number of words in document d_i (all counts excluding z_i). α and β are free parameters that determine how heavily these empirical distributions are smoothed. With W the vocabulary size and T the number of topics:
    P(z_i = j | z_{−i}, w) ∝ (n^{(w_i)}_{−i,j} + β) / (n^{(·)}_{−i,j} + Wβ) · (n^{(d_i)}_{−i,j} + α) / (n^{(d_i)}_{−i} + Tα)   (4)
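Equation (4) can be computed by counting assignments while excluding position i. A sketch (the function name and the flat word/document/assignment arrays are assumptions of this sketch; a real sampler would maintain the counts incrementally rather than recount on every call):

```python
import numpy as np

def z_conditional(i, w, d, z, T, W, alpha, beta):
    """Normalized conditional P(z_i = j | z_-i, w) from the collapsed
    Gibbs sampler of Griffiths and Steyvers. w, d, z are arrays of
    word ids, document ids, and current topic assignments."""
    word_topic = np.zeros((W, T))   # n^(w)_{-i,j}
    topic_total = np.zeros(T)       # n^(.)_{-i,j}
    doc_topic = np.zeros(T)         # n^(d_i)_{-i,j}
    n_di = 0                        # n^(d_i)_{-i}
    for n in range(len(w)):
        if n == i:
            continue                # counts exclude position i
        word_topic[w[n], z[n]] += 1
        topic_total[z[n]] += 1
        if d[n] == d[i]:
            doc_topic[z[n]] += 1
            n_di += 1
    p = ((word_topic[w[i]] + beta) / (topic_total + W * beta)
         * (doc_topic + alpha) / (n_di + T * alpha))
    return p / p.sum()
```

One Gibbs sweep draws a new z_i from this distribution for every word position in turn.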
- 4. Approximate inference by Gibbs sampling. The conditional distribution for c_m, the L topics associated with document m, is
    p(c_m | w, c_{−m}, z) ∝ p(w_m | c, w_{−m}, z) p(c_m | c_{−m})   (5)
  where p(w_m | c, w_{−m}, z) is the likelihood of the data given a particular choice of c_m, and p(c_m | c_{−m}) is the prior on c_m implied by the nested CRP. The likelihood is obtained by integrating over the parameters β. Let n^{(w)}_{c_{m,l},−m} be the number of instances of word w assigned to the topic indexed by c_{m,l}, not including those in the current document; W the total vocabulary size; and Γ(·) the standard gamma function. Then
    p(w_m | c, w_{−m}, z) = ∏_{l=1}^{L} [ Γ(n^{(·)}_{c_{m,l},−m} + Wη) / ∏_w Γ(n^{(w)}_{c_{m,l},−m} + η) ] · [ ∏_w Γ(n^{(w)}_{c_{m,l},−m} + n^{(w)}_{c_{m,l},m} + η) / Γ(n^{(·)}_{c_{m,l},−m} + n^{(·)}_{c_{m,l},m} + Wη) ]   (6)
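In practice the likelihood of equation (6) is evaluated in log space with the log-gamma function to avoid overflow. A sketch under assumed count-array inputs (the helper name and argument layout are mine, not from the paper):

```python
from math import lgamma

def log_path_likelihood(doc_counts, path_counts, eta, W):
    """log p(w_m | c, w_-m, z) for one candidate path, following (6).
    doc_counts[l][w]  : count of word w in document m assigned to level l.
    path_counts[l][w] : count of word w at topic c_{m,l}, excluding document m."""
    logp = 0.0
    for n_m, n_rest in zip(doc_counts, path_counts):
        tot_rest = sum(n_rest)
        tot_both = tot_rest + sum(n_m)
        # Gamma(n^(.)_{-m} + W*eta) / Gamma(n^(.)_{-m} + n^(.)_m + W*eta)
        logp += lgamma(tot_rest + W * eta) - lgamma(tot_both + W * eta)
        # prod_w Gamma(n^(w)_{-m} + n^(w)_m + eta) / Gamma(n^(w)_{-m} + eta)
        for w in range(W):
            logp += lgamma(n_rest[w] + n_m[w] + eta) - lgamma(n_rest[w] + eta)
    return logp
```

Evaluating this for every candidate path and adding the log of the nested-CRP prior gives the unnormalized log posterior of equation (5).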
- 4. Approximate inference by Gibbs sampling (figure-only slide).
- 5. Examples and empirical results. Sampling: let the sampler burn in for 10000 iterations, then take samples 100 iterations apart for another 1000 iterations. Local maxima can be a problem in the hLDA model, so the sampler is restarted randomly 25 times and the trajectory with the highest average posterior likelihood is kept. A unit of iteration is one sweep over every word w_0, ..., w_n of every document D_1, ..., D_n.
- 5. Examples and empirical results. Corpus generation: a corpus of 100 1000-word documents, a three-level hierarchy, and a vocabulary of 25 terms. Topics on this vocabulary can be viewed as bars on a 5 × 5 grid.
- 5. Examples and empirical results (figure-only slides).
- NIPS abstracts. A text data set of 1717 NIPS abstracts from 1987-1999 (http://www.cs.toronto.edu/roweis/data.html): 208,896 words, 1600 terms.
- 6. Summary. The result is a flexible, general model for topic hierarchies that naturally accommodates growing data collections. The Gibbs sampling procedure provides a simple method for simultaneously exploring the spaces of trees and topics. Possible extensions: different documents could express paths of different lengths, and syntactic structures such as paragraphs and sentences within a document could be modeled. Comment: using this model we can break documents down into topics, but what about the relationships among the documents, and among the topics?
