This document discusses topic models, LDA, and related concepts. It begins with an overview of LDA and how it uses a graphical model approach for unsupervised learning. Various inference methods for LDA are discussed, including variational inference and Gibbs sampling. The document also covers extensions like correlated topic models and dynamic topic models, as well as applications and researchers in the field. Key concepts covered include posterior approximation, sampling, variational methods, and optimization.
2. LDA: one acronym, two meanings. Linear Discriminant Analysis (Fisher's Linear Discriminant Analysis): supervised learning that seeks the projection direction maximizing the ratio of between-class to within-class scatter; a matrix-centric view. Latent Dirichlet Allocation: unsupervised learning; a graphical-model view. Just make sure we are on the same page.
21. Bayesian Networks: where to learn more. David Barber, Bayesian Reasoning and Machine Learning. Daphne Koller and Nir Friedman, Probabilistic Graphical Models. Bishop, Pattern Recognition and Machine Learning, Ch. 8. Eric Xing, Probabilistic Graphical Models (course).
33. Goal The goal is to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments. Goal and Motivation of Topic Model
51. LDA: Topics. Five topics from a 50-topic LDA model fit to Science from 1980–2002.
52. LDA: Personas Demo. http://personas.media.mit.edu/personasWeb.html
53. LDA: where to learn more (surveys). David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003. David M. Blei and John D. Lafferty. Topic models. Taylor and Francis, 2009. Ali Daud, Juanzi Li, Lizhu Zhou, and Faqir Muhammad. Knowledge discovery through directed probabilistic topic models: a survey. Frontiers of Computer Science in China, 4(2):280–301, January 2010. Mark Steyvers and Tom Griffiths. Probabilistic topic models. In Latent Semantic Analysis: A Road to Meaning. Laurence Erlbaum, July 2006.
55. Inference: how to obtain the parameters in LDA. The quantities of interest are the topic assignment probability for each word and the topic proportions of each document. The central computational task in LDA is computing the posterior distribution over the latent variables, approached either by variational methods (Variational Inference) or by sampling methods (Gibbs Sampling).
57. Inference Methods: comparison of the two major families. Stochastic methods (sampling): MCMC, Metropolis-Hastings, Gibbs, etc.; computationally expensive, but relatively accurate. Deterministic methods (optimization): Mean Field, Belief Propagation, Variational Bayes, Expectation Propagation; computationally cheap and inexact, but able to provide bounds.
58. Variational Inference. Variational ≈ Optimization; Variational ≈ Convex Optimization. The basic idea of convexity-based variational inference is to use Jensen's inequality to obtain an adjustable lower bound on the log likelihood. Essentially, one considers a family of lower bounds, indexed by a set of variational parameters; the variational parameters are chosen by an optimization procedure that attempts to find the tightest possible lower bound.
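The bound described above can be written out explicitly. In standard LDA notation (documents w, topic proportions θ, topic assignments z, variational distribution q), Jensen's inequality gives:

```latex
\log p(\mathbf{w} \mid \alpha, \beta)
  = \log \int \sum_{\mathbf{z}} p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\, d\theta
  \;\ge\; \mathbb{E}_q\!\left[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\right]
        - \mathbb{E}_q\!\left[\log q(\theta, \mathbf{z})\right]
  \;=:\; \mathcal{L}(q)
```

The gap between the log likelihood and this bound is exactly KL(q ‖ p(θ, z | w, α, β)), so maximizing L(q) over the variational parameters is the same as minimizing that KL divergence, i.e. finding the tightest lower bound.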
59. Mean field variational inference. Basic idea: approximate the true posterior with a simple, fully factorized distribution, choosing the member of that family that minimizes the KL divergence to the posterior. Why the name? Because the approximating distribution factorizes completely, as in mean-field theory in physics.
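For LDA specifically, the fully factorized family used by Blei et al. (their notation: Dirichlet parameter γ, per-word multinomial parameters φ_n) is:

```latex
q(\theta, \mathbf{z} \mid \gamma, \phi) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n),
\qquad
(\gamma^{*}, \phi^{*}) = \arg\min_{\gamma,\, \phi}
  \mathrm{KL}\bigl(q(\theta, \mathbf{z} \mid \gamma, \phi) \,\big\|\, p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta)\bigr)
```

Every latent variable gets its own independent factor, which is what makes the optimization tractable.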
66. Variational Inference: where to learn more. Martin Wainwright. Graphical models and variational methods: message-passing, convex relaxations, and all that. ICML 2008 Tutorial. M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, Vol. 1, Nos. 1–2, pp. 1–305, December 2008.
68. MCMC in LDA MCMC Overview Sampling in general Why sampling is necessary and why it is hard Importance sampling, rejection sampling Markov Chain Monte Carlo Metropolis-Hasting, Gibbs sampling Collapsed Gibbs in LDA
69. MCMC Overview: pioneers behind sampling methods. Nicholas C. Metropolis, Andrey Markov, Josiah W. Gibbs.
70. Sampling example: population statistics. National Bureau of Statistics of China, March 16, 2006. With State Council approval, China conducted a national 1% population sample survey at the end of 2005. The sample size was 17.05 million people, or 1.31% of the total population. Among the national population, 67.64 million people had a university education (junior college and above), 150.83 million had a senior high school education (including technical secondary school), 467.35 million had a junior high school education, and 407.06 million had a primary school education.
74. Sampling: Importance sampling. In rejection sampling, throwing away a sample x seems wasteful, since those samples carry the only information we have about the original distribution. Importance sampling instead keeps every sample and reweights it.
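As a toy sketch of the reweighting idea (the target and proposal here are chosen purely for illustration): draw from an easy proposal q, weight each sample by p(x)/q(x), and average; nothing is discarded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target p(x): standard normal. Proposal q(x): N(0, 2^2), wider than p
# so the weights p/q stay bounded.
def p(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q_pdf(x):
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

xs = rng.normal(0.0, 2.0, size=200_000)    # sample from the proposal only
weights = p(xs) / q_pdf(xs)                # importance weights, no rejection
# Self-normalized estimate of E_p[x^2] (the true value is 1):
estimate = np.sum(weights * xs**2) / np.sum(weights)
print(estimate)
```

Every draw contributes to the estimate; samples in regions where p is small relative to q simply receive small weights instead of being thrown away.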
78. Collapsed Gibbs Sampling in LDA: joint distribution. Substitute the expression above into the joint distribution.
79. Gibbs Sampling in LDA: joint distribution. (Several formulas omitted here.)
80. Gibbs Sampling in LDA: marginal distribution. (Several formulas omitted here.)
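For reference, the key result these slides build toward is the standard collapsed Gibbs full conditional (in the usual Griffiths–Steyvers notation, with symmetric priors α and β, vocabulary size V, and counts n that exclude the current token i):

```latex
p(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w})
  \;\propto\;
  \bigl(n_{m,k}^{\neg i} + \alpha\bigr)\,
  \frac{n_{k,w_i}^{\neg i} + \beta}{n_{k}^{\neg i} + V\beta}
```

where n_{m,k} counts the words in document m assigned to topic k, n_{k,w} counts how often word w is assigned to topic k, and n_k = Σ_w n_{k,w}. Each token's topic is resampled from this distribution in turn.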
81. Sampling: Gibbs Sampling code in Python. Really simple! The following snippet randomly initializes the topic assignment of every word and fills in the count matrices (nmz: document–topic counts, nzw: topic–word counts):

    for m in range(n_docs):  # xrange in the original Python 2 code
        for i, w in enumerate(word_indices(matrix[m, :])):
            z = np.random.randint(self.n_topics)  # random initial topic
            self.nmz[m, z] += 1   # topic z seen once more in document m
            self.nm[m] += 1       # document m has one more word
            self.nzw[z, w] += 1   # word w assigned once more to topic z
            self.nz[z] += 1       # topic z has one more word overall
            self.topics[(m, i)] = z
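The snippet above only performs the random initialization. A sketch of the per-word resampling step that a full sweep would then repeat might look like the following (the function name is hypothetical; the count arrays nmz, nzw, nz follow the slide's conventions, and in a real sweep the current word's counts are decremented before this is called):

```python
import numpy as np

def conditional_dist(nmz, nzw, nz, m, w, alpha, beta):
    """Collapsed Gibbs full conditional p(z = k | everything else)
    for word w in document m, normalized over topics."""
    vocab_size = nzw.shape[1]
    # (topic-word term) * (document-topic term), vectorized over topics k
    p_z = (nzw[:, w] + beta) / (nz + beta * vocab_size) * (nmz[m, :] + alpha)
    return p_z / p_z.sum()

# Tiny worked example: 2 topics, 3-word vocabulary, one document.
nmz = np.array([[3.0, 1.0]])        # doc 0: 3 words in topic 0, 1 in topic 1
nzw = np.array([[2.0, 1.0, 0.0],    # topic 0's per-word counts
                [0.0, 0.0, 1.0]])   # topic 1's per-word counts
nz = nzw.sum(axis=1)                # total words per topic
p = conditional_dist(nmz, nzw, nz, m=0, w=0, alpha=0.1, beta=0.1)
new_z = np.random.default_rng(0).choice(2, p=p)  # resample the assignment
print(p)  # topic 0 dominates for word 0 in this document
```

After resampling, the counts are incremented for the new topic, exactly as in the initialization loop above.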
82. Gibbs Sampling: where to learn more. D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003. Gregor Heinrich. Parameter estimation for text analysis. Technical Report, 2009. Michael I. Jordan and Yair Weiss. Graphical models: probabilistic inference. Christophe Andrieu, N. De Freitas, A. Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine Learning, pages 5–43, 2003. Yi Wang. Distributed Gibbs Sampling of Latent Dirichlet Allocation: The Gritty Details. Technical Report, 2007.
87. Correlated + Dynamic TM David M. Blei and John D Lafferty. Correlated Topic Models. In Advances in Neural Information Processing Systems 18, 2006. David M. Blei and John D Lafferty. A correlated topic model of Science. The Annals of Applied Statistics, 1(1):17–35, 2007. David M. Blei and John D Lafferty. Dynamic topic models. Proceedings of the 23rd international conference on Machine learning - ICML ’06, pages 113–120, 2006. Correlated + Dynamic Topic Models
93. Dynamic Topic Models: top 10 words of Science topics and example articles from Science.
94. Topics over time Published in: KDD '06 Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining Topics over time
104. Supervised LDA. David Blei and Jon D. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems, pages 1–22, 2008. Daniel Ramage, David Hall, Ramesh Nallapati, and C. D. Manning. Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, pages 248–256. Association for Computational Linguistics, 2009. Jun Zhu, Amr Ahmed, and Eric Xing. MedLDA: Maximum Margin Supervised Topic Models. Journal of Machine Learning Research, 1:1–48, 2010.
130. Software analysis with unsupervised topic models: component mining. 12,151 Java projects from SourceForge and Apache; 4,632 projects; 366,287 source files; 38.7 million lines of code written by 9,250 developers.
147. D. Blei and M. Jordan. Variational inference for Dirichlet process mixtures. Journal of Bayesian Analysis, 1:121–144, 2006.
148. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In Latent Semantic Analysis: A Road to Meaning, T. Landauer, Mcnamara, S. Dennis, and W. Kintsch eds. Laurence Erlbaum, 2006.
149. Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566-1581, 2006.
150. J. Zhu, A. Ahmed, and E. P. Xing. MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification. The 26th International Conference on Machine Learning, 2009. Topic Models Background
152. W. Li, and A. McCallum. Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. In International Conference on Machine Learning, 2006.
153. I. Porteous, A. Asuncion, D. Newman, A. Ihler, P. Smyth, and M. Welling. Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation. In Knowledge Discovery and Data Mining, 2008.
154. H. Wallach, I. Murray, R. Salakhutdinov and D. Mimno. Evaluation Methods for Topic Models. In International Conference on Machine Learning, 2009.
155. M. Welling, Y. Teh, and B. Kappen. Hybrid Variational/Gibbs Inference in Topic Models. In Uncertainty in Artificial Intelligence, 2008.
157. P. Agius, Y. Ying, and C. Campbell. Bayesian Unsupervised Learning with Multiple Data Types. Statistical Applications in Genetics and Molecular Biology, 3(1):27, 2009.
158. P. Flaherty, G. Giaever, J. Kumm, Michael I. Jordan, Adam P. Arkin. A Latent Variable Model for Chemogenomic Profiling. Bioinformatics 2005 Aug 1;21(15):3286-93.
159. S. Shringarpure and E. P. Xing. mStruct: Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations. Genetics, Vol. 182, Issue 2, 2009.
161. T. Griffiths, M. Steyvers, D. Blei, and J. Tenenbaum. Integrating topics and syntax. In Neural Information Processing Systems, 2005.
162. K. Toutanova and M. Johnson. A Bayesian LDA-based Model for Semi-Supervised Part-of-speech Tagging. In Neural Information Processing Systems, 2008.
164. L. Dietz, S. Bickel, and T. Scheffer. Unsupervised Prediction of Citation Influences. In International Conference on Machine Learning, 2007.
165. D. Hall, D. Jurafsky, and C. Manning. Studying the History of Ideas Using Topic Models. In Empirical Methods in Natural Language Processing, 2008.
167. J. Chang and D. Blei. Relational topic models for document networks. Artificial Intelligence and Statistics (in print), 2009.
168. E.P. Xing, W. Fu, and L. Song. A State-Space Mixed Membership Blockmodel for Dynamic Network Tomography. Annals of Applied Statistics, 2009.
169. H. Wallach. Topic Modeling: Beyond Bag-of-Words. In International Conference on Machine Learning, 2006.
171. L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models for 101 object categories. In Computer Vision and Image Understanding, 2007.
172. C. Wang, D. Blei, and L. Fei-Fei. Simultaneous Image Classification and Annotation. In Computer Vision and Pattern Recognition, 2009.