- 1. Tensor Decompositions for Guaranteed Learning of Latent Variable Models Anima Anandkumar U.C. Irvine
- 2. Application 1: Topic Modeling Document modeling Observed: words in document corpus. Hidden: topics. Goal: carry out document summarization.
- 3. Application 2: Understanding Human Communities Social Networks Observed: network of social ties, e.g. friendships, co-authorships Hidden: groups/communities of actors.
- 4. Application 3: Recommender Systems Recommender System Observed: Ratings of users for various products, e.g. yelp reviews. Goal: Predict new recommendations. Modeling: Find groups/communities of users and products.
- 5. Application 4: Feature Learning Feature Engineering Learn good features/representations for classification tasks, e.g. image and speech recognition. Sparse representations, low-dimensional hidden structures.
- 6. Application 5: Computational Biology Observed: gene expression levels Goal: discover gene groups Hidden variables: regulators controlling gene groups “Unsupervised Learning of Transcriptional Regulatory Networks via Latent Tree Graphical Model” by A. Gitter, F. Huang, R. Valluvan, E. Fraenkel and A. Anandkumar Submitted to BMC Bioinformatics, Jan. 2014.
- 9. Statistical Framework In all applications: discover hidden structure in data: unsupervised learning. Latent Variable Models Concise statistical description through graphical modeling Conditional independence relationships or hierarchy of variables. x1 x2 x3 x4 x5 h1 h2 h3
- 14. Computational Framework Challenge: Efficient Learning of Latent Variable Models. Maximum likelihood is NP-hard. Practice: EM and Variational Bayes have no consistency guarantees. Efficient computational and sample complexities? Fast methods such as matrix factorization are not statistical: we cannot learn the latent variable model through them. Tensor-based Estimation: estimate moment tensors from data (higher-order relationships), then compute a decomposition of the moment tensor via iterative updates, e.g. tensor power iterations or alternating minimization. Non-convex: convergence only to a local optimum, with no guarantees in general. Innovation: guaranteed convergence to the correct model. In this talk: tensor decompositions and applications.
- 15. Outline 1 Introduction 2 Topic Models 3 Efficient Tensor Decomposition 4 Experimental Results 5 Conclusion
- 16. Topic Models: Bag of Words
- 17. Probabilistic Topic Models Bag of words: the order of words does not matter. Graphical model representation: l words in a document x_1, ..., x_l; h: proportions of topics in the document; word x_m is generated from topic y_m; A(i, j) := P[x_m = i | y_m = j] is the topic-word matrix. (Figure: words x_1, ..., x_5, topics y_1, ..., y_5, edges labeled A, topic mixture h.)
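The generative process above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the talk; the vocabulary size, topic count, and topic proportions are made-up values, and the single-topic case is shown (one topic drawn per document, then words drawn i.i.d. from that topic's column of A).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the talk): d-word vocabulary, k topics.
d, k, n_words = 6, 2, 5

# Topic-word matrix A: column j is the word distribution of topic j,
# i.e. A[i, j] = P[x_m = i | y_m = j]. Columns drawn from a Dirichlet here.
A = rng.dirichlet(np.ones(d), size=k).T        # shape (d, k), columns sum to 1
lam = np.array([0.7, 0.3])                      # assumed topic proportions

# Single-topic document: draw one topic, then draw each word i.i.d.
# from that topic's column of A (bag of words: order does not matter).
topic = rng.choice(k, p=lam)
words = rng.choice(d, size=n_words, p=A[:, topic])
```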
- 19. Geometric Picture for Topic Models Single topic (h) Linear Model: E[x_i|h] = Ah. Multiview model: h is fixed and multiple words (x_i) are generated.
- 21. Geometric Picture for Topic Models Topic proportions vector (h) Word generation (x_1, x_2, ...) Linear Model: E[x_i|h] = Ah. Multiview model: h is fixed and multiple words (x_i) are generated.
- 24. Moment Tensors Consider the single topic model: E[x_i|h] = Ah, with λ_r := P[h = r]. Learn the topic-word matrix A and the vector λ. M2: co-occurrence of two words in a document: M2 := E[x_1 x_2^⊤] = E[E[x_1 x_2^⊤ | h]] = A E[hh^⊤] A^⊤ = Σ_{r=1}^k λ_r a_r a_r^⊤. Tensor M3: co-occurrence of three words: M3 := E[x_1 ⊗ x_2 ⊗ x_3] = Σ_r λ_r a_r ⊗ a_r ⊗ a_r. Matrix and tensor forms, with a_r := the rth column of A: M2 = Σ_{r=1}^k λ_r a_r ⊗ a_r, M3 = Σ_{r=1}^k λ_r a_r ⊗ a_r ⊗ a_r.
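The moment forms above can be written out directly in NumPy. A minimal sketch, assuming a hypothetical ground-truth A and λ (the sizes and the Dirichlet draws are illustrative, not from the talk); it forms the population moments M2 = Σ_r λ_r a_r ⊗ a_r and M3 = Σ_r λ_r a_r ⊗ a_r ⊗ a_r and checks that M2 matches the explicit sum of rank-1 terms.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3  # illustrative sizes (assumptions)

# Hypothetical ground truth: topic-word matrix A (columns a_r) and weights λ.
A = rng.dirichlet(np.ones(d), size=k).T         # shape (d, k), columns sum to 1
lam = rng.dirichlet(np.ones(k))                 # λ_r = P[h = r], sums to 1

# Population moments in matrix/tensor form:
# M2 = A diag(λ) A^T = Σ_r λ_r a_r ⊗ a_r,  M3 = Σ_r λ_r a_r ⊗ a_r ⊗ a_r.
M2 = A @ np.diag(lam) @ A.T
M3 = np.einsum('r,ir,jr,kr->ijk', lam, A, A, A)

# Sanity check: M2 equals the explicit sum of rank-1 terms.
M2_sum = sum(l * np.outer(a, a) for l, a in zip(lam, A.T))
assert np.allclose(M2, M2_sum)
```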
- 25. Tensor Decomposition Problem M2 = Σ_{r=1}^k λ_r a_r ⊗ a_r. M3 = Σ_{r=1}^k λ_r a_r ⊗ a_r ⊗ a_r = λ_1 a_1 ⊗ a_1 ⊗ a_1 + λ_2 a_2 ⊗ a_2 ⊗ a_2 + ... u ⊗ v ⊗ w is a rank-1 tensor whose (i, j, k)th entry is u_i v_j w_k. k topics, d words in vocabulary. M3: a d × d × d tensor of rank k. Learning Topic Models through Tensor Decomposition.
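The rank-1 tensor definition is easy to verify numerically. A tiny sketch with made-up vectors: the (i, j, k)th entry of u ⊗ v ⊗ w is u_i v_j w_k.

```python
import numpy as np

# Illustrative vectors (assumptions, not from the talk).
u, v, w = np.array([1., 2.]), np.array([3., 4.]), np.array([5., 6.])

# Rank-1 tensor u ⊗ v ⊗ w: entry (i, j, k) is u_i * v_j * w_k.
T = np.einsum('i,j,k->ijk', u, v, w)

assert T.shape == (2, 2, 2)
assert T[0, 1, 1] == u[0] * v[1] * w[1]   # 1 * 4 * 6 = 24
```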
- 26. Detecting Communities in Networks
- 30. Detecting Communities in Networks Stochastic Block Model Non-overlapping Mixed Membership Model Overlapping Unifying Assumption Edges conditionally independent given community memberships
- 31. Multi-view Mixture Models
- 32. Tensor Forms in Other Models Independent Component Analysis: independent sources h_1, ..., h_k, unknown mixing A; blind source separation of speech, image, video. Gaussian Mixtures. Hidden Markov Models/Latent Trees. Reduction to similar moment forms.
- 33. Outline 1 Introduction 2 Topic Models 3 Efficient Tensor Decomposition 4 Experimental Results 5 Conclusion
- 34. Tensor Decomposition Problem M3 = Σ_{r=1}^k λ_r a_r ⊗ a_r ⊗ a_r = λ_1 a_1 ⊗ a_1 ⊗ a_1 + λ_2 a_2 ⊗ a_2 ⊗ a_2 + ... u ⊗ v ⊗ w is a rank-1 tensor whose (i, j, k)th entry is u_i v_j w_k. k topics, d words in vocabulary. M3: a d × d × d tensor of rank k. d: vocabulary size for topic models, or n: size of the network for community models.
- 35. Dimensionality Reduction for Tensor Decomposition M3 = Σ_{r=1}^k λ_r a_r ⊗ a_r ⊗ a_r. Dimensionality Reduction (Whitening): convert M3 of size d × d × d to a tensor T of size k × k × k, then carry out the decomposition of T. Dimensionality reduction through multi-linear transforms, computed from data, e.g. pairwise moments. T = Σ_i ρ_i r_i^{⊗3} is a symmetric orthogonal tensor: {r_i} are orthonormal.
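The whitening step can be sketched as follows. This is a minimal NumPy illustration under assumed model sizes (d = 8, k = 3): build a whitening matrix W from the top-k eigenpairs of M2 so that W^⊤ M2 W = I_k, then apply the multilinear transform T = M3(W, W, W) to get a k × k × k tensor with a symmetric orthogonal decomposition.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 8, 3  # illustrative sizes (assumptions)

# Hypothetical model moments: M2 = Σ_r λ_r a_r a_r^T, M3 = Σ_r λ_r a_r^{⊗3}.
A = rng.standard_normal((d, k))
lam = np.array([0.5, 0.3, 0.2])
M2 = A @ np.diag(lam) @ A.T
M3 = np.einsum('r,ir,jr,kr->ijk', lam, A, A, A)

# Whitening: W with W^T M2 W = I_k, from the top-k eigenpairs of M2
# (eigh returns eigenvalues in ascending order, so take the last k).
eigval, eigvec = np.linalg.eigh(M2)
U, s = eigvec[:, -k:], eigval[-k:]
W = U / np.sqrt(s)                              # d × k

# Multilinear transform: T = M3(W, W, W) is k × k × k with a
# symmetric orthogonal decomposition T = Σ_i ρ_i r_i^{⊗3}.
T = np.einsum('ijk,ia,jb,kc->abc', M3, W, W, W)

assert np.allclose(W.T @ M2 @ W, np.eye(k), atol=1e-8)
```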
- 38. Orthogonal/Eigen Decomposition Orthogonal symmetric tensor: T = Σ_{j∈[k]} ρ_j r_j^{⊗3}. Then T(I, r_1, r_1) = Σ_{j∈[k]} ρ_j ⟨r_1, r_j⟩² r_j = ρ_1 r_1. Obtaining eigenvectors through power iterations: u → T(I, u, u) / ‖T(I, u, u)‖. Basic Algorithm: random initialization, run power iterations, and deflate.
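The basic algorithm (random initialization, power iterations, deflation) can be sketched directly. A minimal NumPy version under assumed inputs: a synthetic orthogonal symmetric tensor with made-up positive weights ρ; after extracting each eigenpair, subtract ρ u^{⊗3} and repeat.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4  # illustrative size (assumption)

# Synthetic orthogonal symmetric tensor T = Σ_j ρ_j r_j^{⊗3},
# with orthonormal {r_j} (columns of Q) and positive weights ρ.
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
rho = np.array([4., 3., 2., 1.])
T = np.einsum('j,aj,bj,cj->abc', rho, Q, Q, Q)

def power_iteration(T, n_iter=100, seed=0):
    """One eigenpair via the map u -> T(I, u, u) / ||T(I, u, u)||."""
    u = np.random.default_rng(seed).standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = np.einsum('abc,b,c->a', T, u, u)   # T(I, u, u)
        u /= np.linalg.norm(u)
    rho_hat = np.einsum('abc,a,b,c->', T, u, u, u)  # eigenvalue T(u, u, u)
    return rho_hat, u

# Repeated power iteration + deflation recovers all pairs (ρ_j, r_j).
est = []
for _ in range(k):
    r, u = power_iteration(T)
    est.append((r, u))
    T = T - r * np.einsum('a,b,c->abc', u, u, u)   # deflate

assert np.allclose(sorted(r for r, _ in est), sorted(rho), atol=1e-6)
```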
- 39. Practical Considerations k communities, n nodes, k ≪ n. Steps: k-SVD of an n × n matrix via randomized techniques; online k × k × k tensor decomposition: no tensor explicitly formed; parallelization: inherently parallelizable, GPU deployment; sparse implementation: real-world networks are sparse. Validation metric: p-value test based on "soft-pairing". Parallel time complexity: O(nsk/c + k³), where s is the max. degree in the graph and c is the number of cores. Huang, Niranjan, Hakeem and Anandkumar, "Fast Detection of Overlapping Communities via Online Tensor Methods," Preprint, Sept. 2013.
- 40. Scaling Of The Stochastic Iterations v_i^{t+1} ← v_i^t − 3θβ^t Σ_{j=1}^k ⟨v_j^t, v_i^t⟩² v_j^t + β^t ⟨v_i^t, y_A^t⟩ ⟨v_i^t, y_B^t⟩ y_C^t + ... Parallelize across eigenvectors. STGD is iterative: device code reuses buffers for updates. (Figure: standard vs. device interface for CPU-GPU transfer of v_i^t and y_A^t, y_B^t, y_C^t.)
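The displayed terms of the stochastic update can be vectorized across all k eigenvector estimates at once, which is what makes the per-eigenvector parallelization natural. A hedged NumPy sketch, covering only the two terms shown on the slide (the trailing "..." terms are omitted); `stgd_step`, the step sizes, and the problem sizes are illustrative assumptions, not the talk's implementation.

```python
import numpy as np

def stgd_step(V, yA, yB, yC, beta=0.01, theta=1.0):
    """One stochastic update of the k eigenvector estimates (rows of V),
    following the displayed update form: an orthogonality penalty
    -3 θ β Σ_j <v_j, v_i>^2 v_j plus a stochastic signal term
    β <v_i, y_A> <v_i, y_B> y_C built from three sampled views."""
    G = V @ V.T                              # Gram matrix: G[i, j] = <v_i, v_j>
    penalty = 3 * theta * beta * (G**2) @ V  # row i: Σ_j <v_j, v_i>^2 v_j
    signal = beta * (V @ yA) * (V @ yB)      # per-row scalar coefficients
    return V - penalty + signal[:, None] * yC

# Hypothetical example: k = 3 estimates in d = 5 dimensions, one sample.
rng = np.random.default_rng(4)
V = rng.standard_normal((3, 5))
yA, yB, yC = (rng.standard_normal(5) for _ in range(3))
V_next = stgd_step(V, yA, yB, yC)
assert V_next.shape == (3, 5)
```

Because the update is expressed as dense matrix products, each row (eigenvector) can be updated independently, matching the slide's "parallelize across eigenvectors" note.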
- 41. Scaling Of The Stochastic Iterations (Figure: running time (secs) vs. number of communities k on log-log axes, comparing MATLAB Tensor Toolbox, CULA Standard Interface, CULA Device Interface, and Eigen Sparse.)
- 42. Outline 1 Introduction 2 Topic Models 3 Efficient Tensor Decomposition 4 Experimental Results 5 Conclusion
- 43. Experimental Results
  Datasets: Facebook (friend/user network, n ≈ 20,000), Yelp (business/user reviews, n ≈ 40,000), DBLP (author/coauthor network, n ≈ 1 million).
  Error (E) and recovery ratio (R):
  Dataset          | k̂   | Method      | Running Time | E      | R
  Facebook (k=360) | 500 | ours        | 468          | 0.0175 | 100%
  Facebook (k=360) | 500 | variational | 86,808       | 0.0308 | 100%
  Yelp (k=159)     | 100 | ours        | 287          | 0.046  | 86%
  Yelp (k=159)     | 100 | variational | N.A.         |        |
  DBLP (k=6000)    | 100 | ours        | 5407         | 0.105  | 95%
- 45. Experimental Results on Yelp
  Lowest-error business categories & largest-weight businesses:
  Rank | Category       | Business                  | Stars | Review Counts
  1    | Latin American | Salvadoreno Restaurant    | 4.0   | 36
  2    | Gluten Free    | P.F. Chang's China Bistro | 3.5   | 55
  3    | Hobby Shops    | Make Meaning              | 4.5   | 14
  4    | Mass Media     | KJZZ 91.5FM               | 4.0   | 13
  5    | Yoga           | Sutra Midtown             | 4.5   | 31
  Bridgeness: distance from the vector [1/k̂, ..., 1/k̂]^⊤.
  Top-5 bridging nodes (businesses):
  Business             | Categories
  Four Peaks Brewing   | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
  Pizzeria Bianco      | Restaurants, Pizza, Phoenix
  FEZ                  | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
  Matt's Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
  Cornish Pasty Co     | Restaurants, Bars, Nightlife, Pubs, Tempe
- 46. Outline 1 Introduction 2 Topic Models 3 Efficient Tensor Decomposition 4 Experimental Results 5 Conclusion
- 47. Conclusion Guaranteed Learning of Latent Variable Models: guaranteed to recover the correct model; efficient sample and computational complexities; better performance compared to EM, Variational Bayes, etc.; applies to mixed membership communities, topic models, ICA, Gaussian mixtures... Current and Future Goals: guaranteed online learning in high dimensions; large-scale cloud-based implementation of tensor approaches. Code available on website and GitHub.
