
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SEA - 5/01/15

Tensor Methods: A New Paradigm for Training Probabilistic Models and Feature Learning: Tensors are rich structures for modeling complex higher-order relationships in data-rich domains such as social networks, computer vision, and the internet of things. Tensor decomposition methods are embarrassingly parallel and scale to enormous datasets. They are guaranteed to converge to the global optimum and yield consistent parameter estimates for many probabilistic models, including topic models, community models, and hidden Markov models. I will show results of these methods for learning topics from text data, communities in social networks, disease hierarchies from healthcare records, and cell types from mouse brain data. I will also demonstrate how tensor methods can yield rich discriminative features for classification tasks and can serve as an alternative method for training neural networks.


  1. Tensor Methods: A New Paradigm for Probabilistic Models and Feature Learning. Anima Anandkumar, U.C. Irvine
  2. Learning with Big Data
  3-6. Data vs. Information. Missing observations, gross corruptions, outliers: learning useful information is like finding a needle in a haystack!
  7. Matrices and Tensors as Data Structures. Multi-modal and multi-relational data: matrices encode pairwise relations, tensors encode higher-order relations. (Multi-modal data figure from Lise Getoor's slides; a toy sketch of the matrix/tensor distinction follows below.)
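As a toy illustration of that distinction (the vocabulary and documents are made up, not from the talk): pairwise word co-occurrence counts fill a matrix, while triple co-occurrence counts fill a 3-way tensor.

```python
import numpy as np

# Toy corpus: word ids per document (illustrative, not real data).
docs = [[0, 1, 2], [0, 2, 3], [1, 2, 3]]
V = 4  # vocabulary size

M = np.zeros((V, V))        # pairwise co-occurrences (second-order stats)
T = np.zeros((V, V, V))     # triple co-occurrences (third-order stats)
for doc in docs:
    for i in doc:
        for j in doc:
            if i != j:
                M[i, j] += 1
            for k in doc:
                if len({i, j, k}) == 3:
                    T[i, j, k] += 1

print(M)          # matrix of pairwise relations
print(T.shape)    # tensor of higher-order relations
```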
  8-9. Spectral Decomposition of Tensors. Matrix: $M_2 = \sum_i \lambda_i \, u_i \otimes v_i = \lambda_1 u_1 \otimes v_1 + \lambda_2 u_2 \otimes v_2 + \cdots$ Tensor: $M_3 = \sum_i \lambda_i \, u_i \otimes v_i \otimes w_i = \lambda_1 u_1 \otimes v_1 \otimes w_1 + \lambda_2 u_2 \otimes v_2 \otimes w_2 + \cdots$ We have developed efficient methods to solve tensor decomposition (a minimal power-method sketch follows below).
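To make the decomposition concrete, here is a minimal NumPy sketch (not from the talk; dimensions, restart counts, and helper names are made up) of the tensor power method with deflation, applied to a symmetric tensor with orthonormal components, $T = \sum_i \lambda_i \, u_i \otimes u_i \otimes u_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
U = U[:, :k]                        # orthonormal components u_1..u_k
lam = np.array([3.0, 2.0, 1.0])     # eigenvalues lambda_i

# T = sum_i lambda_i * u_i (x) u_i (x) u_i
T = np.einsum('i,ai,bi,ci->abc', lam, U, U, U)

def power_iteration(T, n_starts=10, n_iter=100):
    """Return the (eigenvalue, eigenvector) pair with the largest eigenvalue."""
    best = (-np.inf, None)
    for _ in range(n_starts):
        v = rng.standard_normal(T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v = np.einsum('abc,b,c->a', T, v, v)   # the map v -> T(I, v, v)
            v /= np.linalg.norm(v)
        lam_v = np.einsum('abc,a,b,c->', T, v, v, v)
        if lam_v > best[0]:
            best = (lam_v, v)
    return best

for _ in range(k):
    lam_hat, v = power_iteration(T)
    print(lam_hat, np.abs(U.T @ v).round(3))            # aligns with one u_i
    T = T - lam_hat * np.einsum('a,b,c->abc', v, v, v)  # deflate
```

Each deflation step subtracts the recovered rank-1 term, so the next round of power iterations converges to one of the remaining components.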
  10-11. Strengths of Tensor Methods. Fast and accurate: orders of magnitude faster than previous methods. Embarrassingly parallel and suited to cloud systems, e.g. Spark. Exploit optimized linear algebra libraries and the parallelism of GPU systems. [Plot: running time (secs) vs. number of communities k, comparing MATLAB Tensor Toolbox (CPU), CULA Standard Interface (GPU), CULA Device Interface (GPU), and Eigen Sparse (CPU).]
  12. Outline: 1 Introduction; 2 Learning Probabilistic Models; 3 Experiments; 4 Feature Learning with Tensor Methods; 5 Conclusion
  13-15. Latent Variable Models. Incorporate hidden or latent variables; the information structure is the relationship between latent variables and observed data. Basic approach: mixtures/clusters, where the hidden variable is categorical. Advanced: probabilistic models in which hidden variables have more general distributions and can model mixed-membership/hierarchical groups. [Graphical-model diagram: observed x1..x5, hidden h1..h3.]
  16. Challenges in Learning LVMs. Computational challenges: maximum likelihood is a non-convex optimization problem and is NP-hard. In practice, local search approaches such as gradient descent, EM, and variational Bayes have no consistency guarantees, can get stuck in bad local optima, converge slowly, and are hard to parallelize (a small illustration follows below). Tensor methods yield guaranteed learning for LVMs.
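A small, hedged illustration of the initialization sensitivity noted above (synthetic data and arbitrary settings; scikit-learn's EM-based GaussianMixture stands in for generic local search): run EM from several random starts and compare the converged log-likelihoods. A spread in the scores indicates runs stuck in different local optima.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Four well-separated clusters in 2-D (made-up locations).
X = np.vstack([rng.normal(loc, 0.5, size=(200, 2))
               for loc in ([-3, 0], [0, 3], [3, 0], [0, -3])])

scores = []
for seed in range(10):
    gm = GaussianMixture(n_components=4, init_params='random',
                         n_init=1, random_state=seed).fit(X)
    scores.append(gm.score(X))       # per-sample log-likelihood at convergence
print(np.round(scores, 3))           # a spread here reveals local optima
```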
  17. Unsupervised Learning of LVMs. [Graphical-model diagrams for GMM, HMM, ICA, and multiview/topic models.]
  18. Overall Framework for Unsupervised Learning. Pipeline: unlabeled data → probabilistic admixture models → tensor method → inference. [Tensor decomposition diagram.]
  19. Outline: 1 Introduction; 2 Learning Probabilistic Models; 3 Experiments; 4 Feature Learning with Tensor Methods; 5 Conclusion
  20. Demo for Learning Gaussian Mixtures (a moment-based sketch follows below)
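The demo itself is not reproduced here. Below is a minimal end-to-end sketch of a moment-based pipeline for a spherical Gaussian mixture with shared variance, in the style of Hsu and Kakade's method of moments; the dimensions, sample size, and helper names are illustrative assumptions, not the talk's actual demo code.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n, sigma = 10, 3, 100_000, 0.3
mu = 2.0 * rng.standard_normal((k, d))             # true component means
z = rng.integers(k, size=n)
X = mu[z] + sigma * rng.standard_normal((n, d))    # spherical GMM samples

# sigma^2 is the smallest eigenvalue of the covariance (valid when k < d),
# and M2 = E[x x^T] - sigma^2 I = sum_i w_i mu_i mu_i^T.
sigma2 = np.linalg.eigvalsh(np.cov(X.T)).min()
M2 = (X.T @ X) / n - sigma2 * np.eye(d)

# Whiten so the component means become orthogonal: W^T M2 W = I.
vals, vecs = np.linalg.eigh(M2)
vals, vecs = vals[-k:], vecs[:, -k:]
W = vecs / np.sqrt(vals)                           # d x k

# Whitened third moment, minus the spherical-noise correction term.
Y = X @ W
T3 = np.einsum('na,nb,nc->abc', Y, Y, Y) / n
m1, G = Y.mean(axis=0), W.T @ W
T3 -= sigma2 * (np.einsum('a,bc->abc', m1, G)
                + np.einsum('b,ac->abc', m1, G)
                + np.einsum('c,ab->abc', m1, G))

def tensor_power(T, starts=10, iters=100):
    """Best robust eigenpair of a symmetric k x k x k tensor."""
    best = (-np.inf, None)
    for _ in range(starts):
        v = rng.standard_normal(T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            v = np.einsum('abc,b,c->a', T, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('abc,a,b,c->', T, v, v, v)
        if lam > best[0]:
            best = (lam, v)
    return best

for _ in range(k):                                  # decompose with deflation
    lam, v = tensor_power(T3)
    mu_hat = lam * (vecs * np.sqrt(vals)) @ v       # un-whiten the component
    print(np.round(mu_hat, 2))                      # compare against rows of mu
    # mixing weight is recoverable as w_i = 1 / lam**2
    T3 -= lam * np.einsum('a,b,c->abc', v, v, v)
```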
  21. NYTimes Demo
  22-24. Experimental Results on Yelp and DBLP.

    Lowest-error business categories & largest-weight businesses:

    Rank | Category       | Business                  | Stars | Review Count
    1    | Latin American | Salvadoreno Restaurant    | 4.0   | 36
    2    | Gluten Free    | P.F. Chang's China Bistro | 3.5   | 55
    3    | Hobby Shops    | Make Meaning              | 4.5   | 14
    4    | Mass Media     | KJZZ 91.5FM               | 4.0   | 13
    5    | Yoga           | Sutra Midtown             | 4.5   | 31

    Top-5 bridging nodes (businesses):

    Business             | Categories
    Four Peaks Brewing   | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
    Pizzeria Bianco      | Restaurants, Pizza, Phoenix
    FEZ                  | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
    Matt's Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
    Cornish Pasty Co     | Restaurants, Bars, Nightlife, Pubs, Tempe

    Error (E) and recovery ratio (R):

    Dataset          | k̂   | Method      | Running time | E     | R
    DBLP sub (n=1e5) | 500 | ours        | 10,157       | 0.139 | 89%
    DBLP sub (n=1e5) | 500 | variational | 558,723      | 16.38 | 99%
    DBLP (n=1e6)     | 100 | ours        | 5,407        | 0.105 | 95%
  25-26. Discovering Gene Profiles of Neuronal Cell Types. Learning a mixture of point processes of cells through tensor methods; the components of the mixture are candidates for neuronal cell types.
  27-28. Hierarchical Tensors for Healthcare Analytics. [Diagram: a hierarchy of tensor decompositions.] CMS dataset: 1.6 million patients, 15.8 million events. Mining disease inferences from patient records.
  29. Outline: 1 Introduction; 2 Learning Probabilistic Models; 3 Experiments; 4 Feature Learning with Tensor Methods; 5 Conclusion
  30. Feature Learning for Efficient Classification. Find good transformations of the input for improved classification. (Figures attributed to Fei-Fei Li, Rob Fergus, Antonio Torralba, et al.)
  31-33. Tensor Methods for Training Neural Networks. For input pdf $p(x)$, the $m$-th order score function is $\mathcal{S}_m(x) := (-1)^m \frac{\nabla^{(m)} p(x)}{p(x)}$; score functions capture local variations in the data (the first-, second-, and third-order score functions are pictured on successive slides). Algorithm for training neural networks: estimate the score functions using an autoencoder; decompose the tensor $\mathbb{E}[y \otimes \mathcal{S}_m(x)]$ for $m \ge 3$ to obtain the weights; recursively estimate the score function of the autoencoder and repeat (a sketch for $m = 1$ follows below).
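A concrete, hedged instance of the cross moment for the simplest case $m = 1$ with a known Gaussian input density (the talk instead estimates scores with an autoencoder): for a one-layer network $y = \sigma(Ax)$ and Gaussian $x$, Stein's identity gives $\mathbb{E}[y \, \mathcal{S}_1(x)]$ proportional to the row of $A$. The network, data, and dimensions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 10_000
mu, Sigma = np.zeros(d), np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=n)

# First-order score: S_1(x) = (-1) grad p(x) / p(x) = Sigma^{-1} (x - mu).
S1 = (X - mu) @ Sigma_inv.T

# Labels from a one-layer network y = sigmoid(A x); the cross moment
# E[y * S_1(x)] recovers the direction of A by Stein's identity.
A = rng.standard_normal((1, d))
y = 1.0 / (1.0 + np.exp(-(X @ A.T)))      # shape (n, 1)

M1 = (y * S1).mean(axis=0)                # E[y (x) S_1(x)], here a vector
print(M1 / np.linalg.norm(M1))
print(A[0] / np.linalg.norm(A[0]))        # approximately the same direction
```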
  34. Demo: Training Neural Networks
  35. Combining Probabilistic Models with Deep Learning. Multi-object detection in computer vision: deep learning extracts good features but not context, while probabilistic models capture contextual information. Hierarchical models combined with pre-trained deep-learning features give state-of-the-art results on Microsoft COCO.
  36. Outline: 1 Introduction; 2 Learning Probabilistic Models; 3 Experiments; 4 Feature Learning with Tensor Methods; 5 Conclusion
  37. Conclusion: Tensor Methods for Learning. Tensor decomposition offers efficient sample and computational complexities and better performance than EM, variational Bayes, etc. In practice it is scalable and embarrassingly parallel, handles large datasets, and performs efficiently under perplexity or ground-truth validation.
  38. My Research Group and Resources. Furong H., Majid J., Hanie S., Niranjan U.N., Forough A., Tejaswi N., Hao L., Yang S. ML summer school lectures are available at http://newport.eecs.uci.edu/anandkumar/MLSS.html
