Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

- Anima Anadkumar, Principal Scientis... by MLconf 2121 views
- Rushin Shah, Engineering Manager, F... by MLconf 1077 views
- Daniel Shank, Data Scientist, Talla... by MLconf 938 views
- Michael Alcorn, Sr. Software Engine... by MLconf 800 views
- Dr. June Andrews, Principal Data Sc... by MLconf 984 views
- Talha Obaid, Email Security, Symant... by MLconf 884 views

1,419 views

Published on

Tensors are multiway arrays, and tensor decompositions are powerful tools for data analysis. In this talk, we demonstrate the wide-ranging utility of the canonical polyadic (CP) tensor decomposition with examples in neuroscience and chemical detection. The CP model is extremely useful for interpretation, as we show with an example in neuroscience. However, it can be difficult to fit to real data for a variety of reasons. We present a novel randomized method for fitting the CP decomposition to dense data that is more scalable and robust than the standard techniques. We further consider the modeling assumptions for fitting tensor decompositions to data and explain alternative strategies for different statistical scenarios, resulting in a _generalized_ CP tensor decomposition.

Bio: Tamara G. Kolda is a member of the Data Science and Cyber Analytics Department at Sandia National Laboratories in Livermore, CA. Her research is generally in the area of computational science and data analysis, with specialties in multilinear algebra and tensor decompositions, graph models and algorithms, data mining, optimization, nonlinear solvers, parallel computing and the design of scientific software. She has received a Presidential Early Career Award for Scientists and Engineers (PECASE), been named a Distinguished Scientist of the Association for Computing Machinery (ACM) and a Fellow of the Society for Industrial and Applied Mathematics (SIAM). She was the winner of an R&D100 award and three best paper prizes at international conferences. She is currently a member of the SIAM Board of Trustees and serves as associate editor for both the SIAM J. Scientific Computing and the SIAM J. Matrix Analysis and Applications.

Published in:
Technology

No Downloads

Total views

1,419

On SlideShare

0

From Embeds

0

Number of Embeds

7

Shares

0

Downloads

50

Comments

2

Likes

1

No notes for slide

- 1. IllustrationbyChrisBrigman Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. MLconf: The Machine Learning Conference San Francisco, CA, Nov 10, 2017
- 2. A Tensor is an d-Way Array 11/10/2017 Kolda @ MLconf 2 Vector d = 1 Matrix d = 2 3rd-Order Tensor d = 3 4th-Order Tensor d = 4 5th-Order Tensor d = 5
- 3. From Matrices to Tensors: Two Points of View 11/10/2017 Kolda @ MLconf 3 Singular value decomposition (SVD), eigendecomposition (EVD), nonnegative matrix factorization (NMF), sparse SVD, CUR, etc. Viewpoint 1: Sum of outer products, useful for interpretation Tucker Model: Project onto high-variance subspaces to reduce dimensionality CP Model: Sum of d-way outer products, useful for interpretation CANDECOMP, PARAFAC, Canonical Polyadic, CP HO-SVD, Best Rank-(𝑹1,𝑹2,…,𝑹d) decomposition Other models for compression include hierarchical Tucker and tensor train. Viewpoint 2: High-variance subspaces, useful for compression
- 4. Matrix Factorization 11/10/2017 Kolda @ MLconf 4 ModelData ≈ Standard Matrix Formulation
- 5. CP Tensor Factorization (3-way) 11/10/2017 Kolda @ MLconf 5 ModelData ≈ CP = CANDECOMP/PARAFAC or Canonical Polyadic Hitchcock 1927, Harshman 1970, Carroll & Chang 1970
- 6. CP Tensor Factorization (𝒅-way) 11/10/2017 Kolda @ MLconf 6 ModelData ≈ Hitchcock 1927, Harshman 1970, Carroll & Chang 1970
- 7. 11/10/2017 Kolda @ MLconf 7
- 8. Amino Acids Fluorescence Dataset ▪ Fluorescence measurements of 5 samples containing 3 amino acids ▪ Tryptophan ▪ Tyrosine ▪ Phenylalanine ▪ Tensor of size 5 x 51 x 201 ▪ 5 samples ▪ 51 excitations ▪ 201 emissions 11/10/2017 Kolda @ MLconf 8 Unknown mixture of three amino acids samples excitation R. Bro, PARAFAC: Tutorial and Applications, Chemometrics and Intelligent Laboratory Systems, 38:149-171, 1997
- 9. Rank-3 CP Factorization of Amino Acids Data 11/10/2017 Kolda @ MLconf 9 𝐀 (5 × 3) 𝐁 (201 × 3) 𝐂 (51 × 3) Bro 1997 samples excitation
- 10. 11/10/2017 Kolda @ MLconf 10
- 11. Motivating Example: Neuron Activity in Learning 11/10/2017 Kolda @ MLconf 11 Thanks to Schnitzer Group @ Stanford Mark Schnitzer, Fori Wang, Tony Kim mouse in “maze” neural activity × 120 time bins × 600 trials (over 5 days) Microscope by Inscopix One Column of Neuron x Time Matrix 300 neurons One Trial Williams et al., bioRxiv, 2017, DOI:10.1101/211128
- 12. Trials Vary Start Position and Strategies 11/10/2017 Kolda @ MLconf 12 • 600 Trials over 5 Days • Start West or East • Conditions Swap Twice ❖ Always Turn South ❖ Always Turn Right ❖ Always Turn South wall W E N S note different patterns on curtains Williams et al., bioRxiv, 2017, DOI:10.1101/211128
- 13. CP for Simultaneous Analysis of Neurons, Time, and Trial 11/10/2017 Kolda @ MLconf 13 Prior tensor work in neuroscience for fMRI and EEG: Andersen and Rayens (2004), Mørup et al. (2004), Acar et al. (2007), De Vos et al. (2007), and more Williams et al., bioRxiv, 2017, DOI:10.1101/211128
- 14. 8-Component CP Decomposition of Mouse Neuron Data 11/10/2017 Kolda @ MLconf 14
- 15. Interpretation of Mouse Neuron Data 11/10/2017 Kolda @ MLconf 15
- 16. 11/10/2017 Kolda @ MLconf 16
- 17. Tensor Factorization (3-way) 11/10/2017 Kolda @ MLconf 17 ModelData ≈ We can rewrite this as a matrix equation in 𝐀, 𝐁, or 𝐂.
- 18. CP-ALS: Fitting CP Model via Alternating Least Squares 11/10/2017 Kolda @ MLconf 18 Harshman, 1970; Carroll & Chang, 1970 ▪ Rank (R) NP-Hard: Even best low-rank solution may not exist (Håstad 1990, Silva & Lim 2006, Hillar & Lim 2009) ▪ Not nested: Best rank-(R-1) factorization may not be part of best rank-R factorization (Kolda 2001) ▪ Nonconvex: But convex linear least squares problems ▪ Not orthogonal: Factor matrices are not orthogonal and may even have linearly dependent columns ▪ Essentially Unique: Under modest conditions, CP is unique up to permutation and scaling unique (Kruskal 1977) Repeat until convergence: Step 1: Step 2: Step 3:
- 19. CP-ALS Least Squares Problem 11/10/2017 Kolda @ MLconf 19 𝐗(1) − Khatri-Rao Product “right hand sides” “matrix” 𝐀 (𝐂 ⊙ 𝐁)′ 𝑛 × 𝑛 𝑑−1 𝑛 × 𝑟 𝑟 × 𝑛 𝑑−1 𝑛 × 𝑛2 𝑛 × 𝑟 𝑟 × 𝑛2
- 20. CP Least Squares Problem 11/10/2017 Kolda @ MLconf 20 − − How to randomize this? 𝑛 × 𝑛 𝑑−1 𝑛 × 𝑟 𝑟 × 𝑛 𝑑−1
- 21. Aside: Sketching for Standard Least Squares 11/10/2017 Kolda @ MLconf 21 𝐀 𝐛𝐱 − ො𝑛 𝑛 Backslash causes MATLAB to automatically call the best solver (cholesky, qr, etc.) 𝒪(ො𝑛𝑛2) Sarlós 2006, Woodruff 2014
- 22. Sampled Least Squares 11/10/2017 Kolda @ MLconf 22 𝐀 𝐛𝐱 − Choose 𝑞 rows, uniformly at random 𝐒𝐀 𝐒𝐛𝐱 − 𝑞 𝑛 𝒪(𝑞𝑛2) approximate Sampling only guaranteed to “work” if the 𝐀 is incoherent. 𝐒 𝒪(ො𝑛𝑛2 ) ො𝑛 𝑛 Sarlós 2006, Woodruff 2014
- 23. CP-ALS-RAND 11/10/2017 Kolda @ MLconf 23 − − − Battaglino, Ballard, & Kolda 2017
- 24. Randomizing the Convergence Check 11/10/2017 Kolda @ MLconf 24 Estimate convergence of function values using small random subset of elements in function evaluation (use Chernoff-Hoeffding to bound accuracy) 16000 samples < 1% of full data Battaglino, Ballard, & Kolda 2017
- 25. Speed Advantage: Analysis of Hazardous Gas Data 11/10/2017 Kolda @ MLconf 25 Data from Vergara et al. 2013; see also Vervliet and De Lathauwer (2016) This mode scaled by component size Color-coded by gas type 900 experiments (with three different gas types) x 72 sensors x 25,900 time steps (13 GB) Battaglino, Ballard, & Kolda 2017
- 26. Globalization Advantage? Amino Acids Data 11/10/2017 Kolda @ MLconf 26 Benefits are not as clear without mixing. Fit = 0.92 Fit = 0.97
- 27. 11/10/2017 Kolda @ MLconf 27
- 28. Generalizing the Goodness-of-Fit Criteria 11/10/2017 Kolda @ MLconf 28 Anderson-Bergman, Duersch, Hong, Kolda 2017 Similar ideas have been proposed in matrix world, e.g., Collins, Dasgupta, Schapire 2002
- 29. “Standard” CP 11/10/2017 Kolda @ MLconf 29 Typically: Consider data to be low-rank plus “white noise” Equivalently, Gaussian with mean 𝑚𝑖𝑗𝑘 Gaussian Probability Density Function (PDF) Minimize negative log likelihood: Results in the “standard” objective: Link: Anderson-Bergman, Duersch, Hong, Kolda 2017
- 30. “Boolean CP”: Odds Link 11/10/2017 Kolda @ MLconf 30 Consider data to be Bernoulli distributed with probability 𝑝𝑖jk Equivalent to minimizing negative log likelihood: Probability Mass Function (PMF): 𝑝𝑖𝑗𝑘 = 𝑚𝑖𝑗𝑘 1 + 𝑚𝑖𝑗𝑘 ⇔ 𝑚𝑖𝑗𝑘 = 𝑝𝑖𝑗𝑘 1 − 𝑝𝑖𝑗𝑘 Convert from probability to odds: 𝑝 𝑥 1 − 𝑝 1−𝑥 Anderson-Bergman, Duersch, Hong, Kolda 2017
- 31. Generalized CP 11/10/2017 Kolda @ MLconf 31 “Standard” CP uses: “Poisson” CP (Chi-Kolda 2012) uses: “Boolean-Odds” CP uses: Apply favorite optimization method (including SGD) to compute the solution. Anderson-Bergman, Duersch, Hong, Kolda 2017
- 32. A Sparse Dataset ▪ UC Irvine Chat Network ▪ 4-way binary tensor ▪ Sender (205) ▪ Receiver (210) ▪ Hour of Day (24) ▪ Day (194) ▪ 14,953 nonzeros (very sparse) ▪ Goodness-of-fit (odds): 𝑓 𝑥, 𝑚 = log 𝑚 + 1 − 𝑥 log 𝑚 ▪ Use GCP to compute rank-12 decomposition 11/10/2017 Kolda @ MLconf 32 Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163, doi: 10.1016/j.socnet.2009.02.002
- 33. Binary Chat Data using Boolean CP 11/10/2017 Kolda @ MLconf 33 Anderson-Bergman, Duersch, Hong, Kolda 2017
- 34. Tensors & Data Analysis ▪ CP tensor decomposition is effective for unsupervised data analysis ▪ Latent factor analysis ▪ Dimension reduction ▪ CP can be generalized to alternative fit functions ▪ Boolean data, count data, etc. ▪ Randomized techniques are open new doorways to larger datasets and more robust solutions ▪ Matrix sketching ▪ Stochastic gradient descent ▪ Other on-going & future work ▪ Parallel CP and GCP implementations (https://gitlab.com/tensors/genten) ▪ Parallel Tucker for compression (https://gitlab.com/tensors/TuckerMPI) ▪ Randomized ST-HOSVD (Tucker) ▪ Functional tensor factorization as surrogate for expensive functions ▪ Extensions to many more applications (binary data, signals, etc.) 11/10/2017 Kolda @ MLconf 34 Acknowledgements ▪ Cliff Anderson- Bergman (Sandia) ▪ Grey Ballard (Wake Forrest) ▪ Casey Battaglino (Georgia Tech) ▪ Jed Duersch (Sandia) ▪ David Hong (U. Michigan) ▪ Alex Williams (Stanford) Kolda and Bader, Tensor Decompositions and Applications, SIAM Review ‘09 Tensor Toolbox for MATLAB: www.tensortoolbox.org Bader, Kolda, Acar, Dunlavy, and othersContact: Tammy Kolda, www.kolda.net, tgkolda@sandia.gov

No public clipboards found for this slide

Login to see the comments