Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Morten Mørup
Abstract Tensor (multiway array) factorization and decomposition has become an  important tool for data mining. Fueled b...
INTRODUCTION Tensors, or multiway arrays, are generalizations of  vectors (first-order tensors) and matrices (second-  or...
Examples Psychometrics: which group of subjects behave differently  on which variables under which conditions? Chemistry...
Factorizing tensors advantages over two-waymatrix factorization: uniqueness of the optimal solution (without imposing  co...
Tensor factorization: changelling and openingproblems: its geometry is not yet fully understood the occurrence of degene...
Tensor Nomenclatura Real tensor of order N (calligraphed letters): More compactly Element of the tensor
Tensor Operations                    It is a mistake?
Matricizing and unmatricizing
Matricizing operation
n-mode multiplication Singular value decomposition (SVD) can be written as
Outer, Kronecker and Khatri-Rao            product• Outer product of three vectors a, b, c:• Kronecker product • Khatri-Ra...
Moore-Penrose inverse
Summary
Tucker Model
Tucker Model
Tucker ModelThe Tucker model is not unique, an equivalent representation is:        where                         are inve...
Tucker ModelTucker3 model: Using the n-mode matricizing and Kronecker product operation.
Tucker Model: Estimation
Tucker Model: Estimation
Tucker Model: Estimation
CP ModelAlso known as:     Canonical Decomposition (CANDECOMP)     Parallel Factor Analysis (PARAFAC).
CP Model Tucker model on core array G is the same:   1. L=M=N=D  2.   gi,i,i ≠ 0 .
CP Model
CP Model: Estimation
CP Model: Estimation
CP Model: Estimation
Missing values Solutions  1.   Impute missing data values:  2.   Marginalization : missing values ignored during optimiza...
Missing values
Model Order Estimation
Model Order Estimation
Model Order Estimation
Common constraints
Some Available Tensor Software Matlab: N-way toolbox, CuBatch, PLS_Toolbox, Tensor toolbox Mathematica and Maple support...
Bioinformatics Multiway modeling has recently found use within bioinformatics. Here, the HOSVD has been shown to enable ...
Bioinformaticsit was demonstrated that HOSVDcomputationally can removeexperimental artifacts from theglobal mRNA expression
Bioinformatics
Bioinformatics The Tucker analysis was here  demonstrated to have the  advantage of easily interpretable  time profiles a...
Chemistry
Chemistry
Neuroscience
Neuroscience
Computer vision
CONCLUSION
EiB Seminar from Esteban Vegas, Ph.D.
Upcoming SlideShare
Loading in …5
×

EiB Seminar from Esteban Vegas, Ph.D.

824 views

Published on


Applications of tensor (multiway array) factorizations and decomposition in data mining presentation

Published in: Technology
  • Be the first to comment

EiB Seminar from Esteban Vegas, Ph.D.

  1. 1. Morten Mørup
  2. 2. Abstract Tensor (multiway array) factorization and decomposition has become an important tool for data mining. Fueled by the computational power of modern computer researchers can now analyze large-scale tensorial structured data that only a few years ago would have been impossible. Tensor factorizations have several advantages over two-way matrix factorizations including uniqueness of the optimal solution and component identification even when most of the data is missing. Furthermore, multiway decomposition techniques explicitly exploit the multiway structure that is lost when collapsing some of the modes of the tensor in order to analyze the data by regular matrix factorization approaches. Multiway decomposition is being applied to new fields every year and there is no doubt that the future will bring many exciting new applications. The aim of this overview is to introduce the basic concepts of tensor decompositions and demonstrate some of the many benefits and challenges of modeling data multiway for a wide variety of data and problem domains.
  3. 3. INTRODUCTION Tensors, or multiway arrays, are generalizations of vectors (first-order tensors) and matrices (second- order tensors) to arrays of higher orders (N > 2). Hence, a third-order tensor is an array with elements xi,j,k Tensor decompositions are in frequent use today in a variety of fields ranging from psychology, chemometrics, signal processing, bioinformatics, neuroscience, web mining, and computer vision to mention but a few
  4. 4. Examples Psychometrics: which group of subjects behave differently on which variables under which conditions? Chemistry: low concentrations to be the physical model of fluorescence spectroscopy admitting unique recovery of the chemical compounds from sampled mixtures. Neuroimaging: extract the consistent patterns of activation whereas noisy trials/subjects can be downweighted in the averaging process. Bioinformatics: understanding of cellular states and biological processes. Signal processing, computer vision, web mining …
  5. 5. Factorizing tensors advantages over two-waymatrix factorization: uniqueness of the optimal solution (without imposing constraints such as orthogonality and independence) and component identification even when only a relatively small fraction of all the data is observed (i.e., due to missing values). Furthermore, multiway decomposition techniques can explicitly take into account the multiway structure of the data that would otherwise be lost when analyzing the data by matrix factorization approaches by collapsing some of the modes.
  6. 6. Tensor factorization: changelling and openingproblems: its geometry is not yet fully understood the occurrence of degenerate solutions. no guarantee of finding the optimal solution most tensor decomposition models impose a very restricted structure on the data which in turn require that data exhibit a strong degree of regularity. understanding the data generating process is key for the formulation of adequate tensor decomposition models that can well extract the inherent multimodal structure.
  7. 7. Tensor Nomenclatura Real tensor of order N (calligraphed letters): More compactly Element of the tensor
  8. 8. Tensor Operations It is a mistake?
  9. 9. Matricizing and unmatricizing
  10. 10. Matricizing operation
  11. 11. n-mode multiplication Singular value decomposition (SVD) can be written as
  12. 12. Outer, Kronecker and Khatri-Rao product• Outer product of three vectors a, b, c:• Kronecker product • Khatri-Rao product is defined as a column-wise Kronecker product
  13. 13. Moore-Penrose inverse
  14. 14. Summary
  15. 15. Tucker Model
  16. 16. Tucker Model
  17. 17. Tucker ModelThe Tucker model is not unique, an equivalent representation is: where are invertible matrices
  18. 18. Tucker ModelTucker3 model: Using the n-mode matricizing and Kronecker product operation.
  19. 19. Tucker Model: Estimation
  20. 20. Tucker Model: Estimation
  21. 21. Tucker Model: Estimation
  22. 22. CP ModelAlso known as:  Canonical Decomposition (CANDECOMP)  Parallel Factor Analysis (PARAFAC).
  23. 23. CP Model Tucker model on core array G is the same: 1. L=M=N=D 2. gi,i,i ≠ 0 .
  24. 24. CP Model
  25. 25. CP Model: Estimation
  26. 26. CP Model: Estimation
  27. 27. CP Model: Estimation
  28. 28. Missing values Solutions 1. Impute missing data values: 2. Marginalization : missing values ignored during optimization
  29. 29. Missing values
  30. 30. Model Order Estimation
  31. 31. Model Order Estimation
  32. 32. Model Order Estimation
  33. 33. Common constraints
  34. 34. Some Available Tensor Software Matlab: N-way toolbox, CuBatch, PLS_Toolbox, Tensor toolbox Mathematica and Maple support tensors R language:  PTAk: Spatio-Temporal Multiway Decompositions Using Principal Tensor Analysis on k-Modes
  35. 35. Bioinformatics Multiway modeling has recently found use within bioinformatics. Here, the HOSVD has been shown to enable the interpretation of cellular states and biological processes by defining the significance of each combination of extracted patterns.
  36. 36. Bioinformaticsit was demonstrated that HOSVDcomputationally can removeexperimental artifacts from theglobal mRNA expression
  37. 37. Bioinformatics
  38. 38. Bioinformatics The Tucker analysis was here demonstrated to have the advantage of easily interpretable time profiles and extraction of metabolic perturbations with common time profiles only.
  39. 39. Chemistry
  40. 40. Chemistry
  41. 41. Neuroscience
  42. 42. Neuroscience
  43. 43. Computer vision
  44. 44. CONCLUSION

×