The current standard framework for working with tensors is still Matlab. We will show how tensor decompositions can be carried out using Python, how to obtain latent components and how they can be interpreted, and what some applications of this technique are in academia and industry. We will see a use case where a Python implementation of tensor decomposition is applied to a dataset that describes social interactions of people, collected using the SocioPatterns platform. This platform was deployed in different settings such as conferences, schools and hospitals, in order to support mathematical modelling and simulation of airborne infectious diseases. Tensor decomposition has been used in these scenarios to solve different types of problems: it can be used for data cleaning, where time-varying graph anomalies are identified and removed from the data; it can also be used to assess the impact of latent components on the spreading of a disease, and to devise intervention strategies that reduce the number of infection cases in a school or hospital. These are just a few examples that show the potential of this technique in data mining and machine learning applications.

Published in:
Data & Analytics


- 1. EXPLORING TEMPORAL GRAPH DATA WITH PYTHON: A STUDY ON TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA. ANDRÉ PANISSON (@apanisson), ISI Foundation, Torino, Italy & New York City
- 2. WHY TENSOR FACTORIZATION + PYTHON?
  ▸ Matrix Factorization is already used in many fields
  ▸ Tensor Factorization is becoming very popular for multiway data analysis
  ▸ TF is very useful to explore temporal graph data
  ▸ But still, the most used tool is Matlab
  ▸ There’s room for improvement in the Python libraries for TF
  ▸ Study: NTF of wearable sensor data
- 3. TENSORS AND TENSOR DECOMPOSITION
- 4. FACTOR ANALYSIS (Spearman, ~1900): $X \approx WH$, i.e. $X_{\text{tests} \times \text{subjects}} \approx W_{\text{tests} \times \text{intelligences}} \, H_{\text{intelligences} \times \text{subjects}}$. Spearman, 1927: The abilities of man. [Figure: X (tests × subjects) approximated by W (tests × Int.) times H (Int. × subjects).]
- 5. TOPIC MODELING / LATENT SEMANTIC ANALYSIS. Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84. [Figure: topics (e.g. gene, dna, genetic; life, evolve, organism; brain, neuron, nerve; data, number, computer), documents, and topic proportions and assignments.]
- 6. TOPIC MODELING / LATENT SEMANTIC ANALYSIS: $X \approx WH$. Non-negative Matrix Factorization (NMF) (~1970 Lawson, ~1995 Paatero, ~2000 Lee & Seung): $\arg\min_{W,H} \|X - WH\|$ s.t. $W, H \ge 0$. 2005 Gaussier et al. "Relation between PLSA and NMF and implications." [Figure: the documents/terms matrix X (a sparse matrix!) approximated by the product of two topic factor matrices.]
- 7. NON-NEGATIVE MATRIX FACTORIZATION (NMF). NMF gives a part-based representation (Lee & Seung, Nature 1999). NMF is equivalent to Spectral Clustering (Ding et al., SDM 2005). Objective: $\arg\min_{W,H} \|X - WH\|$ s.t. $W, H \ge 0$. Multiplicative updates, with $V$ the data matrix and both $\bullet$ and the fraction taken element-wise: $W \leftarrow W \bullet \frac{V H^T}{W H H^T}$, $H \leftarrow H \bullet \frac{W^T V}{W^T W H}$. [Figure: an original image and its NMF and PCA factorizations.]
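The multiplicative updates on slide 7 can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the talk: the function name `nmf_multiplicative`, the random test matrix, the iteration count and the small `eps` added to the denominators are all assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """NMF with multiplicative updates (hypothetical helper, Frobenius objective)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Random non-negative initialisation keeps every later iterate non-negative
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    errors = []
    for _ in range(n_iter):
        # W <- W * (V H^T) / (W H H^T), element-wise
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # H <- H * (W^T V) / (W^T W H), element-wise
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        errors.append(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
    return W, H, errors

# Synthetic non-negative matrix of rank 3 (illustrative data only)
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 15))
W, H, errors = nmf_multiplicative(V, rank=3)
print("relative error after first / last iteration:", errors[0], errors[-1])
```

Because the updates only multiply by non-negative ratios, the factors never leave the non-negative orthant, which is why no explicit projection step is needed.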
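- 8.

      import matplotlib.pyplot as plt
      from sklearn import datasets, decomposition

      # Each row of A is a flattened 8x8 image of a handwritten digit
      digits = datasets.load_digits()
      A = digits.data

      # Factorize A ≈ WH with 10 non-negative components
      nmf = decomposition.NMF(n_components=10)
      W = nmf.fit_transform(A)
      H = nmf.components_

      # Show each row of H as an 8x8 image
      plt.rc("image", cmap="binary")
      plt.figure(figsize=(8, 4))
      for i in range(10):
          plt.subplot(2, 5, i + 1)
          plt.imshow(H[i].reshape(8, 8))
          plt.xticks(())
          plt.yticks(())
      plt.tight_layout()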
- 9. BEYOND MATRICES: HIGH DIMENSIONAL DATASETS. MULTIWAY DATA ANALYSIS (Cichocki et al. Nonnegative Matrix and Tensor Factorizations)
  Environmental analysis ▸ Measurement as a function of (Location, Time, Variable)
  Sensory analysis ▸ Score as a function of (Food sample, Judge, Attribute)
  Process analysis ▸ Measurement as a function of (Batch, Variable, Time)
  Spectroscopy ▸ Intensity as a function of (Wavelength, Retention, Sample, Time, Location, …)
  …
- 10. DIGITAL TRACES FROM SENSORS AND IOT: USER, POSITION, TIME, …
- 11. Uses: data reconstruction, cluster analysis, compression, dimensionality reduction, latent semantic analysis, …
  • Chemometrics: Fluorescence Spectroscopy, Chromatographic Data Analysis
  • Neuroscience: Epileptic Seizure Localization, Analysis of EEG and ERP
  • Signal Processing
  • Computer Vision: Image compression, classification, Texture analysis
  • Social Network Analysis: Web link analysis, Conversation detection in emails, Text analysis
  • Approximation of PDEs
  References:
  Sidiropoulos, Giannakis and Bro, IEEE Trans. Signal Processing, 2000.
  Mørup, Hansen and Arnfred, Journal of Neuroscience Methods, 2007.
  Hazan, Polak and Shashua, ICCV 2005.
  Bader, Berry, Browne, Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd Ed., 2007.
  Doostan and Iaccarino, Journal of Computational Physics, 2009.
  Andersen and Bro, Journal of Chemometrics, 2003.
- 12. TENSORS
- 13. WHAT IS A TENSOR? A tensor is a multidimensional array. E.g., three-way tensor: Mode-1, Mode-2, Mode-3
- 14. FIBERS AND SLICES (Cichocki et al. Nonnegative Matrix and Tensor Factorizations)
  Column (Mode-1) Fibers: A[:, 4, 1]
  Row (Mode-2) Fibers: A[1, :, 4]
  Tube (Mode-3) Fibers: A[1, 3, :]
  Horizontal Slices: A[1, :, :]
  Lateral Slices: A[:, 1, :]
  Frontal Slices: A[:, :, 1]
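As a quick illustration of slides 13 and 14, fibers and slices of a three-way tensor map directly onto NumPy indexing. The tensor shape and the indices below are arbitrary choices for the example, and indexing is zero-based.

```python
import numpy as np

# A three-way tensor with modes of size 3, 4 and 5 (illustrative values)
A = np.arange(60).reshape((3, 4, 5))

# Fibers: fix every index except one
column_fiber = A[:, 3, 1]   # mode-1 fiber, length 3
row_fiber = A[1, :, 3]      # mode-2 fiber, length 4
tube_fiber = A[1, 3, :]     # mode-3 fiber, length 5

# Slices: fix exactly one index
horizontal_slice = A[1, :, :]   # shape (4, 5)
lateral_slice = A[:, 1, :]      # shape (3, 5)
frontal_slice = A[:, :, 1]      # shape (3, 4)

print(A.ndim, A.shape)
print(column_fiber, row_fiber, tube_fiber)
```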
- 15. TENSOR UNFOLDINGS: MATRICIZATION AND VECTORIZATION. Matricization: convert a tensor to a matrix. Vectorization: convert a tensor to a vector.
- 16. OK for dense tensors: use a combination of transpose() and reshape(). Not simple for sparse datasets (e.g.: <authors, terms, time>).

      >>> import numpy as np
      >>> T = np.arange(0, 24).reshape((3, 4, 2))
      >>> T
      array([[[ 0,  1],
              [ 2,  3],
              [ 4,  5],
              [ 6,  7]],

             [[ 8,  9],
              [10, 11],
              [12, 13],
              [14, 15]],

             [[16, 17],
              [18, 19],
              [20, 21],
              [22, 23]]])
      >>> # print the mode-1 (column) fibers
      >>> for j in range(2):
      ...     for i in range(4):
      ...         print(T[:, i, j])
      [ 0  8 16]
      [ 2 10 18]
      [ 4 12 20]
      [ 6 14 22]
      [ 1  9 17]
      [ 3 11 19]
      [ 5 13 21]
      [ 7 15 23]
      >>> # supposing the existence of unfold (NumPy arrays do not provide it)
      >>> T.unfold(0)
      array([[ 0,  2,  4,  6,  1,  3,  5,  7],
             [ 8, 10, 12, 14,  9, 11, 13, 15],
             [16, 18, 20, 22, 17, 19, 21, 23]])
      >>> T.unfold(1)
      array([[ 0,  8, 16,  1,  9, 17],
             [ 2, 10, 18,  3, 11, 19],
             [ 4, 12, 20,  5, 13, 21],
             [ 6, 14, 22,  7, 15, 23]])
      >>> T.unfold(2)
      array([[ 0,  8, 16,  2, 10, 18,  4, 12, 20,  6, 14, 22],
             [ 1,  9, 17,  3, 11, 19,  5, 13, 21,  7, 15, 23]])
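Slide 16 supposes an `unfold` method that NumPy arrays do not have. One way to get the same matrices for dense tensors is sketched below; the free function `unfold` is a hypothetical helper, and the column ordering (remaining modes traversed with the earliest mode varying fastest) is chosen to reproduce the outputs printed on the slide.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization of a dense tensor (hypothetical helper)."""
    # Bring the chosen mode to the front, then flatten the remaining modes
    # in Fortran order so that the earliest remaining mode varies fastest.
    moved = np.moveaxis(T, mode, 0)
    return np.reshape(moved, (T.shape[mode], -1), order="F")

T = np.arange(0, 24).reshape((3, 4, 2))

T0 = unfold(T, 0)   # shape (3, 8)
T1 = unfold(T, 1)   # shape (4, 6)
T2 = unfold(T, 2)   # shape (2, 12)

# Vectorization in the same ordering: stack the columns of the mode-0 unfolding
vec = T.reshape(-1, order="F")

print(T0)
print(T1)
print(T2)
```

The design choice here is to avoid explicit loops: `moveaxis` plus a single `reshape` covers the "combination of transpose() and reshape()" mentioned on the slide for any number of modes.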
- 17. RANK-1 TENSOR. The outer product of N vectors results in a rank-1 tensor: $T = a^{(1)} \circ \cdots \circ a^{(N)}$. For three vectors, $T = a \circ b \circ c$ with $T_{i,j,k} = a_i b_j c_k$.

      import numpy as np

      a = np.array([1, 2, 3])
      b = np.array([1, 2, 3, 4])
      c = np.array([1, 2])

      # Fill the tensor entry by entry: T[i, j, k] = a[i] * b[j] * c[k]
      T = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
      for i in range(a.shape[0]):
          for j in range(b.shape[0]):
              for k in range(c.shape[0]):
                  T[i, j, k] = a[i] * b[j] * c[k]

      # T is now
      # array([[[  1.,   2.], [  2.,   4.], [  3.,   6.], [  4.,   8.]],
      #        [[  2.,   4.], [  4.,   8.], [  6.,  12.], [  8.,  16.]],
      #        [[  3.,   6.], [  6.,  12.], [  9.,  18.], [ 12.,  24.]]])
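The triple loop on slide 17 can also be written without loops. The sketch below shows two equivalent NumPy formulations of the same outer product; the variable names `T_einsum` and `T_outer` are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = np.array([1, 2])

# T[i, j, k] = a[i] * b[j] * c[k], written as an Einstein summation
T_einsum = np.einsum("i,j,k->ijk", a, b, c)

# The same tensor as two chained outer products
T_outer = np.multiply.outer(np.multiply.outer(a, b), c)

print(T_einsum.shape)
print(np.array_equal(T_einsum, T_outer))
```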
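- 18. TENSOR RANK
  ▸ Every tensor can be written as a sum of rank-1 tensors [Figure: a tensor drawn as the sum of rank-1 terms built from $a_1, b_1, c_1$ up to $a_J, b_J, c_J$.]
  ▸ Tensor rank: smallest number of rank-1 tensors that can generate it by summing up
  $X \approx \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(N)} \equiv [\![A^{(1)}, A^{(2)}, \cdots, A^{(N)}]\!]$
  $T \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r \equiv [\![A, B, C]\!]$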
- 19. Kruskal tensor: $T = \sum_{r=1}^{R} a_r \circ b_r \circ c_r \equiv [\![A, B, C]\!]$

      import numpy as np

      # Factor matrices: one column per rank-1 component
      A = np.array([[1, 2, 3], [4, 5, 6]]).T
      B = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).T
      C = np.array([[1, 2], [3, 4]]).T

      # Sum of rank-1 tensors, entry by entry
      T = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
      for i in range(A.shape[0]):
          for j in range(B.shape[0]):
              for k in range(C.shape[0]):
                  for r in range(A.shape[1]):
                      T[i, j, k] += A[i, r] * B[j, r] * C[k, r]

      # Equivalent one-liner
      T = np.einsum('ir,jr,kr->ijk', A, B, C)

      # T is
      # array([[[  61.,   82.], [  74.,  100.], [  87.,  118.], [ 100.,  136.]],
      #        [[  77.,  104.], [  94.,  128.], [ 111.,  152.], [ 128.,  176.]],
      #        [[  93.,  126.], [ 114.,  156.], [ 135.,  186.], [ 156.,  216.]]])
- 20. TENSOR FACTORIZATION
  ▸ CANDECOMP/PARAFAC factorization (CP)
  ▸ extensions of SVD / PCA / NMF of matrices
  NON-NEGATIVE TENSOR FACTORIZATION
  ▸ Decompose a non-negative tensor to a sum of R non-negative rank-1 tensors:
  $\arg\min_{A,B,C} \| T - [\![A, B, C]\!] \|$ with $[\![A, B, C]\!] \equiv \sum_{r=1}^{R} a_r \circ b_r \circ c_r$ subject to $A \ge 0$, $B \ge 0$, $C \ge 0$
- 21. TENSOR FACTORIZATION: HOW TO. Alternating Least Squares (ALS): fix all but one factor matrix, to which least squares is applied:
  $\min_{A \ge 0} \| T_{(1)} - A (C \odot B)^T \|$
  $\min_{B \ge 0} \| T_{(2)} - B (C \odot A)^T \|$
  $\min_{C \ge 0} \| T_{(3)} - C (B \odot A)^T \|$
  $\odot$ denotes the Khatri-Rao product, which is a column-wise Kronecker product, i.e., $C \odot B = [c_1 \otimes b_1, c_2 \otimes b_2, \ldots, c_R \otimes b_R]$.
  $T_{(k)}$ is the tensor unfolded on the $k$th mode: $T_{(1)} = \hat{A} (\hat{C} \odot \hat{B})^T$, $T_{(2)} = \hat{B} (\hat{C} \odot \hat{A})^T$, $T_{(3)} = \hat{C} (\hat{B} \odot \hat{A})^T$
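The unfolding identities on slide 21 can be checked numerically. The sketch below reuses the factor matrices from slide 19; `khatri_rao` and `unfold` are hypothetical helpers written for this example, with the unfolding ordered as on slide 16.

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Kronecker product: column r is kron(X[:, r], Y[:, r])."""
    return np.einsum("ir,jr->ijr", X, Y).reshape(X.shape[0] * Y.shape[0], -1)

def unfold(T, mode):
    """Mode-`mode` matricization, earliest remaining mode varying fastest."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

# Factor matrices from slide 19 and the Kruskal tensor they generate
A = np.array([[1, 2, 3], [4, 5, 6]]).T
B = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).T
C = np.array([[1, 2], [3, 4]]).T
T = np.einsum("ir,jr,kr->ijk", A, B, C)

# Each unfolding equals one factor times the transposed Khatri-Rao
# product of the other two
ok_mode1 = np.allclose(unfold(T, 0), A @ khatri_rao(C, B).T)
ok_mode2 = np.allclose(unfold(T, 1), B @ khatri_rao(C, A).T)
ok_mode3 = np.allclose(unfold(T, 2), C @ khatri_rao(B, A).T)
print(ok_mode1, ok_mode2, ok_mode3)
```

These identities are what turn each ALS step into an ordinary (non-negative) least-squares problem in a single factor matrix.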
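- 22. Multiplicative updates for NMF, $W \leftarrow W \bullet \frac{V H^T}{W H H^T}$ and $H \leftarrow H \bullet \frac{W^T V}{W^T W H}$ for $\arg\min_{W,H} \|X - WH\|$ s.t. $W, H \ge 0$, carry over to the NTF problems $\min_{A \ge 0} \| T_{(1)} - A (C \odot B)^T \|$, $\min_{B \ge 0} \| T_{(2)} - B (C \odot A)^T \|$, $\min_{C \ge 0} \| T_{(3)} - C (B \odot A)^T \|$:

      import numpy as np

      # T: non-negative 3-way tensor of shape (n, m, t), assumed to expose
      #    uttkrp(F, k) as scikit-tensor tensors do
      # r: number of components
      n, m, t = T.shape
      # Random non-negative initial factors
      F = [np.random.rand(n, r), np.random.rand(m, r), np.random.rand(t, r)]
      # Gram matrix F[k].T F[k] of each factor
      FF_init = np.array([f.T.dot(f) for f in F])

      def iter_solver(T, F, FF_init):
          # Update each factor
          for k in range(len(F)):
              # Compute the inner-product matrix of all the other factors
              FF = np.ones((r, r))
              for i in list(range(k)) + list(range(k + 1, len(F))):
                  FF = FF * FF_init[i]
              # unfolded tensor times Khatri-Rao product
              XF = T.uttkrp(F, k)
              # multiplicative update
              F[k] = F[k] * XF / (F[k].dot(FF))
              # alternative: F[k] = nnls(FF, XF.T).T
              FF_init[k] = F[k].T.dot(F[k])
          return F, FF_init

  J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, 2012, pp. 311-326.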
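For readers without scikit-tensor, the solver on slide 22 can be approximated in plain NumPy for dense three-way tensors. This is a sketch under stated assumptions, not the implementation from the talk: `mttkrp` stands in for `uttkrp` and is written with `einsum`, the function name `ntf`, the iteration count and the `eps` guard in the denominator are all choices made for the example.

```python
import numpy as np

def mttkrp(T, F, k):
    """Unfolded tensor times Khatri-Rao product of all factors except F[k]."""
    if k == 0:
        return np.einsum("ijk,jr,kr->ir", T, F[1], F[2])
    if k == 1:
        return np.einsum("ijk,ir,kr->jr", T, F[0], F[2])
    return np.einsum("ijk,ir,jr->kr", T, F[0], F[1])

def ntf(T, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a dense 3-way tensor (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Random non-negative initial factors, one per mode
    F = [rng.random((dim, rank)) for dim in T.shape]
    errors = []
    for _ in range(n_iter):
        for k in range(3):
            # Hadamard product of the Gram matrices of the other factors
            FF = np.ones((rank, rank))
            for i in range(3):
                if i != k:
                    FF *= F[i].T @ F[i]
            # Multiplicative update keeps F[k] non-negative
            F[k] *= mttkrp(T, F, k) / (F[k] @ FF + eps)
        approx = np.einsum("ir,jr,kr->ijk", *F)
        errors.append(np.linalg.norm(T - approx) / np.linalg.norm(T))
    return F, errors

# Rank-2 non-negative tensor built from the factor matrices on slide 19
A = np.array([[1, 2, 3], [4, 5, 6]]).T
B = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).T
C = np.array([[1, 2], [3, 4]]).T
T = np.einsum("ir,jr,kr->ijk", A, B, C).astype(float)

F, errors = ntf(T, rank=2)
print("relative error after first / last sweep:", errors[0], errors[-1])
```

The Gram-matrix shortcut works because $(C \odot B)^T (C \odot B)$ equals the element-wise product of $C^T C$ and $B^T B$, so the Khatri-Rao product never has to be formed explicitly in the denominator.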
- 23. HOW TO INTERPRET: USER X TERM X TIME
  X is a 3-way tensor in which $x_{nmt}$ is 1 if the term m was used by user n at interval t, 0 otherwise
  $A_{N \times K}$ is the association of each user n to a factor k
  $B_{M \times K}$ is the association of each term m to a factor k
  $C_{T \times K}$ shows the time activity of each factor
  [Figure: the tensor X (N×M×T; users × terms × time) decomposed into the factor matrices A (N×K), B (M×K) and C (T×K).]
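To make the definition of X on slide 23 concrete, the sketch below builds a small binary user × term × time tensor from a list of records. The records, user names and terms are invented for illustration, and a dense array is used only because the example is tiny; real datasets of this kind are sparse.

```python
import numpy as np

# Hypothetical (user, term, time interval) records
records = [
    ("alice", "python", 0),
    ("alice", "tensor", 1),
    ("bob", "python", 1),
    ("bob", "python", 1),   # a repeated record still yields a single 1
    ("carol", "matlab", 2),
]

# Map users and terms to integer indices along modes 1 and 2
users = sorted({user for user, _, _ in records})
terms = sorted({term for _, term, _ in records})
user_index = {user: n for n, user in enumerate(users)}
term_index = {term: m for m, term in enumerate(terms)}
n_intervals = max(interval for _, _, interval in records) + 1

# X[n, m, t] = 1 if term m was used by user n at interval t, 0 otherwise
X = np.zeros((len(users), len(terms), n_intervals))
for user, term, interval in records:
    X[user_index[user], term_index[term], interval] = 1

print(X.shape, int(X.sum()))
```

A factorization of this tensor with K factors would then return A with one row per user, B with one row per term and C with one row per time interval, each with K columns.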
- 24. http://www.datainterfaces.org/2013/06/twitter-topic-explorer/
- 25. TOOLS FOR TENSOR DECOMPOSITION
- 26. TOOLS FOR TENSOR FACTORIZATION
- 27. TOOLS: THE PYTHON WORLD
  NumPy
  SciPy
  Scikit-Tensor (under development): github.com/mnick/scikit-tensor
  NTF: gist.github.com/panisson/7719245
- 28. TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA
- 29. recorded proximity data direct proximity sensing
- 30. Primary school, Lyon, France: 231 students, 10 teachers
- 31. Hong Kong primary school: 900 students, 65 teachers
- 32. SocioPatterns.org: 7 years, 30+ deployments, 10 countries, 50,000+ persons
  • Mongan Institute for Health Policy, Boston
  • US Army Medical Component of the Armed Forces, Bangkok
  • School of Public Health of the University of Hong Kong
  • KEMRI Wellcome Trust, Kenya
  • London School for Hygiene and Tropical Medicine, London
  • Public Health England, London
  • Saw Swee Hock School of Public Health, Singapore
- 33. TENSORS
- 34. FROM TEMPORAL GRAPHS TO 3-WAY TENSORS [Figure: a graph snapshot with adjacency matrix rows (0 1 0), (1 0 1), (0 1 0).]
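One way to read slide 34 is that each time interval contributes one adjacency matrix, and the matrices are stacked along a third mode. The sketch below assumes that layout (node × node × time) and uses an invented contact list; the first snapshot reproduces the 3×3 matrix shown on the slide.

```python
import numpy as np

# Hypothetical contact list: (time interval, node i, node j)
contacts = [
    (0, 0, 1),
    (0, 1, 2),
    (1, 0, 2),
    (2, 0, 1),
]

n_nodes = 1 + max(max(i, j) for _, i, j in contacts)
n_intervals = 1 + max(t for t, _, _ in contacts)

# T[i, j, t] = 1 if nodes i and j were in contact during interval t
T = np.zeros((n_nodes, n_nodes, n_intervals), dtype=int)
for t, i, j in contacts:
    T[i, j, t] = 1
    T[j, i, t] = 1   # contacts are undirected, so each frontal slice is symmetric

print(T[:, :, 0])
```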
- 35. [Figure: pipeline from temporal network to tensorial representation to tensor factorization. The factors A, B, C describe communities and temporal activity; factorization quality metrics are used for tuning the complexity of the model. Panels: structures in temporal networks (nodes × components, with school classes 1A to 5B) and temporal activity (time interval × component).]
- 36. TENSOR DECOMPOSITION OF SCHOOL NETWORK. L. Gauvin et al., PLoS ONE 9(1), e86028 (2014). [Figure: components labelled by school class: 1B, 5A, 3B, 5B, 2B, 2A, 3A, 4A, 1A, 4B.]
- 37. https://github.com/panisson/ntf-school
- 38. ANOMALY DETECTION IN TEMPORAL NETWORKS
- 39. ANOMALY DETECTION IN TEMPORAL NETWORKS. A. Sapienza et al. "Detecting anomalies in time-varying networks using tensor decomposition", ICDM Data Mining in Networks
- 40. anomaly detection in temporal networks
- 41. Laetitia Gauvin, Ciro Cattuto, Anna Sapienza .fit().predict()
- 42. @apanisson panisson@gmail.com thank you
