Exploring temporal graph data with Python: 
a study on tensor decomposition of wearable sensor data (PyData NYC 2015)

Tensor decompositions have gained a steadily increasing popularity in data mining applications. Data sources from sensor networks and Internet-of-Things applications promise a wealth of interaction data that can be naturally represented as multidimensional structures such as tensors. For example, time-varying social networks collected from wearable proximity sensors can be represented as 3-way tensors. By representing this data as tensors, we can use tensor decomposition to extract community structures with their structural and temporal signatures.

The current standard framework for working with tensors, however, is Matlab. We will show how tensor decompositions can be carried out using Python, how to obtain latent components and how to interpret them, and describe some applications of this technique in academia and industry. We will see a use case where a Python implementation of tensor decomposition is applied to a dataset that describes social interactions of people, collected using the SocioPatterns platform. This platform was deployed in different settings such as conferences, schools and hospitals, in order to support mathematical modelling and simulation of airborne infectious diseases. Tensor decomposition has been used in these scenarios to solve different types of problems: it can be used for data cleaning, where time-varying graph anomalies can be identified and removed from data; it can also be used to assess the impact of latent components in the spreading of a disease, and to devise intervention strategies that are able to reduce the number of infection cases in a school or hospital. These are just a few examples that show the potential of this technique in data mining and machine learning applications.

Published in: Data & Analytics


  1. 1. EXPLORING TEMPORAL GRAPH DATA WITH PYTHON: A STUDY ON TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA
     ANDRÉ PANISSON, @apanisson, ISI Foundation, Torino, Italy & New York City
  2. 2. WHY TENSOR FACTORIZATION + PYTHON?
     ▸ Matrix Factorization is already used in many fields
     ▸ Tensor Factorization (TF) is becoming very popular for multiway data analysis
     ▸ TF is very useful to explore temporal graph data
     ▸ But still, the most used tool is Matlab
     ▸ There’s room for improvement in the Python libraries for TF
     ▸ Study: Non-negative Tensor Factorization (NTF) of wearable sensor data
  3. 3. TENSORS AND TENSOR DECOMPOSITION
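  4. 4. FACTOR ANALYSIS
     Spearman ~1900
     $X \approx WH$
     $X_{\text{tests} \times \text{subjects}} \approx W_{\text{tests} \times \text{intelligences}} \, H_{\text{intelligences} \times \text{subjects}}$
     Spearman, 1927: The abilities of man.
     [Figure: X (tests × subjects) ≈ W (tests × Int.) times H (Int. × subjects)]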
  5. 5. TOPIC MODELING / LATENT SEMANTIC ANALYSIS
     Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84.
     [Figure: Topics (word lists such as "gene, dna, genetic", "life, evolve, organism", "brain, neuron, nerve", "data, number, computer", each word with a probability), Documents, and Topic proportions and assignments]
  6. 6. TOPIC MODELING / LATENT SEMANTIC ANALYSIS
     $X \approx WH$
     Non-negative Matrix Factorization (NMF): (~1970 Lawson, ~1995 Paatero, ~2000 Lee & Seung)
     2005 Gaussier et al. "Relation between PLSA and NMF and implications."
     $\arg\min_{W,H} \| X - WH \| \quad \text{s.t. } W, H \ge 0$
     [Figure: X ≈ WH with axes labelled documents, terms and topic; X is annotated "Sparse Matrix!"]
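  7. 7. NON-NEGATIVE MATRIX FACTORIZATION (NMF)
     NMF gives part-based representation (Lee & Seung – Nature 1999)
     [Figure: an original image and its NMF and PCA factorizations]
     NMF is equivalent to Spectral Clustering (Ding et al. - SDM 2005)
     $\arg\min_{W,H} \| X - WH \| \quad \text{s.t. } W, H \ge 0$
     Multiplicative updates (element-wise product and division; V denotes the data matrix X):
     $W \leftarrow W \bullet \frac{V H^T}{W H H^T}$
     $H \leftarrow H \bullet \frac{W^T V}{W^T W H}$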
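The multiplicative update rules on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's code: the function name `nmf_multiplicative`, the small constant `eps` added to the denominators, the random initialization and the synthetic test matrix are all assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V by W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Random non-negative start (assumption: any positive start works here)
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # W <- W * (V H^T) / (W H H^T), element-wise
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        # H <- H * (W^T V) / (W^T W H), element-wise
        H *= (W.T @ V) / ((W.T @ W) @ H + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Synthetic non-negative matrix of rank 3 (hypothetical data)
rng = np.random.default_rng(42)
V = rng.random((20, 3)) @ rng.random((3, 15))
W, H, errors = nmf_multiplicative(V, rank=3)
print("first error:", errors[0], "last error:", errors[-1])
```

Because every update multiplies a non-negative matrix by a non-negative ratio, W and H stay non-negative without any explicit projection step.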
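  8. 8.
     import matplotlib.pyplot as plt
     from sklearn import datasets, decomposition

     # 8x8 handwritten digit images, one flattened image per row
     digits = datasets.load_digits()
     A = digits.data

     # Factorize A ≈ W H with 10 non-negative components
     nmf = decomposition.NMF(n_components=10)
     W = nmf.fit_transform(A)
     H = nmf.components_

     # Show each component (row of H) as an 8x8 image
     plt.rc("image", cmap="binary")
     plt.figure(figsize=(8, 4))
     for i in range(10):
         plt.subplot(2, 5, i + 1)
         plt.imshow(H[i].reshape(8, 8))
         plt.xticks(())
         plt.yticks(())
     plt.tight_layout()
     plt.show()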
  9. 9. BEYOND MATRICES: HIGH DIMENSIONAL DATASETS
     MULTIWAY DATA ANALYSIS (Cichocki et al., Nonnegative Matrix and Tensor Factorizations)
     Environmental analysis
     ▸ Measurement as a function of (Location, Time, Variable)
     Sensory analysis
     ▸ Score as a function of (Food sample, Judge, Attribute)
     Process analysis
     ▸ Measurement as a function of (Batch, Variable, Time)
     Spectroscopy
     ▸ Intensity as a function of (Wavelength, Retention, Sample, Time, Location, …)
     …
  10. 10. DIGITAL TRACES FROM SENSORS AND IOT USER POSITION TIME …
  11. 11. Applications:
     • Chemometrics – Fluorescence Spectroscopy – Chromatographic Data Analysis
     • Neuroscience – Epileptic Seizure Localization – Analysis of EEG and ERP
     • Signal Processing
     • Computer Vision – Image compression, classification – Texture analysis
     • Social Network Analysis – Web link analysis – Conversation detection in emails – Text analysis
     • Approximation of PDEs
     Tasks: data reconstruction, cluster analysis, compression, dimensionality reduction, latent semantic analysis, …
     References:
     Sidiropoulos, Giannakis and Bro, IEEE Trans. Signal Processing, 2000.
     Mørup, Hansen and Arnfred, Journal of Neuroscience Methods, 2007.
     Hazan, Polak and Shashua, ICCV 2005.
     Bader, Berry, Browne, Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd Ed., 2007.
     Doostan and Iaccarino, Journal of Computational Physics, 2009.
     Andersen and Bro, Journal of Chemometrics, 2003.
  12. 12. TENSORS
  13. 13. WHAT IS A TENSOR?
     A tensor is a multidimensional array
     E.g., three-way tensor with axes Mode-1, Mode-2 and Mode-3
  14. 14. FIBERS AND SLICES (Cichocki et al., Nonnegative Matrix and Tensor Factorizations)
     Column (Mode-1) Fibers: A[:, 4, 1]
     Row (Mode-2) Fibers: A[1, :, 4]
     Tube (Mode-3) Fibers: A[1, 3, :]
     Horizontal Slices: A[1, :, :]
     Lateral Slices: A[:, 1, :]
     Frontal Slices: A[:, :, 1]
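As a small illustration of the indexing patterns on this slide, the sketch below extracts one fiber of each kind and one slice of each kind from a 3 × 4 × 2 NumPy array. The array and the particular indices are chosen for the example only; a fiber fixes all indices but one, and a slice fixes exactly one index.

```python
import numpy as np

# Small dense tensor with entries A[i, j, k] = 8*i + 2*j + k
A = np.arange(24).reshape((3, 4, 2))

# Fibers: fix every index except one
column_fiber = A[:, 1, 0]   # mode-1 fiber, length 3
row_fiber = A[0, :, 1]      # mode-2 fiber, length 4
tube_fiber = A[2, 3, :]     # mode-3 fiber, length 2

# Slices: fix exactly one index
horizontal_slice = A[0, :, :]   # shape (4, 2)
lateral_slice = A[:, 1, :]      # shape (3, 2)
frontal_slice = A[:, :, 0]      # shape (3, 4)

print(column_fiber, row_fiber, tube_fiber)
print(horizontal_slice.shape, lateral_slice.shape, frontal_slice.shape)
```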
  15. 15. TENSOR UNFOLDINGS: MATRICIZATION AND VECTORIZATION Matricization: convert a tensor to a matrix Vectorization: convert a tensor to a vector
  16. 16.
     >>> import numpy as np
     >>> T = np.arange(0, 24).reshape((3, 4, 2))
     >>> T
     array([[[ 0,  1],
             [ 2,  3],
             [ 4,  5],
             [ 6,  7]],
            [[ 8,  9],
             [10, 11],
             [12, 13],
             [14, 15]],
            [[16, 17],
             [18, 19],
             [20, 21],
             [22, 23]]])
     OK for dense tensors: use a combination of transpose() and reshape()
     Not simple for sparse datasets (e.g.: <authors, terms, time>)
     >>> for j in range(2):
     ...     for i in range(4):
     ...         print(T[:, i, j])
     ...
     [ 0  8 16]
     [ 2 10 18]
     [ 4 12 20]
     [ 6 14 22]
     [ 1  9 17]
     [ 3 11 19]
     [ 5 13 21]
     [ 7 15 23]
     # supposing the existence of unfold
     >>> T.unfold(0)
     array([[ 0,  2,  4,  6,  1,  3,  5,  7],
            [ 8, 10, 12, 14,  9, 11, 13, 15],
            [16, 18, 20, 22, 17, 19, 21, 23]])
     >>> T.unfold(1)
     array([[ 0,  8, 16,  1,  9, 17],
            [ 2, 10, 18,  3, 11, 19],
            [ 4, 12, 20,  5, 13, 21],
            [ 6, 14, 22,  7, 15, 23]])
     >>> T.unfold(2)
     array([[ 0,  8, 16,  2, 10, 18,  4, 12, 20,  6, 14, 22],
            [ 1,  9, 17,  3, 11, 19,  5, 13, 21,  7, 15, 23]])
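The slide supposes an `unfold` method that plain NumPy arrays do not have. One possible way to get the same matrices for dense tensors is the transpose-and-reshape combination mentioned above. The free functions `unfold` and `vectorize` below are hypothetical helpers written for this sketch; the column ordering (earlier modes vary fastest) is chosen to reproduce the outputs shown on the slide.

```python
import numpy as np

def unfold(T, mode):
    """Matricize a dense tensor: rows index `mode`, columns the other modes."""
    # Move the chosen mode to the front, then flatten the remaining modes
    # in Fortran order so that earlier modes vary fastest along the columns.
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def vectorize(T):
    """Stack all entries into one vector, first index varying fastest."""
    return T.reshape(-1, order="F")

T = np.arange(0, 24).reshape((3, 4, 2))
print(unfold(T, 0))
print(unfold(T, 1))
print(unfold(T, 2))
print(vectorize(T))
```

For sparse tensors this approach does not apply directly, since a sparse tensor is usually stored as coordinates and values rather than as a dense array.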
  17. 17. RANK-1 TENSOR
     The outer product of N vectors results in a rank-1 tensor
     $T = a^{(1)} \circ \cdots \circ a^{(N)}$
     Three-way case: $T = a \circ b \circ c$, with $T_{i,j,k} = a_i b_j c_k$

     import numpy as np

     a = np.array([1, 2, 3])
     b = np.array([1, 2, 3, 4])
     c = np.array([1, 2])
     # Fill T entry by entry with the product of the three vector entries
     T = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
     for i in range(a.shape[0]):
         for j in range(b.shape[0]):
             for k in range(c.shape[0]):
                 T[i, j, k] = a[i] * b[j] * c[k]

     Resulting T:
     array([[[  1.,   2.],
             [  2.,   4.],
             [  3.,   6.],
             [  4.,   8.]],
            [[  2.,   4.],
             [  4.,   8.],
             [  6.,  12.],
             [  8.,  16.]],
            [[  3.,   6.],
             [  6.,  12.],
             [  9.,  18.],
             [ 12.,  24.]]])
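The triple loop can also be written without explicit loops. The sketch below shows two equivalent NumPy formulations of the same outer product, using the vectors from the slide; the variable names `T_einsum` and `T_outer` are introduced here for illustration.

```python
from functools import reduce

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0, 4.0])
c = np.array([1.0, 2.0])

# T[i, j, k] = a[i] * b[j] * c[k], written as an Einstein summation
T_einsum = np.einsum("i,j,k->ijk", a, b, c)

# The same tensor as a chain of pairwise outer products; this form
# generalizes to any number of vectors
T_outer = reduce(np.multiply.outer, [a, b, c])

print(T_einsum.shape)
print(np.allclose(T_einsum, T_outer))
```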
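  18. 18. TENSOR RANK
     ▸ Every tensor can be written as a sum of rank-1 tensors
     [Figure: a three-way tensor drawn as the sum of rank-1 terms $a_1 \circ b_1 \circ c_1 + \cdots + a_J \circ b_J \circ c_J$]
     ▸ Tensor rank: smallest number of rank-1 tensors that can generate it by summing up
     $X \approx \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(N)} \equiv [\![ A^{(1)}, A^{(2)}, \cdots, A^{(N)} ]\!]$
     $T \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r \equiv [\![ A, B, C ]\!]$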
  19. 19. Kruskal tensor: $\sum_{r=1}^{R} a_r \circ b_r \circ c_r \equiv [\![ A, B, C ]\!]$

     import numpy as np

     # Factor matrices: column r holds the vectors a_r, b_r, c_r
     A = np.array([[1, 2, 3], [4, 5, 6]]).T
     B = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).T
     C = np.array([[1, 2], [3, 4]]).T
     # Sum of R rank-1 tensors, entry by entry
     T = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
     for i in range(A.shape[0]):
         for j in range(B.shape[0]):
             for k in range(C.shape[0]):
                 for r in range(A.shape[1]):
                     T[i, j, k] += A[i, r] * B[j, r] * C[k, r]
     # Equivalent one-liner
     T = np.einsum('ir,jr,kr->ijk', A, B, C)

     Resulting T:
     array([[[  61.,   82.],
             [  74.,  100.],
             [  87.,  118.],
             [ 100.,  136.]],
            [[  77.,  104.],
             [  94.,  128.],
             [ 111.,  152.],
             [ 128.,  176.]],
            [[  93.,  126.],
             [ 114.,  156.],
             [ 135.,  186.],
             [ 156.,  216.]]])
  20. 20. TENSOR FACTORIZATION
     ▸ CANDECOMP/PARAFAC factorization (CP)
     ▸ extensions of SVD / PCA / NMF of matrices
     NON-NEGATIVE TENSOR FACTORIZATION
     ▸ Decompose a non-negative tensor to a sum of R non-negative rank-1 tensors
     $\arg\min_{A,B,C} \| T - [\![ A, B, C ]\!] \|$
     with $[\![ A, B, C ]\!] \equiv \sum_{r=1}^{R} a_r \circ b_r \circ c_r$
     subject to $A \ge 0,\ B \ge 0,\ C \ge 0$
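  21. 21. TENSOR FACTORIZATION: HOW TO
     Alternating Least Squares (ALS): fix all but one factor matrix, to which least squares (LS) is applied
     $\min_{A \ge 0} \| T_{(1)} - A (C \odot B)^T \|$
     $\min_{B \ge 0} \| T_{(2)} - B (C \odot A)^T \|$
     $\min_{C \ge 0} \| T_{(3)} - C (B \odot A)^T \|$
     $\odot$ denotes the Khatri-Rao product, which is a column-wise Kronecker product, i.e., $C \odot B = [c_1 \otimes b_1, c_2 \otimes b_2, \ldots, c_r \otimes b_r]$
     Unfolded tensor on the kth mode:
     $T_{(1)} = \hat{A} (\hat{C} \odot \hat{B})^T$
     $T_{(2)} = \hat{B} (\hat{C} \odot \hat{A})^T$
     $T_{(3)} = \hat{C} (\hat{B} \odot \hat{A})^T$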
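The unfolding identities above can be checked numerically. The sketch below builds an exact Kruskal tensor from random factor matrices and compares each mode-k unfolding with the corresponding Khatri-Rao expression. The helpers `unfold` and `khatri_rao` are hypothetical functions written for this example, and the unfolding convention (earlier modes vary fastest along the columns) is an assumption that matches the identities as written.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization with earlier modes varying fastest."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def khatri_rao(X, Y):
    """Column-wise Kronecker product: column r is kron(X[:, r], Y[:, r])."""
    R = X.shape[1]
    return np.einsum("ir,jr->ijr", X, Y).reshape(-1, R)

rng = np.random.default_rng(0)
A = rng.random((3, 2))
B = rng.random((4, 2))
C = rng.random((5, 2))

# Exact rank-2 Kruskal tensor [[A, B, C]]
T = np.einsum("ir,jr,kr->ijk", A, B, C)

ok_mode1 = np.allclose(unfold(T, 0), A @ khatri_rao(C, B).T)
ok_mode2 = np.allclose(unfold(T, 1), B @ khatri_rao(C, A).T)
ok_mode3 = np.allclose(unfold(T, 2), C @ khatri_rao(B, A).T)
print(ok_mode1, ok_mode2, ok_mode3)
```

With these identities each ALS step becomes an ordinary (non-negative) least squares problem in a single factor matrix.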
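  22. 22.
     import numpy as np

     # n, m, t: tensor dimensions; r: number of components (assumed already defined).
     # T is assumed to be a tensor object that provides uttkrp(), e.g. from scikit-tensor.
     # Random non-negative start: with an all-zero start the multiplicative
     # update below would divide zero by zero.
     F = [np.random.rand(n, r), np.random.rand(m, r), np.random.rand(t, r)]
     # Gram matrices F[k]^T F[k] of the current factors
     FF_init = np.array([f.T.dot(f) for f in F])

     def iter_solver(T, F, FF_init):
         # Update each factor
         for k in range(len(F)):
             # Compute the inner-product matrix
             FF = np.ones((r, r))
             for i in list(range(k)) + list(range(k + 1, len(F))):
                 FF = FF * FF_init[i]
             # unfolded tensor times Khatri-Rao product
             XF = T.uttkrp(F, k)
             F[k] = F[k] * XF / (F[k].dot(FF))
             # F[k] = nnls(FF, XF.T).T
             FF_init[k] = F[k].T.dot(F[k])
         return F, FF_init

     NMF: $\arg\min_{W,H} \| X - WH \|$ s.t. $W, H \ge 0$, with updates $W \leftarrow W \bullet \frac{V H^T}{W H H^T}$ and $H \leftarrow H \bullet \frac{W^T V}{W^T W H}$
     NTF: $\min_{A \ge 0} \| T_{(1)} - A (C \odot B)^T \|$, $\min_{B \ge 0} \| T_{(2)} - B (C \odot A)^T \|$, $\min_{C \ge 0} \| T_{(3)} - C (B \odot A)^T \|$
     J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, 2012, pp. 311-326.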
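For readers without a tensor library at hand, the same multiplicative scheme can be sketched for dense three-way tensors with NumPy alone. This is an illustrative variant under stated assumptions, not the code from the talk: the function name `ntf_multiplicative`, the use of `np.einsum` in place of `uttkrp`, the `eps` safeguard and the synthetic test tensor are all choices made for this example.

```python
import numpy as np

def ntf_multiplicative(T, rank, n_iter=500, eps=1e-9, seed=0):
    """Non-negative CP factorization of a dense 3-way tensor (sketch)."""
    rng = np.random.default_rng(seed)
    # Random non-negative start for the three factor matrices
    F = [rng.random((dim, rank)) for dim in T.shape]
    # einsum recipes for "unfolded tensor times Khatri-Rao product"
    specs = ["ijk,jr,kr->ir", "ijk,ir,kr->jr", "ijk,ir,jr->kr"]
    errors = []
    for _ in range(n_iter):
        for k in range(3):
            others = [F[i] for i in range(3) if i != k]
            # Hadamard product of the Gram matrices of the other factors
            FF = (others[0].T @ others[0]) * (others[1].T @ others[1])
            XF = np.einsum(specs[k], T, *others)
            # Multiplicative update keeps F[k] non-negative
            F[k] *= XF / (F[k] @ FF + eps)
        approx = np.einsum("ir,jr,kr->ijk", *F)
        errors.append(np.linalg.norm(T - approx))
    return F, errors

# Synthetic non-negative tensor of rank 2 (hypothetical data)
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
T = np.einsum("ir,jr,kr->ijk", A0, B0, C0)

F, errors = ntf_multiplicative(T, rank=2)
print("first error:", errors[0], "last error:", errors[-1])
```

The dense `einsum` calls touch every entry of T, so this sketch is only practical for small tensors; sparse data calls for a sparse implementation of the same product.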
  23. 23. HOW TO INTERPRET: USER X TERM X TIME
     X is a 3-way tensor in which $x_{nmt}$ is 1 if the term m was used by user n at interval t, 0 otherwise
     $A_{N \times K}$ is the association of each user n to a factor k
     $B_{M \times K}$ is the association of each term m to a factor k
     $C_{T \times K}$ shows the time activity of each factor
     [Figure: X (N×M×T; users × terms × time) decomposed into A (N×K; users × factors), B (M×K; terms × factors) and C (T×K; time × factors)]
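One simple way to read such factors is to list, for each factor k, the users and terms with the largest weights in column k of A and B, and the interval where column k of C peaks. The sketch below does this on tiny made-up factor matrices; the user names, terms, numeric values and the helper `top_labels` are all hypothetical and only illustrate the reading, they are not results from the talk.

```python
import numpy as np

users = ["ana", "bob", "cai", "dee"]
terms = ["python", "numpy", "soccer", "goal"]

# Hypothetical factor matrices with K = 2 factors
A = np.array([[0.9, 0.0], [0.8, 0.1], [0.0, 0.7], [0.1, 0.9]])  # users x K
B = np.array([[0.9, 0.0], [0.7, 0.0], [0.0, 0.8], [0.1, 0.6]])  # terms x K
C = np.array([[0.1, 0.9], [0.8, 0.2], [0.9, 0.0]])              # intervals x K

def top_labels(M, labels, k, n=2):
    """Labels of the n rows with the largest weight in column k."""
    order = np.argsort(M[:, k])[::-1][:n]
    return [labels[i] for i in order]

summary = {}
for k in range(A.shape[1]):
    summary[k] = {
        "users": top_labels(A, users, k),
        "terms": top_labels(B, terms, k),
        "peak_interval": int(np.argmax(C[:, k])),
    }
    print(k, summary[k])
```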
  24. 24. http://www.datainterfaces.org/2013/06/twitter-topic-explorer/
  25. 25. TOOLS FOR TENSOR DECOMPOSITION
  26. 26. TOOLS FOR TENSOR FACTORIZATION
  27. 27. TOOLS: THE PYTHON WORLD
     NumPy, SciPy
     Scikit-Tensor (under development): github.com/mnick/scikit-tensor
     NTF: gist.github.com/panisson/7719245
  28. 28. TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA
  29. 29. recorded proximity data direct proximity sensing
  30. 30. Lyon, France: primary school, 231 students, 10 teachers
  31. 31. Hong Kong: primary school, 900 students, 65 teachers
  32. 32. SocioPatterns.org
     7 years, 30+ deployments, 10 countries, 50,000+ persons
     • Mongan Institute for Health Policy, Boston
     • US Army Medical Component of the Armed Forces, Bangkok
     • School of Public Health of the University of Hong Kong
     • KEMRI Wellcome Trust, Kenya
     • London School for Hygiene and Tropical Medicine, London
     • Public Health England, London
     • Saw Swee Hock School of Public Health, Singapore
  33. 33. TENSORS
  34. 34. FROM TEMPORAL GRAPHS TO 3-WAY TENSORS
     Adjacency matrix of one time interval:
     0 1 0
     1 0 1
     0 1 0
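The step from a temporal graph to a 3-way tensor can be sketched as stacking one adjacency matrix per time interval along a third axis. The contact list, the function name `contacts_to_tensor` and the choice of a dense, symmetric, binary tensor are assumptions made for this small example; real proximity datasets would normally use a sparse representation.

```python
import numpy as np

# Hypothetical contact list: (node_i, node_j, time_interval)
contacts = [(0, 1, 0), (1, 2, 0), (0, 1, 1), (1, 2, 2), (0, 2, 2)]

def contacts_to_tensor(contacts, n_nodes, n_intervals):
    """Stack one symmetric adjacency matrix per interval into a 3-way tensor."""
    T = np.zeros((n_nodes, n_nodes, n_intervals))
    for i, j, t in contacts:
        # Proximity is undirected, so fill both (i, j) and (j, i)
        T[i, j, t] = 1.0
        T[j, i, t] = 1.0
    return T

T = contacts_to_tensor(contacts, n_nodes=3, n_intervals=3)
# Frontal slice for interval 0: the adjacency matrix of that snapshot
print(T[:, :, 0])
```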
  35. 35. structures in temporal networks
     temporal network → tensorial representation → tensor factorization → factors
     factors A, B: communities; factor C: temporal activity
     factorization quality: tuning the complexity of the model
     [Figure: nodes × communities membership for classes 1A, 1B, 2A, 2B, 3A, 3B, 4A, 4B, 5A, 5B; components (nodes × time); quality metrics per component as a function of the time interval]
  36. 36. TENSOR DECOMPOSITION OF SCHOOL NETWORK
     L. Gauvin et al., PLoS ONE 9(1), e86028 (2014)
     [Figure: school classes 1A, 1B, 2A, 2B, 3A, 3B, 4A, 4B, 5A, 5B]
  37. 37. https://github.com/panisson/ntf-school
  38. 38. ANOMALY DETECTION IN TEMPORAL NETWORKS
  39. 39. ANOMALY DETECTION IN TEMPORAL NETWORKS
     A. Sapienza et al., "Detecting anomalies in time-varying networks using tensor decomposition", ICDM Data Mining in Networks
  40. 40. anomaly detection in temporal networks
  41. 41. Laetitia Gauvin, Ciro Cattuto, Anna Sapienza
     .fit().predict()
  42. 42. @apanisson panisson@gmail.com thank you
