Tensor decompositions have gained steadily increasing popularity in data mining applications. Data sources from sensor networks and Internet-of-Things applications promise a wealth of interaction data that can be naturally represented as multidimensional structures such as tensors. For example, time-varying social networks collected from wearable proximity sensors can be represented as 3-way tensors. By representing these data as tensors, we can use tensor decomposition to extract community structures together with their structural and temporal signatures.
The current standard framework for working with tensors, however, is Matlab. We will show how tensor decompositions can be carried out using Python, how to obtain latent components and interpret them, and what some applications of this technique are in academia and industry. We will see a use case where a Python implementation of tensor decomposition is applied to a dataset that describes social interactions of people, collected using the SocioPatterns platform. This platform was deployed in different settings such as conferences, schools and hospitals, in order to support mathematical modelling and simulation of airborne infectious diseases. Tensor decomposition has been used in these scenarios to solve different types of problems: it can be used for data cleaning, where time-varying graph anomalies can be identified and removed from data; it can also be used to assess the impact of latent components on the spreading of a disease, and to devise intervention strategies that are able to reduce the number of infection cases in a school or hospital. These are just a few examples that show the potential of this technique in data mining and machine learning applications.
This document discusses tensor decomposition with Python. It begins by explaining what tensor decomposition and factorization are, and how they can be used to represent multi-dimensional datasets and perform dimensionality reduction. It then discusses matrix and tensor factorization methods like NMF, topic modeling, and CP/PARAFAC decomposition. The remainder of the document provides examples of tensor decomposition using Python tools and libraries, and discusses applications to analyzing temporal network and sensor data.
Exploring temporal graph data with Python: a study on tensor decomposition of wearable sensor data (PyData NYC 2015)
1. EXPLORING TEMPORAL GRAPH DATA WITH PYTHON
A STUDY ON TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA
ANDRÉ PANISSON
@apanisson
ISI Foundation, Torino, Italy & New York City
2. WHY TENSOR FACTORIZATION + PYTHON?
▸ Matrix Factorization is already used in many fields
▸ Tensor Factorization is becoming very popular for multiway data analysis
▸ TF is very useful to explore temporal graph data
▸ But still, the most used tool is Matlab
▸ There’s room for improvement in the Python libraries for TF
▸ Study: NTF of wearable sensor data
4. FACTOR ANALYSIS
Spearman, ~1900:
X ≈ WH
X_{tests × subjects} ≈ W_{tests × intelligences} H_{intelligences × subjects}
Spearman, 1927: The abilities of man.
[Figure: the tests × subjects matrix X approximated by the product of W (tests × intelligences) and H (intelligences × subjects)]
5. TOPIC MODELING / LATENT SEMANTIC ANALYSIS
Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84.
[Figure from Blei (2012): topics as weighted word lists (e.g. gene, dna, genetic; life, evolve, organism; brain, neuron, nerve; data, number, computer), documents with color-coded word assignments, and per-document topic proportions]
6. TOPIC MODELING / LATENT SEMANTIC ANALYSIS
Non-negative Matrix Factorization (NMF):
(~1970 Lawson, ~1995 Paatero, ~2000 Lee & Seung)
2005 Gaussier et al. "Relation between PLSA and NMF and implications."
X ≈ WH
arg min_{W,H} ‖X − WH‖   s.t.   W, H ≥ 0
[Figure: the sparse terms × documents matrix X approximated by W (terms × topics) times H (topics × documents)]
7. NON-NEGATIVE MATRIX FACTORIZATION (NMF)
NMF gives a part-based representation
(Lee & Seung, Nature 1999)
[Figure: a face image reconstructed as a non-negative combination of NMF parts, compared with a PCA reconstruction]
NMF is equivalent to Spectral Clustering
(Ding et al., SDM 2005)
arg min_{W,H} ‖X − WH‖   s.t.   W, H ≥ 0
Multiplicative updates (element-wise • and /, with V the data matrix):
W ← W • (V Hᵀ) / (W H Hᵀ)
H ← H • (Wᵀ V) / (Wᵀ W H)
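The update rules above translate almost line-for-line into NumPy. The following is a minimal sketch on synthetic low-rank data, not sklearn's implementation; the small `eps` is an added guard to keep the divisions well-defined.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic non-negative data with an exact rank-5 structure
V = rng.random((30, 5)) @ rng.random((5, 40))

r = 5
W = rng.random((30, r)) + 0.1   # random non-negative initialization
H = rng.random((r, 40)) + 0.1
eps = 1e-12                     # guards against division by zero

for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err)
```

Since the data is exactly rank 5, the relative reconstruction error drops to a small value, and both factors stay non-negative by construction of the updates.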
8. from sklearn import datasets, decomposition
import matplotlib.pyplot as plt

digits = datasets.load_digits()
A = digits.data

nmf = decomposition.NMF(n_components=10)
W = nmf.fit_transform(A)
H = nmf.components_

plt.rc("image", cmap="binary")
plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(H[i].reshape(8, 8))
    plt.xticks(())
    plt.yticks(())
plt.tight_layout()
9. BEYOND MATRICES: HIGH DIMENSIONAL DATASETS
Cichocki et al. Nonnegative Matrix and Tensor Factorizations
Environmental analysis
▸ Measurement as a function of (Location, Time, Variable)
Sensory analysis
▸ Score as a function of (Food sample, Judge, Attribute)
Process analysis
▸ Measurement as a function of (Batch, Variable, Time)
Spectroscopy
▸ Intensity as a function of (Wavelength, Retention, Sample, Time, Location, …)
…
MULTIWAY DATA ANALYSIS
17. RANK-1 TENSOR
The outer product of N vectors results in a rank-1 tensor:
T = a^{(1)} ∘ ··· ∘ a^{(N)},   T_{i,j,k} = a_i b_j c_k

a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = np.array([1, 2])
T = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for i in range(a.shape[0]):
    for j in range(b.shape[0]):
        for k in range(c.shape[0]):
            T[i, j, k] = a[i] * b[j] * c[k]

array([[[  1.,   2.],
        [  2.,   4.],
        [  3.,   6.],
        [  4.,   8.]],

       [[  2.,   4.],
        [  4.,   8.],
        [  6.,  12.],
        [  8.,  16.]],

       [[  3.,   6.],
        [  6.,  12.],
        [  9.,  18.],
        [ 12.,  24.]]])
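The triple loop above can be collapsed into a single call; a small sketch using NumPy's `einsum`, which produces exactly the same array as the loop version.

```python
import numpy as np

a = np.array([1., 2., 3.])
b = np.array([1., 2., 3., 4.])
c = np.array([1., 2.])

# T[i, j, k] = a[i] * b[j] * c[k]: the outer product of the three vectors
T = np.einsum('i,j,k->ijk', a, b, c)
print(T.shape)  # (3, 4, 2)
```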
18. TENSOR RANK
▸ Every tensor can be written as a sum of rank-1 tensors:
T ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r ≡ ⟦A, B, C⟧
▸ Tensor rank: smallest number of rank-1 tensors that can generate it by summing up
X ≈ Σ_{r=1}^{R} a_r^{(1)} ∘ a_r^{(2)} ∘ ··· ∘ a_r^{(N)} ≡ ⟦A^{(1)}, A^{(2)}, ···, A^{(N)}⟧
19. A = np.array([[1, 2, 3],
              [4, 5, 6]]).T
B = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]]).T
C = np.array([[1, 2],
              [3, 4]]).T

T = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            for r in range(A.shape[1]):
                T[i, j, k] += A[i, r] * B[j, r] * C[k, r]

# equivalently, in a single einsum call:
T = np.einsum('ir,jr,kr->ijk', A, B, C)

array([[[  61.,   82.],
        [  74.,  100.],
        [  87.,  118.],
        [ 100.,  136.]],

       [[  77.,  104.],
        [  94.,  128.],
        [ 111.,  152.],
        [ 128.,  176.]],

       [[  93.,  126.],
        [ 114.,  156.],
        [ 135.,  186.],
        [ 156.,  216.]]])

T ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r ≡ ⟦A, B, C⟧ : Kruskal tensor
20. TENSOR FACTORIZATION
▸ CANDECOMP/PARAFAC factorization (CP)
▸ extensions of SVD / PCA / NMF of matrices
NON-NEGATIVE TENSOR FACTORIZATION
▸ Decompose a non-negative tensor to a sum of R non-negative rank-1 tensors
arg min_{A,B,C} ‖T − ⟦A, B, C⟧‖   subject to   A ≥ 0, B ≥ 0, C ≥ 0
with ⟦A, B, C⟧ ≡ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r
21. TENSOR FACTORIZATION: HOW TO
Alternating Least Squares (ALS): fix all but one factor matrix, to which LS is applied
min_{A≥0} ‖T_(1) − A (C ⊙ B)ᵀ‖
min_{B≥0} ‖T_(2) − B (C ⊙ A)ᵀ‖
min_{C≥0} ‖T_(3) − C (B ⊙ A)ᵀ‖
where T_(k) is the tensor unfolded on the k-th mode:
T_(1) = Â (Ĉ ⊙ B̂)ᵀ   T_(2) = B̂ (Ĉ ⊙ Â)ᵀ   T_(3) = Ĉ (B̂ ⊙ Â)ᵀ
⊙ denotes the Khatri-Rao product, which is a column-wise Kronecker product, i.e., C ⊙ B = [c₁ ⊗ b₁, c₂ ⊗ b₂, …, c_R ⊗ b_R]
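The Khatri-Rao product defined above is easy to write directly. A minimal sketch, where `khatri_rao` is an illustrative helper built from `np.kron` (SciPy also ships `scipy.linalg.khatri_rao`):

```python
import numpy as np

def khatri_rao(C, B):
    # Column-wise Kronecker product: [c1 ⊗ b1, c2 ⊗ b2, ..., cR ⊗ bR]
    assert C.shape[1] == B.shape[1]
    return np.column_stack([np.kron(C[:, r], B[:, r]) for r in range(C.shape[1])])

C = np.array([[1., 2.], [3., 4.], [5., 6.]])            # 3 x 2
B = np.array([[1., 0.], [0., 1.], [1., 1.], [2., 2.]])  # 4 x 2
KR = khatri_rao(C, B)
print(KR.shape)  # (12, 2): row counts multiply, column count is preserved
```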
22. # NTF by multiplicative updates (sketch); T.uttkrp computes the
# unfolded tensor times the Khatri-Rao product, as in scikit-tensor
F = [np.random.rand(n, r),  # random non-negative init (zeros would be
     np.random.rand(m, r),  # fixed points of the multiplicative update)
     np.random.rand(t, r)]
FF_init = np.random.rand(len(F), r, r)

def iter_solver(T, F, FF_init):
    # Update each factor
    for k in range(len(F)):
        # Compute the inner-product matrix
        FF = np.ones((r, r))
        for i in list(range(k)) + list(range(k + 1, len(F))):
            FF = FF * FF_init[i]
        # unfolded tensor times Khatri-Rao product
        XF = T.uttkrp(F, k)
        F[k] = F[k] * XF / (F[k].dot(FF))
        # F[k] = nnls(FF, XF.T).T
        FF_init[k] = F[k].T.dot(F[k])
    return F, FF_init
J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, 2012, pp. 311-326.
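For completeness, the same multiplicative-update scheme can be written in plain NumPy, with an explicit unfolding and Khatri-Rao product instead of scikit-tensor's `uttkrp`. This is a sketch under NumPy's C-ordering convention (the `unfold`, `khatri_rao` and `ntf` helpers are illustrative, not the author's exact code), followed by a toy check on a synthetic rank-3 tensor:

```python
import numpy as np

def unfold(T, mode):
    # Mode-k unfolding (C ordering): mode k becomes the rows
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(mats):
    # Column-wise Kronecker product of a list of matrices
    r = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, r)
    return out

def ntf(T, r, n_iter=1000, eps=1e-12):
    # Non-negative CP decomposition via multiplicative updates
    rng = np.random.default_rng(0)
    F = [rng.random((s, r)) + 0.1 for s in T.shape]
    for _ in range(n_iter):
        for k in range(len(F)):
            others = [F[i] for i in range(len(F)) if i != k]
            XF = unfold(T, k) @ khatri_rao(others)  # T_(k) times Khatri-Rao
            FF = np.ones((r, r))
            for M in others:
                FF *= M.T @ M                       # Hadamard of Gram matrices
            F[k] *= XF / (F[k] @ FF + eps)
    return F

# Toy check: recover a synthetic non-negative rank-3 tensor
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((5, 3)), rng.random((6, 3)), rng.random((4, 3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
F = ntf(T, r=3)
T_hat = np.einsum('ir,jr,kr->ijk', *F)
err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(err)
```

The key design choice is that the row ordering produced by `khatri_rao` (first factor slowest) matches the column ordering of the C-order `unfold`, so `unfold(T, k) ≈ F[k] @ khatri_rao(others).T` holds for every mode.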
23. HOW TO INTERPRET: USER X TERM X TIME
X is a 3-way tensor in which x_{nmt} is 1 if the term m was used by user n at interval t, 0 otherwise
A (N×K) is the association of each user n to a factor k
B (M×K) is the association of each term m to a factor k
C (T×K) shows the time activity of each factor
[Figure: the tensor X (N×M×T) decomposed into factor matrices A (N×K, users), B (M×K, terms) and C (T×K, time)]
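Building the tensor described on this slide is straightforward. A toy sketch with hypothetical (user, term, interval) events, just to fix the indexing convention x_{nmt}:

```python
import numpy as np

# Hypothetical events (user n, term m, time interval t); indices are illustrative
events = [(0, 1, 0), (0, 2, 1), (1, 1, 0), (2, 0, 2)]
N, M, T = 3, 3, 3  # number of users, terms, time intervals

X = np.zeros((N, M, T))
for n, m, t in events:
    X[n, m, t] = 1.0  # term m was used by user n at interval t

print(int(X.sum()))  # 4: one non-zero entry per event
```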
33. SocioPatterns.org
7 years, 30+ deployments, 10 countries, 50,000+ persons
• Mongan Institute for Health Policy, Boston
• US Army Medical Component of the Armed Forces, Bangkok
• School of Public Health of the University of Hong Kong
• KEMRI Wellcome Trust, Kenya
• London School for Hygiene and Tropical Medicine, London
• Public Health England, London
• Saw Swee Hock School of Public Health, Singapore
40. ANOMALY DETECTION IN TEMPORAL NETWORKS
A. Sapienza et al., "Detecting anomalies in time-varying networks using tensor decomposition", ICDM Workshop on Data Mining in Networks.