This document presents a summary of research by Hui Yang, an associate professor at Penn State, on self-organizing networks for variable clustering and predictive analytics. The research aims to develop a self-organizing network approach to cluster variables based on their nonlinear interdependence, in order to perform predictive modeling using grouped variables. Key aspects of the research include measuring nonlinear coupling between variables, embedding variables as nodes in a complex network, allowing the network to self-organize via attractive and repulsive forces, and detecting communities or clusters of variables in the network. Simulation experiments demonstrate the effectiveness of the approach. Case studies apply the methods to recurrence networks constructed from time series data.
Higher-order organization of complex networks (David Gleich)
A talk I gave at the Park City Mathematics Institute about our recent work on using motifs to analyze and cluster networks. This involves a higher-order Cheeger inequality in terms of motifs.
Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. They have proven widely successful at problems including community detection and label propagation, among others.
Correlation clustering and community detection in graphs and networks (David Gleich)
We show a new relationship between various community detection objectives and a correlation clustering framework. This relationship enables us to detect communities with good bounds on the solution.
Spectral clustering with motifs and higher-order structures (David Gleich)
I presented these slides at the #strathna meeting in Glasgow in June 2017. They are an updated and enhanced version of the earlier talks on the subject.
The document proposes a lattice-based approach for consensus clustering. It introduces the consensus clustering problem and existing approaches, then describes a least-squares criterion for ensemble and combined consensus clustering. A lattice-based algorithm is presented that finds a consensus partition by identifying an antichain of concepts in the lattice formed from a partition context. Computational experiments on synthetic datasets evaluate the lattice-based approach against state-of-the-art algorithms, using the adjusted Rand index to measure similarity between partitions.
Solution of a subclass of Lane-Emden differential equation by variational ... (Alexander Decker)
This document discusses applying He's variational iteration method to solve a subclass of Lane-Emden differential equations. The method constructs a sequence of correction functionals that generate iterative approximations to the solution. It is shown that under certain conditions, the iterative sequence converges to the exact solution of the Lane-Emden equation. The variational iteration method provides an efficient means of obtaining analytical solutions and has been successfully used to solve many types of nonlinear problems. The method is illustrated through examples and shown to produce polynomial solutions.
Solution of a subclass of Lane-Emden differential equation by variational ite... (Alexander Decker)
This document discusses applying He's variational iteration method to solve a subclass of Lane-Emden differential equations. The method constructs a sequence of correction functionals that generate iterative approximations to the solution. It is shown that under certain conditions, the iterative sequence converges to the exact solution of the Lane-Emden equation. The variational iteration method provides an efficient means of obtaining polynomial solutions without linearization, perturbation or discretization. Illustrative examples from literature are shown to produce exact polynomial solutions when treated with this method.
The document describes sparse matrix reconstruction using a matrix completion algorithm. It begins with an overview of the matrix completion problem and formulation. It then describes the algorithm which uses soft-thresholding to impose a low-rank constraint and iteratively finds the matrix that agrees with the observed entries. The algorithm is proven to converge to the desired solution. Extensions to noisy data and generalized constraints are also discussed.
Exact Matrix Completion via Convex Optimization, slides (PPT) (Joonyoung Yi)
Slides for the paper "Exact Matrix Completion via Convex Optimization" by Emmanuel J. Candès and Benjamin Recht, presented in the KAIST CS592 class, April 2018.
- Code: https://github.com/JoonyoungYi/MCCO-numpy
- Abstract of the paper: We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys
m ≥ C n^1.2 r log n
for some positive numerical constant C, then with very high probability, most n×n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
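To get a feel for the bound m ≥ C n^1.2 r log n, one can compare it with the n² entries of the full matrix. The constant C below is set to 1 purely for illustration, since the theorem only asserts the existence of some positive constant:

```python
import math

def sampled_fraction(n, r, C=1.0):
    """Fraction of the n*n entries the bound asks for.
    C = 1 is purely illustrative; the theorem only guarantees some constant."""
    m = C * n ** 1.2 * r * math.log(n)
    return m / n ** 2

# The required fraction of observed entries shrinks as n grows
for n in (1_000, 100_000):
    print(n, sampled_fraction(n, r=5))
```

For rank 5 the bound asks for roughly 14% of the entries at n = 1,000 but well under 1% at n = 100,000, which is the sense in which recovery works from "very limited information".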
Hybrid Block Method for the Solution of First Order Initial Value Problems of... (iosrjce)
This paper considers a method of collocation of the differential system and interpolation of the approximate solution, a combination of power series and an exponential function, at selected grid and off-grid points, to generate a linear multistep method implemented as a block method. The basic properties of the block method, namely consistency, convergence, and the stability interval, are verified. The method is tested on some numerical experiments and found to have a better stability condition and better approximation than existing methods.
1. The document discusses a universal Bayesian measure for arbitrary data that is either discrete or continuous.
2. It presents Ryabko's measure for continuous variables and generalizes it using the Radon-Nikodym theorem to define density functions for both discrete and continuous random variables.
3. It then shows that given a universal histogram sequence, the normalized log ratio of the true density function to this generalized measure converges to zero, providing a universal Bayesian solution to the problem.
Pattern-based classification of demographic sequences (Dmitrii Ignatov)
We have proposed prefix-based gapless sequential patterns for the classification of demographic sequences. In contrast to black-box machine learning techniques, this approach provides interpretable patterns suitable for treatment by professional demographers. As the description language, we used Pattern Structures, an extension of Formal Concept Analysis to complex data such as sequences, graphs, and intervals.
Hyers-Ulam-Rassias stability of exponential primitive mapping (Alexander Decker)
This academic article discusses the Hyers-Ulam-Rassias stability of exponential primitive mappings. It begins with introducing the concepts of exponential primitive mappings, metric groups, and Hyers-Ulam-Rassias stability. It then proves a theorem showing that if an exponential primitive mapping G satisfies an inequality involving the sum of norms, then there exists a unique additive mapping T such that the difference between G and T is bounded above by a function of the norm. The proof constructs T as a limit and shows it has the required properties. The article concludes by mentioning potential directions for further research.
International Journal of Engineering Research and Development (IJERD Editor)
This document presents a wavelet-Galerkin technique for solving Neumann boundary value problems using wavelet bases. It begins with an introduction to wavelets and their history. It then discusses the wavelet-Galerkin method, describing how to represent functions and their derivatives as expansions in a wavelet basis. The technique is applied to solve one-dimensional Neumann Helmholtz and two-dimensional Neumann Poisson boundary value problems. Results show the method matches exact solutions for some parameters in Helmholtz problems but not all parameters in Poisson problems.
Localized methods for diffusions in large graphs (David Gleich)
I describe a few ongoing research projects on diffusions in large graphs and how we can use efficient matrix computations to evaluate them.
Spacey random walks and higher order Markov chains (David Gleich)
My talk at the SIAM NetSci workshop (2015) on our new spacey random walk and spacey random surfer models and how we derived them. There are many potential extensions and opportunities to use this for analyzing big data as tensors.
Anti-differentiating Approximation Algorithms: PageRank and MinCut (David Gleich)
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
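A minimal sketch of the push idea for personalized PageRank follows the Andersen-Chung-Lang scheme: keep a residual per node and only process nodes whose residual is large relative to their degree. The toy graph, alpha, and eps below are illustrative, and this is a simplification rather than the exact method in the talk:

```python
from collections import defaultdict

def ppr_push(graph, seed, alpha=0.2, eps=1e-4):
    """Approximate personalized PageRank by repeatedly "pushing" residual
    mass from nodes whose residual exceeds eps * degree (ACL-style sketch)."""
    p = defaultdict(float)   # PageRank estimates
    r = defaultdict(float)   # residual (unprocessed) mass
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        if r[u] < eps * len(graph[u]):
            continue
        p[u] += alpha * r[u]                      # keep an alpha fraction at u
        share = (1 - alpha) * r[u] / len(graph[u])
        r[u] = 0.0
        for v in graph[u]:                        # spread the rest to neighbours
            old = r[v]
            r[v] += share
            if old < eps * len(graph[v]) <= r[v]:
                queue.append(v)                   # v just crossed the threshold
    return dict(p)

# Toy undirected graph: a triangle {0, 1, 2} with a pendant node 3
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
scores = ppr_push(g, seed=0)
print(max(scores, key=scores.get))  # → 0, the seed ranks highest
```

The point of the push formulation is that only nodes near the seed are ever touched, which is what makes the computation local and fast on large graphs.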
Optimization of positive linear systems via geometric programming (Masaki Ogura)
The document summarizes research on optimizing positive linear systems via geometric programming.
The key points are:
1) Positive linear systems have non-negative state variables and parameters, making them well-suited to applications like epidemics, chemistry, and economics.
2) The author formulates the problem of optimally tuning parameters of a positive linear system as a convex optimization problem that can be solved via geometric programming.
3) Geometric programming allows optimizing "posynomial" objective functions over positive variables and has been shown to exactly solve parameter tuning problems for positive linear systems when objectives like the H2 norm, H∞ norm, or Hankel norm are used.
Polynomial matrices can help to elegantly formulate many broadband multi-sensor / multi-channel processing problems, and represent a direct extension of well-established narrowband techniques which typically involve eigen- (EVD) and singular value decompositions (SVD) for optimisation. Polynomial matrix decompositions extend the utility of the EVD to polynomial parahermitian matrices, and this talk presents a brief overview of such polynomial matrices, characteristics of the polynomial EVD (PEVD) and iterative algorithms for its solution. The presentation concludes with some surprising results when applying the PEVD to subband coding and broadband beamforming.
Introduction to second gradient theory of elasticity (Arjun Narayanan)
This document introduces higher gradient theories of elasticity. It begins with an overview of how gradients appear in classical field theories like Newtonian gravity and Einsteinian gravity. It then discusses how higher gradients are relevant to continuum mechanics. The remainder of the document outlines the mathematical and variational framework for developing higher gradient elasticity theories. This includes discussions of geometric notions, variational principles, obtaining the strong form of the governing equations, and finite element discretization methods.
BIN PACKING PROBLEM: A LINEAR CONSTANT-SPACE -APPROXIMATION ALGORITHM (ijcsa)
Since the Bin Packing Problem (BPP) is one of the main NP-hard problems, many approximation algorithms have been suggested for it. It has been proven that the best algorithm for BPP has an approximation ratio of … and a time order of …, unless …. In the current paper, a linear approximation algorithm is presented. The suggested algorithm not only has the best possible theoretical factors (approximation ratio, space order, and time order) but also outperforms the other approximation algorithms in experimental results; we therefore conclude that it is the best approximation algorithm presented for the problem so far.
Special Plenary Lecture at the International Conference on VIBRATION ENGINEERING AND TECHNOLOGY OF MACHINERY (VETOMAC), Lisbon, Portugal, September 10 - 13, 2018
http://www.conf.pt/index.php/v-speakers
Propagation of uncertainties in complex engineering dynamical systems is receiving increasing attention. When uncertainties are taken into account, the equations of motion of discretised dynamical systems can be expressed by coupled ordinary differential equations with stochastic coefficients. The computational cost for the solution of such a system mainly depends on the number of degrees of freedom and number of random variables. Among various numerical methods developed for such systems, the polynomial chaos based Galerkin projection approach shows significant promise because it is more accurate compared to the classical perturbation based methods and computationally more efficient compared to the Monte Carlo simulation based methods. However, the computational cost increases significantly with the number of random variables and the results tend to become less accurate for a longer length of time. In this talk novel approaches will be discussed to address these issues. Reduced-order Galerkin projection schemes in the frequency domain will be discussed to address the problem of a large number of random variables. Practical examples will be given to illustrate the application of the proposed Galerkin projection techniques.
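The polynomial chaos idea can be illustrated in one variable: expand a function of a standard normal input in probabilists' Hermite polynomials, computing the coefficients by Gauss-Hermite quadrature. This single-variable toy (f(x) = x², chosen for illustration) omits the Galerkin coupling that makes the full stochastic problem expensive:

```python
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, sqrt, pi

def pce_coeffs(f, order):
    """Project f(X), X ~ N(0,1), onto probabilists' Hermite polynomials:
    f(x) = sum_k c_k He_k(x), with c_k = E[f(X) He_k(X)] / k!."""
    x, w = He.hermegauss(order + 10)   # quadrature for weight exp(-x^2/2)
    w = w / sqrt(2 * pi)               # normalize to the Gaussian density
    return [float(np.sum(w * f(x) * He.hermeval(x, [0] * k + [1])) / factorial(k))
            for k in range(order + 1)]

# f(x) = x^2 has the exact expansion He_0(x) + He_2(x) = 1 + (x^2 - 1)
c = pce_coeffs(lambda x: x**2, order=3)
print(c)   # coefficients close to [1, 0, 1, 0]
```

In the multivariate Galerkin setting the basis is a tensor product of such polynomials, which is exactly why the cost grows so quickly with the number of random variables.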
A Study on the Root Systems and Dynkin diagrams associated with QHA2(1) (IRJET Journal)
This document discusses the quasi-hyperbolic Kac-Moody algebra QHA2(1). It begins with an abstract that introduces the algebra and states that the paper aims to classify the Dynkin diagrams associated with QHA2(1) and study properties of strictly and purely imaginary roots. It then provides background on Kac-Moody algebras, roots, and related concepts. The main results are a classification theorem stating there are 212 connected, non-isomorphic Dynkin diagrams for QHA2(1) and a discussion of strictly and purely imaginary roots for this algebra.
Covariance matrices are central to many adaptive filtering and optimisation problems. In practice, they have to be estimated from a finite number of samples; on this, I will review some known results from spectrum estimation and multiple-input multiple-output communications systems, and how properties that are assumed to be inherent in covariance and power spectral densities can easily be lost in the estimation process. I will discuss new results on space-time covariance estimation, and how the estimation from finite sample sets will impact on factorisations such as the eigenvalue decomposition, which is often key to solving the introductory optimisation problems. The purpose of the presentation is to give you some insight into estimating statistics as well as to provide a glimpse on classical signal processing challenges such as the separation of sources from a mixture of signals.
This document describes an undergraduate research project on iterative methods for computing eigenvalues and eigenvectors of matrices. It introduces the standard eigenvalue problem and defines key terms like eigenvalues, eigenvectors, and dominant eigenpairs. The body of the document reviews three iterative methods - the power method, inverse power method, and shifted inverse power method. It explains how these methods use repeated matrix-vector multiplications to approximate dominant, smallest, and intermediate eigenvalues and their corresponding eigenvectors. The document is structured with chapters on introduction, literature review, applications, and conclusion.
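The power method reviewed in that project can be stated in a few lines; the 2×2 matrix below is an illustrative example whose dominant eigenvalue is (5 + √5)/2 ≈ 3.618. The inverse and shifted variants apply the same loop to (A − σI)⁻¹:

```python
import numpy as np

def power_method(A, iters=500, seed=0):
    """Approximate the dominant eigenpair by repeated matrix-vector products."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)     # renormalize to avoid overflow
    lam = x @ A @ x                # Rayleigh quotient estimate of the eigenvalue
    return lam, x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, x = power_method(A)
print(lam)   # → approximately 3.618, the dominant eigenvalue (5 + sqrt(5))/2
```

Convergence is geometric in the ratio of the second-largest to the largest eigenvalue magnitude, which is why the shifted variants matter when that ratio is close to 1.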
The document discusses using Tensor Train (TT) decomposition to efficiently represent tensors and apply it to machine learning models. Some key points:
- TT decomposition provides a compact representation of tensors that allows efficient linear algebra operations.
- It has been used to compress the weight matrices of neural networks without loss of accuracy.
- Exponential machines model all feature interactions using a TT-formatted weight tensor, controlling complexity with TT-rank. This outperforms other models on classification tasks involving interactions.
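A compact sketch of how a TT representation can be computed is the TT-SVD algorithm: sequential truncated SVDs of the tensor's unfoldings. The rank-1 test tensor and the tolerance below are illustrative; production code would use a library such as ttpy or t3f:

```python
import numpy as np

def tt_svd(tensor, max_rank=10):
    """TT-SVD sketch: split a d-way tensor into TT cores by sequential
    truncated SVDs; max_rank caps every TT-rank."""
    shape = tensor.shape
    cores, r_prev = [], 1
    T = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        U, s, Vt = np.linalg.svd(T, full_matrices=False)
        r = int(min(max_rank, np.sum(s > 1e-10 * s[0])))   # truncate tiny modes
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        T = (np.diag(s[:r]) @ Vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(T.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([out.ndim - 1], [0]))
    return out.reshape([G.shape[1] for G in cores])

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)
T = np.einsum('i,j,k->ijk', a, b, c)   # rank-1 tensor: all TT-ranks are 1
cores = tt_svd(T)
approx = tt_reconstruct(cores)
print([G.shape for G in cores])        # → [(1, 4, 1), (1, 5, 1), (1, 6, 1)]
```

The storage is the sum of the core sizes rather than the product of the mode sizes, which is the compression the document is referring to.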
The document discusses prototype-based machine learning and its applications in bio-medical domains. It provides an overview of unsupervised and supervised prototype-based learning techniques, including competitive learning, Kohonen's self-organizing map (SOM), and learning vector quantization (LVQ). Examples of applying these methods to cluster proteomics data and identify biomarkers for rheumatoid arthritis are also mentioned.
The document summarizes research on the properties of independence polynomials of graphs. It defines independence polynomials and describes some of their properties like being unimodal and log-concave. It presents results on independence polynomials of trees and specific families of graphs like caterpillars and almost 3-regular trees. It proposes a condition called diminishing differences that may indicate when the sum of two unimodal polynomials is also unimodal. Experimental data is shown and ideas for future work are discussed.
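For concreteness, an independence polynomial can be computed by brute force on small graphs. The path P4 below is a toy example whose coefficient sequence is unimodal, consistent with the properties discussed:

```python
from itertools import combinations

def independence_polynomial(vertices, edges):
    """Coefficients [i_0, i_1, ...] of the independence polynomial,
    where i_k counts the independent sets of size k (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    coeffs = [0] * (len(vertices) + 1)
    for k in range(len(vertices) + 1):
        for S in combinations(vertices, k):
            # S is independent iff no pair of its vertices forms an edge
            if all(frozenset(p) not in edge_set for p in combinations(S, 2)):
                coeffs[k] += 1
    return coeffs

# Path P4:  1 - 2 - 3 - 4
print(independence_polynomial([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))
# → [1, 4, 3, 0, 0]: one empty set, 4 singletons, 3 independent pairs
```

Brute force is exponential in the number of vertices, so it only serves to check conjectured properties such as unimodality on small instances.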
Exact Matrix Completion via Convex Optimization Slide (PPT)Joonyoung Yi
Slide of the paper "Exact Matrix Completion via Convex Optimization" of Emmanuel J. Candès and Benjamin Recht. We presented this slide in KAIST CS592 Class, April 2018.
- Code: https://github.com/JoonyoungYi/MCCO-numpy
- Abstract of the paper: We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys
𝑚≥𝐶𝑛1.2𝑟log𝑛
for some positive numerical constant C, then with very high probability, most n×n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
Hybrid Block Method for the Solution of First Order Initial Value Problems of...iosrjce
Method of collocation of the differential system and interpolation of the approximate solution which
is a combination of power series and exponential function at some selected grid and off-grid points to generate
a linear multistep method which is implemented in block method is considered in this paper. The basic
properties of the block method which include; consistency, convergence and stability interval is verified. The
method is tested on some numerical experiments and found to have better stability condition and better
approximation than the existing methods
1. The document discusses a universal Bayesian measure for arbitrary data that is either discrete or continuous.
2. It presents Ryabko's measure for continuous variables and generalizes it using the Radon-Nikodym theorem to define density functions for both discrete and continuous random variables.
3. It then shows that given a universal histogram sequence, the normalized log ratio of the true density function to this generalized measure converges to zero, providing a universal Bayesian solution to the problem.
Pattern-based classification of demographic sequencesDmitrii Ignatov
We have proposed prefix-based gapless sequential patterns for classification of demographic sequences. In comparison to black-box machine learning techniques, this one provides interpretable patterns suitable for treatment by professional demographers. As for the language, we have used Pattern Structures as an extension of Formal Concept Analysis for the case of complex data like sequences, graphs, intervals, etc.
Hyers ulam rassias stability of exponential primitive mappingAlexander Decker
This academic article discusses the Hyers-Ulam-Rassias stability of exponential primitive mappings. It begins with introducing the concepts of exponential primitive mappings, metric groups, and Hyers-Ulam-Rassias stability. It then proves a theorem showing that if an exponential primitive mapping G satisfies an inequality involving the sum of norms, then there exists a unique additive mapping T such that the difference between G and T is bounded above by a function of the norm. The proof constructs T as a limit and shows it has the required properties. The article concludes by mentioning potential directions for further research.
International Journal of Engineering Research and DevelopmentIJERD Editor
This document presents a wavelet-Galerkin technique for solving Neumann boundary value problems using wavelet bases. It begins with an introduction to wavelets and their history. It then discusses the wavelet-Galerkin method, describing how to represent functions and their derivatives as expansions in a wavelet basis. The technique is applied to solve one-dimensional Neumann Helmholtz and two-dimensional Neumann Poisson boundary value problems. Results show the method matches exact solutions for some parameters in Helmholtz problems but not all parameters in Poisson problems.
Localized methods for diffusions in large graphsDavid Gleich
I describe a few ongoing research projects on diffusions in large graphs and how we can create efficient matrix computations in order to determine them efficiently.
Spacey random walks and higher order Markov chainsDavid Gleich
My talk at SIAM NetSci workshop (2015) on our new spacey random walk and spacey random surfer models and how we derived them. There many potential extensions and opportunities to use this for analyzing big data as tensors.
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
Optimization of positive linear systems via geometric programmingMasaki Ogura
The document summarizes research on optimizing positive linear systems via geometric programming.
The key points are:
1) Positive linear systems have non-negative state variables and parameters, making them well-suited to applications like epidemics, chemistry, and economics.
2) The author formulates the problem of optimally tuning parameters of a positive linear system as a convex optimization problem that can be solved via geometric programming.
3) Geometric programming allows optimizing "posynomial" objective functions over positive variables and has been shown to exactly solve parameter tuning problems for positive linear systems when objectives like the H2 norm, H∞ norm, or Hankel norm are used.
Polynomial matrices can help to elegantly formulate many broadband multi-sensor / multi-channel processing problems, and represent a direct extension of well-established narrowband techniques which typically involve eigen- (EVD) and singular value decompositions (SVD) for optimisation. Polynomial matrix decompositions extend the utility of the EVD to polynomial parahermitian matrices, and this talk presents a brief overview of such polynomial matrices, characteristics of the polynomial EVD (PEVD) and iterative algorithms for its solution. The presentation concludes with some surprising results when applying the PEVD to subband coding and broadband beamforming.
Introduction to second gradient theory of elasticity - Arjun NarayananArjun Narayanan
This document introduces higher gradient theories of elasticity. It begins with an overview of how gradients appear in classical field theories like Newtonian gravity and Einsteinian gravity. It then discusses how higher gradients are relevant to continuum mechanics. The remainder of the document outlines the mathematical and variational framework for developing higher gradient elasticity theories. This includes discussions of geometric notions, variational principles, obtaining the strong form of the governing equations, and finite element discretization methods.
BIN PACKING PROBLEM: A LINEAR CONSTANTSPACE -APPROXIMATION ALGORITHMijcsa
Since the Bin Packing Problem (BPP) is one of the main NP-hard problems, a lot of approximation algorithms have been suggested for it. It has been proven that the best algorithm for BPP has the approximation ratio of and the time order of , unless In the current paper, a linear approximation algorithm is presented. The suggested algorithm not only has the best possible theoretical factors, approximation ratio, space order, and time order, but also outperforms the other approximation
algorithms according to the experimental results; therefore, we are able to draw the conclusion that the algorithms is the best approximation algorithm which has been presented for the problem until now
Special Plenary Lecture at the International Conference on VIBRATION ENGINEERING AND TECHNOLOGY OF MACHINERY (VETOMAC), Lisbon, Portugal, September 10 - 13, 2018
http://www.conf.pt/index.php/v-speakers
Propagation of uncertainties in complex engineering dynamical systems is receiving increasing attention. When uncertainties are taken into account, the equations of motion of discretised dynamical systems can be expressed by coupled ordinary differential equations with stochastic coefficients. The computational cost for the solution of such a system mainly depends on the number of degrees of freedom and number of random variables. Among various numerical methods developed for such systems, the polynomial chaos based Galerkin projection approach shows significant promise because it is more accurate compared to the classical perturbation based methods and computationally more efficient compared to the Monte Carlo simulation based methods. However, the computational cost increases significantly with the number of random variables and the results tend to become less accurate for a longer length of time. In this talk novel approaches will be discussed to address these issues. Reduced-order Galerkin projection schemes in the frequency domain will be discussed to address the problem of a large number of random variables. Practical examples will be given to illustrate the application of the proposed Galerkin projection techniques.
A Study on the Root Systems and Dynkin diagrams associated with QHA2(1)IRJET Journal
This document discusses the quasi-hyperbolic Kac-Moody algebra QHA2(1). It begins with an abstract that introduces the algebra and states that the paper aims to classify the Dynkin diagrams associated with QHA2(1) and study properties of strictly and purely imaginary roots. It then provides background on Kac-Moody algebras, roots, and related concepts. The main results are a classification theorem stating there are 212 connected, non-isomorphic Dynkin diagrams for QHA2(1) and a discussion of strictly and purely imaginary roots for this algebra.
Covariance matrices are central to many adaptive filtering and optimisation problems. In practice, they have to be estimated from a finite number of samples; on this, I will review some known results from spectrum estimation and multiple-input multiple-output communications systems, and how properties that are assumed to be inherent in covariance and power spectral densities can easily be lost in the estimation process. I will discuss new results on space-time covariance estimation, and how the estimation from finite sample sets will impact on factorisations such as the eigenvalue decomposition, which is often key to solving the introductory optimisation problems. The purpose of the presentation is to give you some insight into estimating statistics as well as to provide a glimpse on classical signal processing challenges such as the separation of sources from a mixture of signals.
This document describes an undergraduate research project on iterative methods for computing eigenvalues and eigenvectors of matrices. It introduces the standard eigenvalue problem and defines key terms like eigenvalues, eigenvectors, and dominant eigenpairs. The body of the document reviews three iterative methods - the power method, inverse power method, and shifted inverse power method. It explains how these methods use repeated matrix-vector multiplications to approximate dominant, smallest, and intermediate eigenvalues and their corresponding eigenvectors. The document is structured with chapters on introduction, literature review, applications, and conclusion.
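The power method described above can be sketched in a few lines of numpy; this is a generic illustration of the technique (the function name and test matrix are mine, not from the project):

```python
import numpy as np

def power_method(A, num_iters=1000, tol=1e-12):
    """Approximate the dominant eigenpair by repeated matrix-vector products."""
    v = np.ones(A.shape[0]) / np.sqrt(A.shape[0])   # arbitrary nonzero start vector
    lam = 0.0
    for _ in range(num_iters):
        w = A @ v
        v = w / np.linalg.norm(w)                   # renormalize to avoid overflow
        lam_new = v @ A @ v                         # Rayleigh-quotient estimate
        if abs(lam_new - lam) < tol:
            return lam_new, v
        lam = lam_new
    return lam, v

A = np.array([[2.0, 1.0], [1.0, 3.0]])              # eigenvalues (5 +/- sqrt(5)) / 2
lam, v = power_method(A)
print(round(lam, 4))                                # ≈ 3.618, the dominant eigenvalue
```

The inverse and shifted inverse variants mentioned in the summary follow the same loop with `A` replaced by `inv(A)` or `inv(A - mu * I)`.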
The document discusses using Tensor Train (TT) decomposition to efficiently represent tensors and apply it to machine learning models. Some key points:
- TT decomposition provides a compact representation of tensors that allows efficient linear algebra operations.
- It has been used to compress the weights matrix of neural networks without loss of accuracy.
- Exponential machines model all feature interactions using a TT-formatted weight tensor, controlling complexity with TT-rank. This outperforms other models on classification tasks involving interactions.
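The TT representation referred to in these points can be sketched with the standard successive-SVD construction (TT-SVD); this is a generic illustration under that assumption, not code from the cited work:

```python
import numpy as np

def tt_svd(T, max_rank):
    """Tensor Train decomposition via successive truncated SVDs (TT-SVD)."""
    shape = T.shape
    cores, r_prev = [], 1
    C = T.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        C = C.reshape(r_prev * shape[k], -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(S))                      # truncate to the TT-rank budget
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = S[:r, None] * Vt[:r]                       # carry the remainder forward
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the train of 3-way cores back into a full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])

# Rank-1 demo tensor: TT-SVD with a small rank budget reconstructs it exactly
T = np.einsum('i,j,k->ijk', np.arange(1.0, 4.0), np.arange(1.0, 5.0), np.arange(1.0, 6.0))
cores = tt_svd(T, max_rank=2)
print([c.shape for c in cores])        # [(1, 3, 2), (2, 4, 2), (2, 5, 1)]
```

The compactness claim follows from the shapes: the cores store far fewer numbers than the full tensor once the TT-rank is small.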
The document discusses prototype-based machine learning and its applications in bio-medical domains. It provides an overview of unsupervised and supervised prototype-based learning techniques, including competitive learning, Kohonen's self-organizing map (SOM), and learning vector quantization (LVQ). Examples of applying these methods to cluster proteomics data and identify biomarkers for rheumatoid arthritis are also mentioned.
The document summarizes research on the properties of independence polynomials of graphs. It defines independence polynomials and describes some of their properties like being unimodal and log-concave. It presents results on independence polynomials of trees and specific families of graphs like caterpillars and almost 3-regular trees. It proposes a condition called diminishing differences that may indicate when the sum of two unimodal polynomials is also unimodal. Experimental data is shown and ideas for future work are discussed.
Extreme bound analysis based on correlation coefficient for optimal regressio... - Loc Nguyen
Regression analysis is an important tool in statistical analysis, in which there is a demand for discovering the essential independent variables among many others, especially when there is a huge number of random variables. Extreme bound analysis is a powerful approach to extract such important variables, called robust regressors. In this research, a so-called Regressive Expectation Maximization with RObust regressors (REMRO) algorithm is proposed as an alternative to other probabilistic methods for analyzing robust variables. Following a different ideology from other probabilistic methods, REMRO searches for robust regressors forming the optimal regression model and sorts them in descending order of their fitness values, determined by two proposed concepts of local correlation and global correlation. Local correlation represents sufficient explanatory power for possible regression models, and global correlation reflects the independence level and stand-alone capacity of regressors. Moreover, REMRO can tolerate incomplete data because it applies the Regressive Expectation Maximization (REM) algorithm to fill missing values with estimates based on the ideology of the expectation maximization (EM) algorithm. From experimental results, REMRO is more accurate for modeling numeric regressors than traditional probabilistic methods like the Sala-i-Martin method, but REMRO cannot yet be applied to non-numeric regression models in this research.
We propose a regularized method for multivariate linear regression when the number of predictors may exceed the sample size. This method is designed to strengthen the estimation and the selection of the relevant input features with three ingredients: it takes advantage of the dependency pattern between the responses by estimating the residual covariance; it performs selection on direct links between predictors and responses; and selection is driven by prior structural information. To this end, we build on a recent reformulation of the multivariate linear regression model to a conditional Gaussian graphical model and propose a new regularization scheme accompanied with an efficient optimization procedure. On top of showing very competitive performance on artificial and real data sets, our method demonstrates capabilities for fine interpretation of its parameters, as illustrated in applications to genetics, genomics and spectroscopy.
We consider the problem of model estimation in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP.
We apply our results to the problem of learning near-optimal policies in the reward-free setting. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible asymptotic rate. Our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor $n$, where $n$ is the number of contexts.
Physics-driven Spatiotemporal Regularization for High-dimensional Predictive... - Hui Yang
Rapid advancement of distributed sensing and imaging technology brings the proliferation of high-dimensional spatiotemporal data, i.e., y(s; t) and x(s; t) in manufacturing and healthcare systems. Traditional regression is not generally applicable for predictive modeling in these complex structured systems. For example, infrared cameras are commonly used to capture dynamic thermal images of 3D parts in additive manufacturing. The temperature distribution within parts enables engineers to investigate how process conditions impact the strength, residual stress and microstructures of fabricated products. The ECG sensor network is placed on the body surface to acquire the distribution of electric potentials y(s; t), also named body surface potential mapping (BSPM). Medical scientists call for the estimation of electric potentials x(s; t) on the heart surface from BSPM y(s; t) so as to investigate cardiac pathological activities (e.g., tissue damages in the heart). However, spatiotemporally varying data and complex geometries (e.g., human heart or mechanical parts) defy traditional regression modeling and regularization methods. This talk will present a novel physics-driven spatiotemporal regularization (STRE) method for high-dimensional predictive modeling in complex manufacturing and healthcare systems. This model not only captures the physics-based interrelationship between time-varying explanatory and response variables that are distributed in the space, but also addresses the spatial and temporal regularizations to improve the prediction performance. In the end, we will introduce our lab at Penn State and future research directions will also be discussed.
Ability Study of Proximity Measure for Big Data Mining Context on Clustering - KamleshKumar394
This document summarizes a research paper on using proximity measures for clustering big data. It discusses the objectives of identifying proximity measures that can handle the volume, variety, and velocity of big data. It then provides background on big data and defines the 3Vs (volume, variety, velocity). Different types of clustering algorithms are described including partitioning, hierarchical, density-based, grid-based, and model-based. Finally, it outlines several taxonomies of proximity measures that can be used for clustering, including Minkowski distances, L1 distances, L2 distances, inner products, Shannon entropy, combinations, and intersections.
The document discusses building robust machine learning systems that can handle concept drift. It introduces the challenges of concept drift when the underlying data distribution changes over time. It proposes using Gaussian process classifiers with an adaptive training window approach. The approach monitors for concept drift and retrains the model if detected. It tests the approach on artificial data streams with different drift scenarios and finds the adaptive approach performs better than a static model at handling concept drift. Future work could explore other drift detection methods and ensembles of adaptive Gaussian process classifiers.
A Mathematical Programming Approach for Selection of Variables in Cluster Ana... - IJRES Journal
The document presents a mathematical programming approach for selecting important variables in cluster analysis. It formulates a nonlinear binary model to minimize the distance between observations within clusters, using indicator variables to select important variables. The model is applied to a sample dataset of 30 observations across 5 variables, correctly identifying variables 3, 4 and 5 as most important for clustering the observations into two groups. The results are compared to an existing variable selection heuristic, with the mathematical programming approach achieving a 100% correct classification versus 97% for the other method.
This document provides an overview of cluster analysis techniques. It begins by defining cluster analysis and its applications. It then categorizes major clustering methods into partitioning methods (like k-means and k-medoids), hierarchical methods, density-based methods, grid-based methods, and model-based methods. The document discusses different data types that can be clustered and measures for determining cluster quality. It also outlines requirements for effective clustering in data mining.
This document provides an overview of Bayesian decision theory. It begins by defining Bayesian decision theory as a statistical approach that quantifies the tradeoffs between decisions using probabilities and costs. It then discusses using Bayesian decision theory to classify fish by type based on observed features and probabilities. The document explains key Bayesian concepts like prior and posterior probabilities, conditional probabilities, discriminant functions, and how the Gaussian distribution relates to Bayesian classification. The overall summary is that Bayesian decision theory provides a framework for making optimal decisions under uncertainty by accounting for probabilities and costs associated with different outcomes.
The document discusses metric learning for clustering. It motivates metric learning by showing how must-link and cannot-link constraints can help clustering algorithms find better solutions. It explains that metric learning learns a distance metric that respects the pairwise constraints by assigning small distances to similar pairs and larger distances to dissimilar pairs. The document outlines an algorithm called MPCK-means that learns individual metrics for each cluster while allowing weights for different constraints.
In this talk we will describe a methodology to handle causality when making inference on common-cause failure (CCF) in a situation of missing data. The data are collected in the form of a contingency table, but the only available information is the number of CCFs of different orders and the number of failures due to a given cause. Therefore only the margins of the contingency table are observed; the frequencies in each cell are unknown. Assuming a Poisson model for the counts, we suggest a Bayesian approach and use the inverse Bayes formula (IBF) combined with a Metropolis-Hastings algorithm to make inference on the rate of occurrence for each (cause, order) combination. The performance of the resulting algorithm is evaluated through simulations. A comparison is made with results obtained from the _-composition approach to deal with causality suggested by Zheng et al. (2013).
Learning a nonlinear embedding by preserving class neighbourhood structure (final) - WooSung Choi
Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a nonlinear embedding by preserving class neighbourhood structure." International Conference on Artificial Intelligence and Statistics. 2007.
This document proposes a new approximate approach for computing the optimal optimistic decision within possibilistic networks. The approach avoids transforming the initial graph into a junction tree, which is computationally expensive. Instead, it performs the computation by calculating the degree of normalization in the moral graph resulting from merging the possibilistic network representing the agent's beliefs and the one representing its preferences. This allows the approach to have polynomial complexity compared to the exact approach based on junction trees, which is NP-hard.
Andrey V. Savchenko - Sequential Hierarchical Image Recognition based on the ... - AIST
Andrey V. Savchenko (National Research University Higher School of Economics), Vladimir Milov (N. Novgorod State Technical University), Natalya Belova (NRU HSE, Moscow) - Sequential Hierarchical Image Recognition based on the Pyramid Histograms of Oriented Gradients with Small Samples
AIST Conference 2015 http://aistconf.org
* ML in HEP
* classification and regression
* knn classification and regression
* ROC curve
* optimal bayesian classifier
* Fisher's QDA
* intro to Logistic Regression
Basic Concepts of Standard Experimental Designs (Statistics) - Hasnat Israq
This document outlines key concepts in standard experimental design. It defines experimental design as assigning experimental units to treatment conditions to measure and compare treatment effects. Sample design selects units for measurement from a population. The document discusses necessary steps like replication and randomization. It presents linear statistical models including fixed, random, and mixed effects models. It also explains analysis of variance and standard designs like completely randomized design, randomized block design, and Latin square design, including their analysis of variance tables. The conclusion compares the efficiency of these standard designs.
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... - Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... - sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Global Situational Awareness of A.I. and where its headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Self-organizing Network for Variable Clustering and Predictive Modeling
1. Self-organizing Network for Variable Clustering
and Predictive Analytics
Hui Yang, PhD
杨 徽
Associate Professor
Complex Systems Monitoring, Modeling and Control Lab
The Pennsylvania State University
University Park, PA 16802
November 25, 2017
2. Outline
1 Introduction
2 Clustering
3 Self-organizing Variable Clustering
4 Case Studies - Theoretical and Practical
5 Conclusions and Future Directions
12. Challenges - Variable Redundancy
Least squares estimate: $\hat{\beta} = (X'X)^{-1} X' y$ and $\mathrm{var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$
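The variance inflation caused by redundant variables can be demonstrated numerically; a small sketch with two nearly identical predictors (the data and names are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # nearly redundant copy of x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

# Least-squares estimate: beta_hat = (X'X)^{-1} X' y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# var(beta_hat) = sigma^2 (X'X)^{-1}: huge diagonal entries signal redundancy
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
var_beta = sigma2 * np.diag(XtX_inv)
print(var_beta)   # the coefficients of the two redundant predictors have inflated variance
```

Dropping either redundant column shrinks the coefficient variance back to the order of $\sigma^2 / n$, which is the motivation for clustering redundant variables before modeling.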
Yang, Hui (Penn State) Self-organizing Analytics November 25, 2017 12 / 45
13. State of the Art
Variable Selection - Relevancy
Generalized linear models, shrinkage methods, best-subset selection,
ridge regression, LASSO, least angle regression and elastic net
These methods capture relevancy between predictors and response variables, but do not explicitly consider redundancy among predictors.
Variable Clustering - Redundancy
Redundancy measures - linear correlation or mutual information
Linear correlation fails to capture nonlinear interdependences among variables
Mutual information characterizes linear and nonlinear correlation, but
requires the stationarity assumption
Latent-variable methods - oblique principal component clustering
(linear projections of variables)
Need to fill the gap
14. Research Objectives
Self-organizing Network for Variable Clustering
Network Theory
Nodes ⇐= Variables
Edge weight ⇐= Redundancy measure
Adjacency matrix ⇐= Redundancy matrix
Network community ⇐= Variable cluster
Self-organizing Variable Clustering
Redundancy measures - Nonlinear coupling analysis
Measure nonlinear interdependence structures among variables
Network embedding
Embed variables as nodes in a complex network
Self-organization
Nonlinear-coupling forces move nodes to derive network topology
Community detection
Variables are clustered as sub-network communities in the space
15. Review of Clustering
Data Clustering vs. Variable Clustering
16. Hierarchical Clustering: Agglomerative vs. Divisive
Dissimilarity measure - symmetrical matrix
Cluster Distance - single linkage, complete linkage, group average
Variable correlation - linear correlation, mutual information
17. Oblique principal component analysis
Principal component analysis
The first two PCs, eigenvectors (or loading matrix in factor analysis)
Oblique rotation
Oblique rotation of the eigenvectors $Z$ to obtain $B$:

$B = Z\Omega, \qquad \max_{\Omega} \sum_{i=1}^{p} \left[ \sum_{j=1}^{q} b_{ij}^4 - \left( \sum_{j=1}^{q} b_{ij}^2 \right)^2 \right]$
Cluster assignment
Calculate the linear correlation between all variables and rotated
components, and then assign each variable to one of two clusters
based on the higher squared correlation.
Recursive partition
Repeat the binary split for each cluster.
20. Research Methodology
Nonlinear Variable - State Space Reconstruction
Takens’ embedding theorem
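The state-space reconstruction invoked on this slide is a delay-coordinate embedding; a minimal sketch (the embedding dimension and delay below are illustrative choices, not tuned values from the talk):

```python
import numpy as np

def delay_embed(x, m, tau):
    """Takens-style reconstruction: x_i = (x_i, x_{i+tau}, ..., x_{i+tau*(m-1)})."""
    N = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau : i * tau + N] for i in range(m)])

t = np.linspace(0, 8 * np.pi, 400)
x = np.sin(t)                        # scalar observable of an oscillator
states = delay_embed(x, m=2, tau=25)
print(states.shape)                  # (375, 2): the 2-D reconstructed trajectory
```

In practice the delay is often chosen from the first minimum of the mutual information and the dimension from a false-nearest-neighbor test.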
21. Research Methodology
Nonlinear Interdependence
Cross recurrences of two variables in the state space
$\hat{I}_{x_1 x_2} = \left\langle \dfrac{r_m(x_2) - d_m(x_2 \mid x_1)}{r_m(x_2) - d_m(x_2)} \right\rangle_m$

$r_m(x_2)$ is the average distance from $x_2(m)$ to $k$ randomly chosen $x_2(i)$: $r_m(x_2) = \frac{1}{k} \sum_{i=1}^{k} (x_2(m) - x_2(i))^2$

$d_m(x_2 \mid x_1)$ is the average conditional distance from $x_2(m)$ to $k$ samples $x_2(i)$ whose indices $i \in \{n_1, \cdots, n_k\}$ are taken from the recurrence set $\Gamma(x_1(m))$ of the variable $x_1$: $d_m(x_2 \mid x_1) = \frac{1}{k} \sum_{i \in \Gamma(x_1(m))} (x_2(m) - x_2(i))^2$

$d_m(x_2)$ is the average distance from $x_2(m)$ to the $k$ nearest neighbors of $x_2(m)$ in the state space
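A possible numpy sketch of this interdependence measure, following the definitions on the slide (the neighbor-selection details, parameter values, and toy data are my assumptions):

```python
import numpy as np

def interdependence(X1, X2, k=8, seed=0):
    """Nonlinear interdependence I(x1 -> x2) from cross recurrences (a sketch).

    X1, X2: (N, m) state-space trajectories of the two variables.
    """
    rng = np.random.default_rng(seed)
    N = len(X1)
    D1 = np.linalg.norm(X1[:, None, :] - X1[None, :, :], axis=-1)   # distances in x1
    D2 = ((X2[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)      # squared distances in x2
    np.fill_diagonal(D1, np.inf)                                    # exclude self-matches
    ratios = []
    for m in range(N):
        r_m = D2[m, rng.choice(N, size=k, replace=False)].mean()    # random baseline
        gamma_set = np.argsort(D1[m])[:k]                           # recurrence set of x1(m)
        d_cond = D2[m, gamma_set].mean()                            # conditioned on x1
        d_nn = np.sort(np.where(np.arange(N) == m, np.inf, D2[m]))[:k].mean()  # k-NN in x2
        ratios.append((r_m - d_cond) / (r_m - d_nn))
    return float(np.mean(ratios))

t = np.linspace(0, 6 * np.pi, 300)
rng = np.random.default_rng(1)
x1 = np.sin(t) + 0.05 * rng.normal(size=300)
x2 = x1 ** 3                       # deterministically coupled to x1
x3 = rng.normal(size=300)          # independent of x1
print(interdependence(x1[:, None], x2[:, None]),
      interdependence(x1[:, None], x3[:, None]))   # coupled pair scores much higher
```

When x2 is a function of x1, neighbors of x1(m) map to neighbors of x2(m), so the ratio approaches 1; for independent variables the conditional distance matches the random baseline and the ratio is near 0.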
22. Network Topology
Nonlinear Interdependence Matrix ⇒ Network Topology
From the nonlinear interrelationship of variables, derive the topological
structures for network and identify sub-network communities.
23. Self-Organizing Variable Clustering
Spring-Electrical Model
Nodes − electrically charged particles
Edges − springs between nodes
The repulsive force exists between any pair of nodes:

$f_r(i, j) = -\dfrac{1}{\| s(i) - s(j) \|^2} \times \dfrac{1}{e^{\alpha |\hat{I}_{x_1 x_2}|}}$

The attractive force exists only between nodes that have a relation of nonlinear interdependence:

$f_a(i, j) = \| s(i) - s(j) \|^2 \times e^{\gamma |\hat{I}_{x_1 x_2}|}, \qquad \hat{I}_{x_1 x_2} \neq 0$

The combined force at a node $i$:

$f(i, s, \alpha, \gamma) = \sum_{i \neq j} -\dfrac{s(i) - s(j)}{\| s(i) - s(j) \|^3} \times \dfrac{1}{e^{\alpha |\hat{I}_{x_1 x_2}|}} + \sum_{i \leftrightarrow j} \| s(i) - s(j) \| \, (s(i) - s(j)) \, e^{\gamma |\hat{I}_{x_1 x_2}|}$
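A toy implementation of the spring-electrical self-organization (the step size, force clipping, and gradient-style update are my assumptions for a stable sketch; the authors' exact solver may differ):

```python
import numpy as np

def self_organize(I, alpha=1.0, gamma=1.0, step=0.02, iters=300, seed=0):
    """Force-directed layout sketch: every pair repels with magnitude
    e^{-alpha|I_ij|}/d^2; interdependent pairs also attract with magnitude
    d^2 * e^{gamma|I_ij|}; nodes move along the net force each iteration."""
    n = I.shape[0]
    rng = np.random.default_rng(seed)
    s = rng.normal(size=(n, 2))                  # random initial 2-D positions
    W = np.abs(I)
    edge = (W > 1e-12).astype(float)             # attraction only where I_ij != 0
    np.fill_diagonal(edge, 0.0)
    for _ in range(iters):
        diff = s[:, None, :] - s[None, :, :]     # s(i) - s(j)
        d = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(d, 1.0)                 # dummy; self-pairs have zero diff anyway
        unit = diff / d[..., None]
        rep = unit / (d ** 2)[..., None] * np.exp(-alpha * W)[..., None]
        att = -unit * (d ** 2)[..., None] * np.exp(gamma * W)[..., None] * edge[..., None]
        f = np.clip((rep + att).sum(axis=1), -5.0, 5.0)   # clip for numerical stability
        s = s + step * f
    return s

I = np.zeros((6, 6))
I[:3, :3] = 0.9                                  # two groups of interdependent variables
I[3:, 3:] = 0.9
np.fill_diagonal(I, 0.0)
s = self_organize(I)
within = np.linalg.norm(s[0] - s[1])
between = np.linalg.norm(s[0] - s[3])
print(round(within, 2), round(between, 2))       # within-group vs. between-group distance
```

Interdependent variables settle close together while unrelated ones are pushed apart, which is exactly what makes the subsequent community detection work.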
24. Self-organizing Variable Clustering
Minimal energy network: $s^* = \arg\min_s \sum_i \| f(i, s, \alpha, \gamma) \|^2$
25. Predictive Modeling with Grouped Variables
Gram-Schmidt orthonormalization
For variables x1, · · · , xk in one cluster
$v_1 = x_1, \quad w_1 = \dfrac{v_1}{\| v_1 \|}$

$v_2 = x_2 - \langle x_2, w_1 \rangle w_1, \quad w_2 = \dfrac{v_2}{\| v_2 \|}$

$\cdots$

$v_n = x_n - \sum_{i=1}^{n-1} \langle x_n, w_i \rangle w_i, \quad w_n = \dfrac{v_n}{\| v_n \|}$
Group elastic-net model
$\min_{\beta} \; -\sum_{i=1}^{n} \left[ y_i \log(h_{\beta}(w, i)) + (1 - y_i) \log(1 - h_{\beta}(w, i)) \right]$

$h_{\beta}(w, i) = \dfrac{1}{1 + \exp\left[ -\left( \beta_0 + \sum_{m=1}^{M} \sum_{k=1}^{K_m} w_{mk}(i) \, \beta_{mk} \right) \right]}$

$\text{s.t.} \quad \sum_{m=1}^{M} \sum_{k=1}^{K_m} \left[ \alpha \beta_{mk}^2 + (1 - \alpha) |\beta_{mk}| \right] \leq \lambda$
27. Experimental Results
Table I. Averages and standard deviations of prediction errors in the experimental study with 100 replications
28. Case Study 1 - Nonlinear Recurrence Network
Poincaré Recurrence Theorem
Let $T$ be a measure-preserving transformation of a probability space $(X, P)$ and let $A \subset X$ be a measurable set. Then, for any natural number $N \in \mathbb{N}$, the trajectory eventually reappears in the neighborhood $A$ of former states:

$\Pr\left( \{ x \in A \mid \{ T^n(x) \}_{n \geq N} \subset X \setminus A \} \right) = 0$
(a) Stamping Machine, from Dr. Jianjun Shi (b) Biological System
29. Recurrence Plot
Recurrence dynamics of nonlinear and nonstationary systems
$R(i, j) = \Theta(\varepsilon - \| x(i) - x(j) \|)$
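The recurrence matrix is essentially one line of numpy; a sketch on a periodic toy trajectory (the threshold and data are illustrative):

```python
import numpy as np

def recurrence_matrix(X, eps):
    """R(i, j) = Theta(eps - ||x(i) - x(j)||): 1 when states i and j recur."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (D <= eps).astype(int)

t = np.linspace(0, 4 * np.pi, 200)
X = np.column_stack([np.sin(t), np.cos(t)])      # periodic trajectory on a circle
R = recurrence_matrix(X, eps=0.1)
# A periodic signal yields the diagonal-line structure typical of recurrence plots
print(R.shape, R.trace())                        # (200, 200) 200: diagonal is all ones
```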
30. Structures in Recurrence Plots
Small-scale structures: single dots, diagonal and vertical lines
Large-scale structures: homogeneous, periodic and disrupted patterns
31. Recurrence Networks
K-nearest Neighbor Network [Small, 2008]
Directed network
Each node is connected to k nearest nodes in the network
A fixed number of neighbors
Recurrence Network [Marwan, 2008]
Undirected network
Each node may have a different number of links in the network
A fixed size of the neighborhood
Other Approaches
Transition networks [Nicolis, 2005], cycle networks [Zhang, 2006],
correlation networks [Yang, 2008], Visibility graphs [Lacasa, 2008].
Donner, Marwan, et al. propose recurrence networks as a unifying framework to transform nonlinear time series into complex networks.
32. K-Nearest Neighbor Networks [Small, 2008]
Given a time series: X = {x1, x2, . . . , xN }
State space reconstruction: $\mathbf{x}_i = (x_i, x_{i+\tau}, \ldots, x_{i+\tau(m-1)})$
A node xi is connected to its k nearest neighbors, but excluding the
nodes in the same strand of the trajectory.
Network representation:

$A_{i,j} = \begin{cases} 1, & |j - i| > \Delta t \text{ and } j \in \{ k \text{ nearest neighbors of } i \} \\ 0, & \text{otherwise} \end{cases}$
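A sketch of this construction (the Theiler-window width and toy data are my choices, not values from [Small, 2008]):

```python
import numpy as np

def knn_network(X, k=4, theiler=5):
    """Directed k-NN network: connect each state to its k nearest neighbors,
    excluding temporally close states on the same strand (|j - i| <= theiler)."""
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    close_in_time = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :]) <= theiler
    D[close_in_time] = np.inf                        # Theiler-window exclusion
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        A[i, np.argsort(D[i])[:k]] = 1               # k nearest admissible neighbors
    return A

t = np.linspace(0, 4 * np.pi, 100)
X = np.column_stack([np.sin(t), np.cos(t)])          # trajectory traversing a circle twice
A = knn_network(X)
print(A.sum(axis=1)[:5])                             # every node has exactly k out-links
```

The fixed out-degree is the defining property of this network type, in contrast to the recurrence network on the next slide.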
33. Recurrence Networks [Marwan, 2008]
Given a time series: X = {x1, x2, . . . , xN }
State space reconstruction: $\mathbf{x}_i = (x_i, x_{i+\tau}, \ldots, x_{i+\tau(m-1)})$
The recurrences are treated as links in the network.
The adjacency matrix $A$ is obtained from the recurrence matrix by removing the diagonal identities:

$A_{i,j} = R_{i,j} - I_{i,j}, \qquad R_{i,j} = \Theta(\varepsilon - \| x_i - x_j \|)$
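Correspondingly, a recurrence-network sketch (threshold and toy trajectory are illustrative):

```python
import numpy as np

def recurrence_network(X, eps):
    """Undirected recurrence network: links are recurrences, self-loops removed."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    R = (D <= eps).astype(int)                # R_ij = Theta(eps - ||x_i - x_j||)
    return R - np.eye(len(X), dtype=int)      # A = R - I removes the diagonal identities

t = np.linspace(0, 4 * np.pi, 150)
X = np.column_stack([np.sin(t), np.cos(t)])
A = recurrence_network(X, eps=0.2)
print(np.array_equal(A, A.T), A.trace())      # True 0: symmetric with an empty diagonal
```

Unlike the k-NN network, node degree varies with the local density of the trajectory, which is what makes degree itself an informative dynamical quantity here.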
40. Performance Comparison
Figure: Averages and standard deviations of prediction errors in the real-world case study that extracts a sparse set of model parameters from VCG signals for the identification of myocardial infarctions.
41. Summary
Challenges
Complex Systems =⇒ Advanced Sensing =⇒ Big Data
Large numbers of variables =⇒ curse of dimensionality and redundancy
Nonlinear and asymmetric interdependence =⇒ predictive analytics
Methodology - Self-organizing Variable Clustering
Nonlinear coupling analysis of variables
Network embedding of variables
Self-organizing derivation of network topology
Network community detection and predictive modeling
Results
Simulation experiments: outperform traditional clustering algorithms
VCG study =⇒ an average sensitivity of 96.80% and an average
specificity of 92.62% in the identification of myocardial infarctions
Broad applications
42. References
G. Liu and H. Yang, “Self-organizing network for group variable selection and predictive modeling,” Annals of Operations Research, 2017. DOI: 10.1007/s10479-017-2442-2

Y. Chen and H. Yang, “A novel information-theoretic approach for variable clustering and predictive modeling using Dirichlet process mixtures,” Scientific Reports 6, 38913, 2016. DOI: www.nature.com/articles/srep38913

H. Yang and G. Liu, “Self-organized topology of recurrence-based complex network,” Chaos, Vol. 23, No. 4, 043116, 2013. DOI: 10.1063/1.4829877

G. Liu and H. Yang, “Multiscale adaptive basis function modeling of spatiotemporal cardiac electrical signals,” IEEE Journal of Biomedical and Health Informatics, Vol. 17, No. 2, p484-492, 2013. DOI: 10.1109/JBHI.2013.2243842

H. Yang, C. Kan, G. Liu and Y. Chen, “Spatiotemporal differentiation of myocardial infarctions,” IEEE Transactions on Automation Science and Engineering, Vol. 10, No. 4, p938-947, 2013. DOI: 10.1109/TASE.2013.2263497

G. Liu and H. Yang, “A Self-organizing Method for Predictive Modeling with Highly-redundant Variables,” Proceedings of the 11th Annual IEEE Conference on Automation Science and Engineering (CASE), August 24-28, 2015, Gothenburg, Sweden. DOI: 10.1109/CoASE.2015.7294243

G. Liu and H. Yang, “Model-driven Parametric Monitoring of High-dimensional Nonlinear Functional Profiles,” Proceedings of the 10th Annual IEEE Conference on Automation Science and Engineering (CASE), August 18-22, 2014, Taipei, Taiwan. DOI: 10.1109/CoASE.2014.6899408
43. Acknowledgements
NSF CAREER Award
NSF CMMI-1454012
NSF IIP-1447289
NSF CMMI-1266331
NSF IOS-1146882
James A. Haley Veterans’ Hospital
44. Contact Information
Hui Yang, PhD
Associate Professor
Complex Systems Monitoring Modeling and Control Laboratory
Harold and Inge Marcus Department of Industrial and Manufacturing
Engineering
The Pennsylvania State University
Tel: (814) 865-7397
Fax: (814) 863-4745
Email: huy25@psu.edu
Web: http://www.personal.psu.edu/huy25/