Given a graph G and a set of attributes X for each node, the task is to infer Y, the class labels of the nodes.

[Figure: within-network classification — a model is learned from labeled training nodes (Learning) and applied to predict the unknown labels of testing nodes (Inference)]

Ignores temporal information!
Problem
Attribute prediction in time-varying networks

Previous work
- only models temporal edges
- predicts only static attributes
- only single-classifier
- limited representation!

Contributions
Consider the space of temporal-relational representations
1. Propose temporal-relational classification framework
2. Temporal-relational ensembles
Most real networks are dynamic
• social, information, biological, ...

[Figure: a sequence of network snapshots over time — the edges, the node attributes, and the edge attributes all change from one timestep to the next]
Attributes of linked nodes are correlated.

Temporal locality: Recent events are more influential than distant ones.
[Figure: timeline of repeated u–v links; more recent links receive larger weights]

Temporal recurrence: A regular series of events is more likely to indicate a stronger relationship than an isolated event.
[Figure: timeline contrasting a recurring series of u–v links with an isolated link]
Temporal-Relational Classification Framework

Input: Dt = (Gt, Xt), where t = 1, ..., tmax

1. Select the past information
   Temporal Granularity (for each: edges, attributes, ...)
   • Single timestep
   • Window of timesteps
   • Union (all timesteps)
2. Select weighting
   Temporal Influence
   • Exponential
   • Uniform
3. Select relational classifier
   • Relational Probability Trees (RPT)
   • Relational Bayes Classifier (RBC)

Predictions
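As a concrete illustration of the three framework choices above, here is a minimal Python sketch (the helper name `summarize_edges` is hypothetical, not the authors' implementation): it combines a temporal-granularity choice (timestep / window / union) with a temporal-influence choice (exponential / uniform) to collapse a sequence of edge snapshots into a single weighted summary that a relational classifier would consume.

```python
from collections import defaultdict

def summarize_edges(snapshots, t, granularity="union",
                    weighting="exponential", lam=0.3, window=2):
    """Collapse edge snapshots 1..t into one weighted edge set.

    snapshots: list of edge lists; snapshots[i-1] holds the edges of G_i.
    Returns {(u, v): weight}, the edge weights of the summary graph at t.
    """
    # 1. Select the past information (temporal granularity)
    if granularity == "timestep":
        steps = [t]
    elif granularity == "window":
        steps = list(range(max(1, t - window + 1), t + 1))
    else:  # "union": all timesteps up to t
        steps = list(range(1, t + 1))

    # 2. Select the weighting (temporal influence)
    weights = defaultdict(float)
    for i in steps:
        if weighting == "exponential":       # recent snapshots count more
            w = (1 - lam) ** (t - i) * lam
        else:                                # "uniform"
            w = 1.0 / len(steps)
        for edge in snapshots[i - 1]:
            weights[edge] += w
    return dict(weights)

# 3. A relational classifier (RPT, RBC, ...) is then learned on the
#    weighted summary graph rather than on a single static snapshot.
```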
The attribute values are weighted by the product of their attribute weight and the corresponding link weight.

TVRC (weights only edges) vs. TENC (weights edges & attributes)
[Figure: weighted mode feature example — TVRC selects red (τ3), TENC selects blue (τ2)]
Learn model on Dt (or a set of timesteps), then apply it to Dt+1.
Use k-fold cross-validation to learn the "best" model (k = 10).
Average AUC over 10 trials.
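A minimal sketch of this evaluation protocol (helper names are hypothetical): a rank-based AUC plus a temporal holdout that learns on timestep t and scores on t+1, averaging over repeated trials.

```python
import random

def auc(labels, scores):
    """AUC via the Mann-Whitney rank statistic (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def temporal_holdout(train_data, test_data, fit, predict, trials=10, seed=0):
    """Learn a model on D_t and apply it to D_{t+1}; average AUC over trials."""
    rng = random.Random(seed)
    X_test, y_test = test_data
    scores = [auc(y_test, predict(fit(train_data, rng), X_test))
              for _ in range(trials)]
    return sum(scores) / trials
```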
PyComm (Python Developer Communication Network)
• Extracted emails and bug discussions from the open-source Python development environment

Cora Citation Network

PyComm (Communication Network)      Cora Citation Network
Developers: 1,914                   Papers: 16,153
Emails: 13,181                      References: 29,603
Bug Messages: 69,435                Authors: 21,976
Timespan: Feb 2007 – May 2008       Timespan: 1981 – 1998
In this talk we focus on the more difficult dynamic prediction task.

Dataset   Prediction Task
PyComm    Is the developer productive or not in t+1? (closes a bug) [Dynamic]
Cora      1. Is the paper AI or not?
          2. Predict the paper topic (1 out of 7 ML topics) [Static]
The simplest temporal-relational model still outperforms the others.

[Figure: AUC (0.92–1.00) for TVRC (the most primitive temporal-relational model), RPT, and DT with additional intrinsic attributes (Int+time, Int+graph, Int+topics); TVRC is highest]

Important to moderate the relational information with the temporal info!
Complexity in representation vs. accuracy

[Figure: AUC (0.65–0.95) as a function of the number of timesteps T = 1..4, for TENC (temporal influence of edges and attributes), TVRC (temporal influence of edges), TVRC+Union, the Window Model, and the Union Model (strictly based on temporal granularity; very efficient)]

Main finding: Accuracy generally increases as more temporal information is included in the representation.
Can we increase accuracy by creating ensembles using temporal-relational information?

Key idea: Introduce variance in both the temporal and relational information.
1. Transform the temporal-relational representation using some process (e.g., randomization)
2. Learn models (on different training data)
3. Combine model predictions with a weighted vote
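The combination step can be sketched as follows; `weighted_vote` is a hypothetical helper for step 3, merging per-model class distributions with one weight per ensemble member.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Step 3: weighted vote over model predictions.

    predictions: one {label: probability} dict per ensemble member.
    weights: one weight per member (e.g., held-out accuracy).
    Returns the normalized combined class distribution.
    """
    combined = defaultdict(float)
    for dist, w in zip(predictions, weights):
        for label, p in dist.items():
            combined[label] += w * p
    total = sum(combined.values())
    return {label: p / total for label, p in combined.items()}
```

Each member would be trained on a differently transformed temporal-relational representation, so the vote averages out the variance the transformations introduce.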
1. Transforming the Temporal Nodes and Links
• sampling nodes and edges from discrete timesteps
2. Transforming the Temporal Feature Space
• randomization of attributes (locally)
• varying temporal influence or granularity
• resample from temporal features
Many possibilities!
3. Adding Temporal Noise or Randomness
• randomly permute node feature values
• randomly permute links across time
4. Transforming Time-varying Labels
• randomly permuting previously learned labels at t-1 (or more distant) with
the true labels at t
5. Multiple Classification Algorithms
• sampling from a set of classifiers (RPT, RBC,...)
Randomization of Edges
1. Select an edge
2. Randomly permute it over time

[Figure: before/after snapshots across timesteps — the node-attribute tables are unchanged, but the selected edge is moved to a different timestep]
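One plausible reading of this transformation as code (a sketch, not the authors' implementation): remove the selected edge from every snapshot and re-insert it at a single uniformly chosen timestep.

```python
import random

def permute_edge_over_time(snapshots, edge, rng=None):
    """1. Select an edge; 2. randomly permute it over time.

    snapshots: list of sets of (u, v) edges, one set per timestep.
    """
    rng = rng or random.Random()
    out = [set(s) - {edge} for s in snapshots]   # drop the edge everywhere
    out[rng.randrange(len(out))].add(edge)       # re-insert at a random t
    return out
```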
Randomization of Attributes
1. Select a node and an attribute
2. Randomize its values over time

[Figure: before/after snapshots across timesteps — the selected node's attribute values are permuted across timesteps]
TVRC improvement significant at p < 0.05; 16% reduction in error.

[Figure: AUC (0.92–1.00) for the temporal-relational ensemble (TVRC), relational ensemble (RPT), and non-relational ensemble (DT) across T = 1..4 and their average]

Main findings: Temporal-relational ensembles can improve over relational ensembles, even using the minimum temporal information!
[Figure: AUC (0.6–1.0) for TVRC, RPT, and DT ensembles by attribute type — Communication, Team, Centrality (relatively stationary attributes) vs. Topics (frequently changing attributes); slight improvement on the stationary attributes, significant improvement on the frequently changing ones]

Main finding: Temporal-relational ensembles perform significantly better than the others when there are more dynamics.
Main Contributions
• Proposed a general time-evolving relational classification framework for modeling temporal edges, attributes, and nodes
• Proposed classes of temporal ensemble methods

Main Findings
• Temporal-relational models are more accurate than relational and non-relational models
• Incorporating temporal information can increase classification accuracy for both static and dynamic prediction tasks
• Temporal-relational ensembles yield better accuracy than relational and non-relational ensembles
Related work: U. Sharan et al. (ICDM 2008)

Differences:
• They model only edges; our methods generalize to any dynamics
• We propose and use temporal-relational ensembles
• We evaluate both static and dynamic prediction tasks
[Figure: over time, windows of snapshots G1, G2, G3 are combined into a summary graph]

1. For each piece of relational data
   • attributes, edges, or nodes
2. Select the amount of past information to use (temporal granularity)
3. Select how the past information is weighted (temporal influence)
   • weight function and decay parameter

[Figure: a window of attribute snapshots Xi1, Xi2, Xi3 is summarized into weighted summary attributes — select the timesteps for learning, then apply temporal weighting/summarization]

We can also do this for each attribute!
Temporal-Relational Classification Framework

Relational Information:  Edges ⋯ Attributes ⋯ Nodes
Temporal Granularity:    Timestep ⋯ Window ⋯ Union
Temporal Influence:      Uniform ⋯ Weighting
Relational Classifier:   RPT ⋯ RBC
Predictions

The previous example was actually TVRC (one particular path through these choices).
Let $G_t = (V_t, E_t, W_t^E)$ be the relational graph at time step $t$.

We define the summary graph $G_{S(t)} = (V_{S(t)}, E_{S(t)}, W_{S(t)}^E)$ as the weighted sum of the graphs up to time $t$:

$$V_{S(t)} = V_1 \cup V_2 \cup \cdots \cup V_t$$
$$E_{S(t)} = E_1 \cup E_2 \cup \cdots \cup E_t$$
$$W_{S(t)}^E = \alpha_1 W_1^E + \alpha_2 W_2^E + \cdots + \alpha_t W_t^E = \sum_{i=1}^{t} K^E(G_i; t, \theta)$$

The $\alpha$ weights determine the contribution of each snapshot.
The kernel function $K^E$ with parameters $\theta$ determines the influence of each edge in the summary.
Weights can be viewed as probabilities that a relationship (or attribute value) is still active at the current time step $t$, given that it was observed at time $t-k$.
Exponential weighting: $K(G_i; t, \lambda) = (1-\lambda)^{t-i}\,\lambda$

Example over timesteps $t_1, \ldots, t_4$ (the edge is observed at $t_1$, $t_3$, and $t_4$):

Node attribute: values $\tau_1, \tau_2, \tau_2, \tau_1$ at $t_1, \ldots, t_4$
  ⇒ $\tau_1$: $(1-\lambda)^3\lambda + \lambda$;  $\tau_2$: $\lambda\big((1-\lambda)^2 + (1-\lambda)\big)$

Edge attribute: values $\tau_1, \tau_2, \tau_1$ at $t_1, t_3, t_4$
  ⇒ $\tau_1$: $(1-\lambda)^3\lambda + \lambda$;  $\tau_2$: $(1-\lambda)\lambda$

Edge weight: $\omega = \theta\big(1 + (1-\theta) + (1-\theta)^3\big)$
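To sanity-check the worked example, here is a small script (names hypothetical) that accumulates the exponential-kernel weight $(1-\lambda)^{t-i}\lambda$ for each observed attribute value; it reproduces the closed forms above.

```python
def value_weights(observations, t, lam):
    """Exponential-kernel weight per observed attribute value.

    observations: list of (timestep, value) pairs; an observation at
    timestep i contributes (1 - lam) ** (t - i) * lam to its value.
    """
    weights = {}
    for i, value in observations:
        weights[value] = weights.get(value, 0.0) + (1 - lam) ** (t - i) * lam
    return weights

lam = 0.4
# Node attribute observed as tau1, tau2, tau2, tau1 at t1..t4:
node = value_weights([(1, "tau1"), (2, "tau2"), (3, "tau2"), (4, "tau1")],
                     t=4, lam=lam)
# Matches the closed forms:
#   tau1 weight = (1-λ)^3 λ + λ,  tau2 weight = λ((1-λ)^2 + (1-λ))
```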