
Time-Evolving Relational Classification and Ensemble Methods

Research Scientist
Jun. 2, 2012

  1. Time-evolving Relational Classification and Ensemble Methods Ryan A. Rossi Jennifer Neville
  2. Given a graph G and a set of attributes X for each node, the task is to infer Y, the class labels of the nodes. [Figure: training and testing graphs whose nodes carry attribute values and class labels A/B; a static approach ignores temporal information!]
  3. Problem: attribute prediction in time-varying networks. Previous work models only temporal edges, predicts only static attributes, and uses only a single classifier: a limited representation! We instead consider the space of temporal-relational representations. Contributions: 1. Propose a temporal-relational classification framework 2. Temporal-relational ensembles
  4. What types of information can change?
  5. Edges. Most real networks are dynamic: social, information, biological, ... [Figure: edge snapshots evolving over time]
  6. Edges, Node Attributes. Most real networks are dynamic. [Figure: node attribute values (e.g., C, L) changing across snapshots over time]
  7. Edges, Node Attributes, Edge Attributes. Most real networks are dynamic. [Figure: edge attribute values (signs and counts) changing across snapshots over time]
  8. Attributes of linked nodes are correlated. Temporal locality: recent events are more influential than distant ones. Temporal recurrence: a regular series of events is more likely to indicate a stronger relationship than an isolated event. [Figure: u-v link events over time illustrating both properties]
  9. How to represent the data with these in mind?
  10. Input: Dt = (Gt, Xt) where t = 1, ..., tmax, i.e., the edges, attributes (X1, X2, ..., Xm), and nodes at each timestep. Temporal-Relational Classification Framework, step 1: select the past information. Temporal Granularity: single timestep, window of timesteps, or union (all timesteps), chosen for each of edges, attributes, ...
  11. [Figure: sliding windows over the snapshots G1, G2, G3 along the time axis]
  12. Input: Dt = (Gt, Xt) where t = 1, ..., tmax. Steps so far: 1. Select the past information (Temporal Granularity: single timestep, window of timesteps, or union, for each of edges, attributes, ...) 2. Select a weighting (Temporal Influence: exponential, ..., uniform)
  13. Temporal Granularity + Temporal Influence: 1. Create a summary graph from the windows over G1, G2, G3 2. Assign weights to the edges, with recent/frequent events receiving larger weights. Weight functions: exponential, linear, uniform, ...
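The edge-weighting step above can be sketched in Python. The function names and the exact linear-decay form are illustrative assumptions, not the paper's implementation; only the exponential form follows the kernel shown later in the deck.

```python
# Sketch of the temporal-influence weighting step (illustrative names):
# given the timesteps at which an edge event occurred, assign larger
# weights to more recent events under different weight functions.

def edge_weight(event_t, now, scheme="exponential", lam=0.5):
    """Weight of an edge event at time `event_t`, evaluated at time `now`."""
    age = now - event_t
    if scheme == "exponential":
        return (1 - lam) ** age * lam   # geometric decay with age
    if scheme == "linear":
        return max(0.0, 1.0 - lam * age)  # assumed linear form
    return 1.0                          # uniform: every event counts equally

# An edge observed at timesteps 1 and 3, summarized at t = 4:
events = [1, 3]
summary_weight = sum(edge_weight(t, 4, "exponential") for t in events)
```

Recent and frequent events thus accumulate larger summary weights, exactly the behavior the slide describes.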
  14. Input: Dt = (Gt, Xt) where t = 1, ..., tmax. The full framework: 1. Select the past information (Temporal Granularity: single timestep, window of timesteps, or union) 2. Select a weighting (Temporal Influence: exponential, ..., uniform) 3. Select a relational classifier (Relational Probability Trees, ..., Relational Bayes Classifier) to produce predictions.
  15. The attribute values are weighted by the product of their attribute weight and the corresponding link weight. TVRC weights only edges; TENC weights edges and attributes. Weighted mode feature: TVRC: red (τ3), TENC: blue (τ2).
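The weighted-mode feature can be illustrated as below. This is a minimal sketch, not the authors' code: the helper name and the toy weights are assumptions, chosen so that the TVRC-style and TENC-style calls pick different values, mirroring the slide.

```python
from collections import defaultdict

# Weighted-mode feature: each neighbor's attribute value votes with
# weight = attribute weight x link weight. For TVRC the attribute weight
# is effectively 1 (edges only); TENC weights both edges and attributes.

def weighted_mode(neighbor_values):
    """neighbor_values: list of (value, attr_weight, link_weight) triples."""
    score = defaultdict(float)
    for value, attr_w, link_w in neighbor_values:
        score[value] += attr_w * link_w
    return max(score, key=score.get)

# Toy example: TVRC-style (attribute weights all 1) vs TENC-style
# (attribute weights decayed as well) can disagree on the mode.
tvrc = weighted_mode([("red", 1.0, 0.6), ("blue", 1.0, 0.5)])
tenc = weighted_mode([("red", 0.3, 0.6), ("blue", 0.9, 0.5)])
```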
  16. Learn the model on Dt (or a set of timesteps), then apply it to Dt+1. Use k-fold cross-validation to learn the “best” model (k = 10). Average AUC over 10 trials.
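The train-on-Dt, test-on-Dt+1 protocol can be sketched as follows. The AUC helper is the standard Mann-Whitney rank formulation, and `fit`/`score` are placeholders for any relational classifier (RPT, RBC, ...); none of this is the authors' code.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney rank statistic: P(score_pos > score_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def temporal_auc(datasets, fit, score):
    """Learn on D_t, score D_{t+1}, and average AUC over the transitions."""
    aucs = []
    for D_train, D_test in zip(datasets, datasets[1:]):
        model = fit(D_train)                       # e.g., an RPT or RBC
        ys, ss = zip(*[(y, score(model, x)) for x, y in D_test])
        aucs.append(auc(ys, ss))
    return sum(aucs) / len(aucs)
```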
  17. Datasets. PyComm (Python Developer Communication Network): emails and bug discussions extracted from the open-source Python development environment. Cora Citation Network.
      PyComm: Developers 1,914; Emails 13,181; Bug Messages 69,435; Timespan Feb 2007 - May 2008.
      Cora: Papers 16,153; References 29,603; Authors 21,976; Timespan 1981 - 1998.
      Prediction tasks. PyComm: is a developer productive (closes a bug) or not in t+1? [Dynamic]. Cora: 1. Is the paper AI or not? 2. Predict the paper topic (1 of 7 ML topics) [Static].
      In this talk we focus on the more difficult dynamic prediction task.
  18. The simplest temporal-relational model still outperforms the others. It is important to moderate the relational information with the temporal information! [Chart: AUC of TVRC (most primitive temporal-relational model) vs. RPT (intrinsic) with Int+time, Int+graph, and Int+topics attributes, and a DT with additional attributes]
  19. Complexity in representation vs. accuracy. TENC: temporal influence of edges and attributes. TVRC: temporal influence of edges. TVRC+Union. Window Model and Union Model: strictly based on temporal granularity, very efficient. Main finding: accuracy generally increases as more temporal information is included in the representation. [Chart: AUC for each model at T = 1, ..., 4]
  20. Can we increase accuracy by creating ensembles using temporal-relational information?
  21. Key Idea: Introduce variance in both the temporal and relational information 1. Transform the temporal-relational representation using some process (e.g., randomization) 2. Learn models (on different training data) 3. Consider a weighted vote from model predictions
  22. Many possibilities!
      1. Transforming the temporal nodes and links: sampling nodes and edges from discrete timesteps
      2. Transforming the temporal feature space: randomization of attributes (locally); varying temporal influence or granularity; resampling from temporal features
      3. Adding temporal noise or randomness: randomly permute node feature values; randomly permute links across time
      4. Transforming time-varying labels: randomly permute previously learned labels at t-1 (or more distant) with the true labels at t
      5. Multiple classification algorithms: sampling from a set of classifiers (RPT, RBC, ...)
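The final weighted-vote step shared by all of these ensemble variants can be sketched as follows. This is an illustrative skeleton, assuming a simple per-model weighted majority vote; the component models stand in for classifiers learned on differently transformed representations.

```python
from collections import Counter

# Weighted vote over the component models of a temporal-relational
# ensemble. Each `model` is any callable x -> label (e.g., a classifier
# learned on a randomized or resampled temporal representation).

def ensemble_predict(models, weights, x):
    """Return the label with the largest total weighted vote."""
    votes = Counter()
    for model, w in zip(models, weights):
        votes[model(x)] += w
    return votes.most_common(1)[0][0]

# Toy example: three component models with unequal vote weights.
models = [lambda x: "A", lambda x: "B", lambda x: "A"]
pred = ensemble_predict(models, [0.5, 0.9, 0.3], None)
```

With equal weights this reduces to plain majority voting; unequal weights let better-performing component models dominate.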
  23. Randomization of edges: 1. Select an edge 2. Randomly permute it over time. [Figure: two snapshot sequences, before and after an edge is moved to a different timestep]
  24. Randomization of edges and attributes: 1. Select a node and attribute 2. Randomize its values over time. [Figure: two snapshot sequences, before and after a node's attribute values are shuffled across timesteps]
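The two randomization operators on these slides can be sketched as below. The snapshot representation (a list of `(edge_set, node_attrs)` pairs) and both function names are assumptions for illustration; the paper does not prescribe a data structure.

```python
import random

# Randomization operators over a sequence of snapshots, each modeled as
# (set_of_edges, {node: attribute_value}).

def permute_edge_over_time(snapshots, edge, rng):
    """Move `edge` to a random subset of timesteps of the same size,
    preserving how often it occurs overall."""
    count = sum(edge in edges for edges, _ in snapshots)
    chosen = set(rng.sample(range(len(snapshots)), count))
    for t, (edges, _) in enumerate(snapshots):
        edges.discard(edge)
        if t in chosen:
            edges.add(edge)

def randomize_attribute_over_time(snapshots, node, rng):
    """Shuffle one node's attribute values across timesteps."""
    values = [attrs[node] for _, attrs in snapshots]
    rng.shuffle(values)
    for (_, attrs), v in zip(snapshots, values):
        attrs[node] = v
```

Both operators preserve the marginal statistics (edge frequency, attribute value multiset) while destroying their temporal placement, which is what injects the desired variance into the ensemble.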
  25. TVRC improvement significant at p < 0.05; 16% reduction in error. Main finding: temporal-relational ensembles can improve over relational ensembles, even using the minimum temporal information! [Chart: AUC of the temporal-relational (TVRC), relational (RPT), and non-relational (DT) ensembles at T = 1, ..., 4 and on average]
  26. Main finding: temporal-relational ensembles perform significantly better than the others when there are more dynamics. [Chart: AUC of the TVRC, RPT, and DT ensembles across attribute sets, from relatively stationary attributes (Communication, Team) to frequently changing attributes (Centrality, Topics); slight improvement on the former, significant improvement on the latter!]
  27. Main Contributions
      - Proposed a general time-evolving relational classification framework for modeling temporal edges, attributes, and nodes
      - Proposed classes of temporal ensemble methods
      Main Findings
      - Temporal-relational models are more accurate than relational and non-relational models
      - Incorporating temporal information can increase classification accuracy for both static and dynamic prediction tasks
      - Temporal-relational ensembles yield better accuracy than relational and non-relational ensembles
  28.  Systematically investigate the ensemble methods and identify when or where to apply each strategy
  29. Thanks! Questions? rrossi@purdue.edu http://www.cs.purdue.edu/homes/rrossi
  30. Comparison to U. Sharan et al. (ICDM 2008). Differences: our methods generalize to any dynamics (they model only edges); we propose and use temporal-relational ensembles; we handle both static and dynamic prediction tasks.
  31. Formally, Dt = (Gt, Xt) with attributes X1;t, X2;t, ..., Xm;t, and the target is Yt+1: we model E(Yt+1 | Gt, Gt-1, ..., Xt, Xt-1, ...).
  32. [Figure: windows over G1, G2, G3 collapsed into a summary graph] 1. For each piece of relational data (attributes, edges, or nodes) 2. Select the amount of past information to use (temporal granularity) 3. Select how the past information is weighted (temporal influence: weight function and decay parameter)
  33. [Figure: windowed attribute values Xi1, Xi2, Xi3 collapsed into summary attributes] Select the timesteps for learning, then apply temporal weighting/summarization. We can also do this for each attribute!
  34. Temporal-Relational Classification Framework (where t = 1, ..., tmax): Relational Information (edges, attributes, nodes) → Temporal Granularity (timestep, window, union) → Temporal Influence (uniform, ..., weighting) → Relational Classification (RPT, ..., RBC) → Predictions.
  35. A temporal-relational model is one instantiation of this pipeline: a choice of Relational Information, Temporal Granularity, Temporal Influence, and Relational Classification.
  36. The previous example was actually TVRC, instantiated from these same framework components.
  37. Let Gt = (Vt, Et, WtE) be the relational graph at timestep t. We define the summary graph GS(t) = (VS(t), ES(t), WS(t)E) as the weighted sum of the graphs up to time t:
      VS(t) = V1 ∪ V2 ∪ ... ∪ Vt
      ES(t) = E1 ∪ E2 ∪ ... ∪ Et
      WS(t)E = α1 W1E + α2 W2E + ... + αt WtE = Σ_{i=1}^{t} KE(Gi; t, θ)
      - The α weights determine the contribution of each snapshot
      - We use the kernel function KE with parameters θ to determine the influence of each edge in the summary
      - Weights can be viewed as probabilities that a relationship (or attribute value) is still active at the current timestep t, given that it was observed at time t-k
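The weighted sum defining WS(t) can be written directly in code. A minimal sketch, assuming dict-based edge-weight matrices and the exponential kernel form shown on the next slide; the function name is illustrative.

```python
# Summary-graph edge weights: W_S(t) = sum_i K_E(G_i; t, theta), where
# each snapshot's edge weights are scaled by a decaying kernel and summed.

def summary_edge_weights(snapshots, theta):
    """snapshots: list of {edge: weight} dicts for timesteps 1..t."""
    t = len(snapshots)
    W_S = {}
    for i, W_i in enumerate(snapshots, start=1):
        # Exponential kernel: recent snapshots get alpha closer to theta.
        alpha = (1 - theta) ** (t - i) * theta
        for edge, w in W_i.items():
            W_S[edge] = W_S.get(edge, 0.0) + alpha * w
    return W_S

# An edge present with weight 1 in both of two snapshots, theta = 0.5:
W = summary_edge_weights([{("u", "v"): 1.0}, {("u", "v"): 1.0}], 0.5)
```

Note that by construction the most recent snapshot contributes the most, matching the temporal-locality assumption.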
  38. Attribute Weighting and Summarization Graph Weighting and Summarization
  39. Exponential weighting: an observation at timestep i receives weight (1-λ)^(t-i) λ at the current time t. With t1, ..., t4 and t = 4:
      Node attribute values τ1, τ2, τ2, τ1 ⇒ τ1: (1-λ)^3 λ + λ; τ2: λ((1-λ)^2 + (1-λ))
      Edge attribute values τ1, τ2, τ1 ⇒ τ1: (1-λ)^3 λ + λ; τ2: (1-λ)λ
      Edge weight ⇒ ω = θ(1 + (1-θ) + (1-θ)^3)
      Weights can be viewed as probabilities that a relationship (or attribute value) is still active at the current timestep t, given that it was observed at time t-k.
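The node-attribute case on this slide can be checked numerically: summing the kernel weights (1-λ)^(t-i) λ per observed value reproduces the closed forms for τ1 and τ2. A small verification sketch (λ chosen arbitrarily):

```python
# Numeric check of the exponential weighting example: a node attribute
# takes values tau1, tau2, tau2, tau1 at timesteps 1..4; an observation
# at time i gets weight (1 - lam)**(t - i) * lam, with t = 4.

lam = 0.3
values = ["tau1", "tau2", "tau2", "tau1"]
t = len(values)

weight = {}
for i, v in enumerate(values, start=1):
    weight[v] = weight.get(v, 0.0) + (1 - lam) ** (t - i) * lam

# The slide's closed forms for the two values:
w_tau1 = (1 - lam) ** 3 * lam + lam
w_tau2 = lam * ((1 - lam) ** 2 + (1 - lam))
```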
