
Modeling and mining complex networks with feature-rich nodes.

Slideshow for my PhD dissertation. The core of my work was to analyze the problems of link prediction, label prediction and graph modeling within a single framework of graphs with binary attributes on their nodes.


  1. Modeling and Mining Complex Networks with Feature-Rich Nodes. PhD Candidate: Corrado Monti. Advisor: Paolo Boldi. Laboratory for Web Algorithmics.
  2. Complex networks with feature-rich nodes. A common scenario: • Networks describe (directed or undirected) links between objects • Objects have properties and attributes
  3. A word on wording • Networks = graphs • Some scientific communities use the former, some the latter • Complex networks • Small diameter, scale-free, high clustering coefficient... • Closer to many real-world networks • We focus on features of nodes, also known as attributes
  4. Complex networks with feature-rich nodes: examples • A citation network, where each author's features are their research interests [Figure: a small citation network whose nodes carry interest sets such as {Neural Network}, {Neural Network, Bayesian models}, {Statistics, Neural Network}, {Neuropsychology, Statistics}, {Statistics}, {Statistics, Bayesian models}]
  5. Complex networks with feature-rich nodes: examples • A gene interaction network, where features are associated diseases [Figure: genes G5893, G6523, G5923, G9567, G8788, G9871 carrying disease sets such as {Breast Carcinoma, Colon Adenocarcinoma}, {Mesothelioma}, {Skin Cutaneous Melanoma}]
  6. We need a model. Base idea: links can be explained by the features of the nodes being connected. • E.g., “Actors” → “Directors”
  7. Complex networks with feature-rich nodes • There is a plethora of models: • Stochastic Block Model (2001), Mixed-membership SBM (2008), Infinite Relational Model (2006), Multiplicative Attribute Graphs (2010...) • Some of these models... • ...consider only one feature per node • ...or can only work with homophily • That is: feature h and feature k interact only if h = k • ...or they require a great number of parameters and are unable to work with large networks.
  8. We need a model • Able to go beyond homophily • Able to model overlapping features • The fewer hidden parameters, the better
  9. We need a model • Able to go beyond homophily • Able to model overlapping features • The fewer hidden parameters, the better • Therefore, we built on: Miller-Griffiths-Jordan, “Nonparametric latent feature models for link prediction”, NIPS 2009.
  10. We need a model • Able to go beyond homophily • Able to model overlapping features • The fewer hidden parameters, the better • Therefore, we built on: Miller-Griffiths-Jordan, “Nonparametric latent feature models for link prediction”, NIPS 2009. Challenges: 1. Explore how it can help in mining useful information from feature-rich networks 2. Adapt it to very large graphs • Millions of nodes, and beyond
  11. Our framework • We have a set of nodes N of size n • Links L ⊆ N × N • We have a set of features F containing m features • A binary node-feature matrix Z of size n × m • Note that both L and Z can be represented as (binary) matrices or as graphs (a toy representation is sketched below).
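To make the setting concrete, here is a minimal sketch (illustrative, not from the dissertation) of how L and Z can be held as sparse binary matrices:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy instance: n = 4 nodes, m = 3 features.
n, m = 4, 3
links = [(0, 1), (1, 2), (3, 0)]                     # L as (i, j) pairs
rows, cols = zip(*links)
L = csr_matrix((np.ones(len(links)), (rows, cols)), shape=(n, n))

# Z[i, h] = 1 iff node i has feature h.
Z = csr_matrix(np.array([[1, 0, 1],
                         [0, 1, 0],
                         [1, 1, 0],
                         [0, 0, 1]]))
```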
  12. Our framework. $P\big[(i,j) \in L\big] = \varphi\Big(\sum_h \sum_k Z_{i,h} W_{h,k} Z_{j,k}\Big)$ where: • L is the set of links in the network • Z is the node-feature matrix • φ is a monotonically increasing function R → [0, 1] • φ is a parameter of the model (e.g., a sigmoid function) • W is an m × m matrix: the latent feature-feature matrix
  13. Our framework. $P\big[(i,j) \in L\big] = \varphi\Big(\sum_h \sum_k Z_{i,h} W_{h,k} Z_{j,k}\Big)$ The entries of W indicate how the co-presence of features on the two nodes influences the presence or absence of a link: • $W_{h,k} > 0$ fosters the creation of links from nodes with feature h to nodes with feature k
  14. Our framework. $P\big[(i,j) \in L\big] = \varphi\Big(\sum_h \sum_k Z_{i,h} W_{h,k} Z_{j,k}\Big)$ The entries of W indicate how the co-presence of features on the two nodes influences the presence or absence of a link: • $W_{h,k} > 0$ fosters the creation of links from nodes with feature h to nodes with feature k • $W_{h,k} < 0$ discourages the creation of links from nodes with feature h to nodes with feature k (a sketch of this probability follows below)
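In code, the model's link probability is simply a bilinear form in the two nodes' feature vectors. A minimal sketch, taking a sigmoid as one admissible choice of φ:

```python
import numpy as np

def link_probability(z_i, z_j, W, phi=lambda x: 1.0 / (1.0 + np.exp(-x))):
    """P[(i, j) in L] = phi(sum_h sum_k z_i[h] * W[h, k] * z_j[k])."""
    return phi(z_i @ W @ z_j)
```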
  15. Generative model
  16. Generative model: node-feature association. We have developed a model for realistic node-feature generation.
  17. Generative model: node-feature association. We have developed a model for realistic node-feature generation. A real dataset: • a real node-feature association • rows are nodes, columns are features • the matrix is considered left-ordered [Figure: real node-feature matrix, 27770 nodes × 21933 features]
  18. Generative model: node-feature association. We have developed a model for realistic node-feature generation. A real dataset: • a real node-feature association • rows are nodes, columns are features • the matrix is considered left-ordered [Figure: real node-feature matrix, 27770 nodes × 21933 features] A synthetic dataset: • estimate its parameters • generate a new synthetic dataset [Figure: simulated node-feature matrix, 27770 nodes × 22179 features] (left-ordering is sketched below)
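Left-ordering is the standard canonical form from the Indian-buffet literature: columns are sorted by the binary number their entries spell, with the top row as the most significant bit. A small sketch of ours, purely for illustration (fine for matrices with few rows; the powers of two overflow for very tall matrices):

```python
import numpy as np

def left_order(Z):
    """Sort the columns of a binary matrix into left-ordered form."""
    weights = 2.0 ** np.arange(Z.shape[0])[::-1]  # row 0 is most significant
    keys = weights @ Z                            # one sort key per column
    return Z[:, np.argsort(-keys, kind="stable")]
```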
  19. Generative model • Model based on Miller-Griffiths-Jordan • Intuition: features are generated in a rich-get-richer fashion • Specifically, via a statistical process known as the Indian Buffet Process
  20. Generative model • Model based on Miller-Griffiths-Jordan • Intuition: features are generated in a rich-get-richer fashion • Specifically, via a statistical process known as the Indian Buffet Process • Novel aspects:[1] [1] P. Boldi, I. Crimaldi, C. Monti. “A network model characterized by a latent attribute structure with competition”. Information Sciences (Elsevier), 2016.
  21. Generative model • Model based on Miller-Griffiths-Jordan • Intuition: features are generated in a rich-get-richer fashion • Specifically, via a statistical process known as the Indian Buffet Process • Novel aspects:[1] 1. Each node has a fitness value representing how much it can spread its features • We developed algorithms to rebuild these values from data
  22. Generative model • Model based on Miller-Griffiths-Jordan • Intuition: features are generated in a rich-get-richer fashion • Specifically, via a statistical process known as the Indian Buffet Process • Novel aspects:[1] 1. Each node has a fitness value representing how much it can spread its features • We developed algorithms to rebuild these values from data 2. A few easy-to-interpret, estimable parameters
  23. Generative model • Model based on Miller-Griffiths-Jordan • Intuition: features are generated in a rich-get-richer fashion • Specifically, via a statistical process known as the Indian Buffet Process • Novel aspects:[1] 1. Each node has a fitness value representing how much it can spread its features • We developed algorithms to rebuild these values from data 2. A few easy-to-interpret, estimable parameters 3. We investigate properties of the generated graphs (a sketch of the plain Indian Buffet Process follows below)
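For reference, a sketch of the plain Indian Buffet Process, without the fitness values introduced above (those would bias the rich-get-richer step):

```python
import numpy as np

def indian_buffet(n, alpha, seed=0):
    """Sample a binary node-feature matrix from a plain IBP."""
    rng = np.random.default_rng(seed)
    rows, counts = [], []                          # counts[k] = #nodes with feature k
    for i in range(1, n + 1):
        row = [rng.random() < c / i for c in counts]   # rich-get-richer step
        new = rng.poisson(alpha / i)                   # brand-new features
        counts = [c + int(t) for c, t in zip(counts, row)] + [1] * new
        rows.append(row + [True] * new)
    m = len(counts)
    return np.array([r + [False] * (m - len(r)) for r in rows])
```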
  24. Generative model: generating the network. We can generate graphs with realistic degree and distance distributions. [Figure: log-log degree distribution and distance distribution of a generated graph]
  25. Generative model: generating the network. We can generate graphs with realistic degree and distance distributions. Why is this useful?
  26. Generative model: generating the network. We can generate graphs with realistic degree and distance distributions. Why is this useful? • We can generate synthetic feature-rich graphs, from scratch
  27. Generative model: generating the network. We can generate graphs with realistic degree and distance distributions. Why is this useful? • We can generate synthetic feature-rich graphs, from scratch • They have realistic global properties
  28. Generative model: generating the network. We can generate graphs with realistic degree and distance distributions. Why is this useful? • We can generate synthetic feature-rich graphs, from scratch • They have realistic global properties • We can use them for tests • If an algorithm works on them, we expect it to be useful whenever our model is valid
  29. Predicting links to estimate the feature-feature matrix
  30. Inferring feature-feature interaction • In reality, W is latent and unobservable • We have developed algorithms to find W
  31. Inferring feature-feature interaction: Naïve Bayes. A Naïve Bayes approach. Assumptions: • Fix φ(x) = min(1, exp(x)) • Naïve independence assumptions! • If $N_k$ is the set of nodes with feature k, then: $W_{h,k} = \log \frac{|(N_h \times N_k) \cap L|}{|N_h| \cdot |N_k|}$ • The independence assumptions are untenable in practice • We will see the data later (a sketch of this estimator follows below)
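A direct sketch of this estimator, assuming Z is a dense 0/1 NumPy array; the ε smoothing is our addition, to avoid taking the log of zero on feature pairs with no links:

```python
import numpy as np

def naive_bayes_W(Z, links, eps=1e-9):
    """W[h, k] = log(|(N_h x N_k) ∩ L| / (|N_h| * |N_k|)), as on the slide."""
    sizes = Z.sum(axis=0)                      # |N_k| for every feature k
    cross = np.zeros((Z.shape[1], Z.shape[1]))
    for i, j in links:                         # count links between feature groups
        cross += np.outer(Z[i], Z[j])
    return np.log((cross + eps) / (np.outer(sizes, sizes) + eps))
```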
  32. Inferring feature-feature interaction: perceptron with kernel • Node i → node-feature vector $z_i$, with 1 for its features • We feed the outer product $z_i \otimes z_j$ as input to a perceptron • We ask it to predict 1 ⇐⇒ (i, j) ∈ L and −1 ⇐⇒ (i, j) ∉ L • Then the perceptron's internal state corresponds to the latent feature-feature matrix W [Figure: the outer product $z_i \otimes z_j$, an m × m matrix with entries $z_{i,h} z_{j,k}$, matched entrywise against W]
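Since ⟨W, z_i ⊗ z_j⟩ = z_iᵀ W z_j, the perceptron never needs to materialize the outer product except on mistakes. A sketch of one update step:

```python
import numpy as np

def perceptron_step(W, z_i, z_j, y, lr=1.0):
    """One perceptron update; y is +1 if (i, j) is a link, -1 otherwise."""
    score = z_i @ W @ z_j                  # equals <W, z_i ⊗ z_j>
    if y * score <= 0:                     # mistake: move W along the outer product
        W += lr * y * np.outer(z_i, z_j)
    return W
```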
  33. Inferring feature-feature interaction: synthetic data • We can test whether it works on graphs simulated with our model • Each $W_{i,j}$ is randomly generated with a Bernoulli distribution • φ is a sigmoid [Figure: precision-recall curves for the Naïve Bayes and perceptron approaches on synthetic data]
  34. Inferring feature-feature interaction: experiments • Test on a real graph: a citation network • 18 939 155 papers, 189 465 540 citations • 47 269 fields of research each paper can be tagged with • On average, 3.88 fields per paper [Figure: precision-recall curves for the Naïve Bayes and perceptron approaches on the citation network]
  35. Inferring feature-feature interaction: experiments. Within our framework, we can answer questions like: “Are the links in a citation graph explained by the fields of research of the authors?” We can use the AUPR (area under the precision-recall curve) as a measure of this explainability.
  36. Inferring feature-feature interaction: experiments • It depends on the features we choose. • For the same citation network, we can use the authors’ affiliations instead: [Figure: precision-recall curves for Naïve Bayes and the learning approach]

                      Affiliations     Fields of research   Both
      Perceptron-like .5925 ± .0050    .9175 ± .0002        .9210 ± .0012
      Naive Bayes     .5517 ± .0003    .6318 ± .0001        .6345 ± .0002
  37. Inferring feature-feature interaction: experiments. It’s fundamentally a perceptron =⇒ we can re-adapt...
  38. Inferring feature-feature interaction: experiments. It’s fundamentally a perceptron =⇒ we can re-adapt... • ...other perceptron-like techniques • E.g., passive-aggressive algorithms • Efficient in practice: 4 µs/link
  39. Inferring feature-feature interaction: experiments. It’s fundamentally a perceptron =⇒ we can re-adapt... • ...other perceptron-like techniques • E.g., passive-aggressive algorithms (a sketch follows below) • Efficient in practice: 4 µs/link • ...error bounds from the theoretical analysis of the perceptron. • E.g., we can prove[2] that the bound on the number of mistakes grows with $\max_{i,j} |F_i| \cdot |F_j|$ (where $|F_i|$ is the number of features of node i). [2] Corrado Monti, Paolo Boldi. “Estimating latent feature-feature interactions in large feature-rich graphs”. Under review, 2017 (arXiv preprint arXiv:1612.00984).
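As an example of such a re-adaptation, here is a sketch of the standard PA-I passive-aggressive rule applied to the same outer-product representation (our transcription of the textbook rule, not the paper's code):

```python
import numpy as np

def pa_step(W, z_i, z_j, y, C=1.0):
    """PA-I update: fix margin violations with the smallest possible change to W."""
    x = np.outer(z_i, z_j)                       # the implicit instance
    loss = max(0.0, 1.0 - y * np.sum(W * x))     # hinge loss on this pair
    if loss > 0.0:
        tau = min(C, loss / (np.sum(x * x) + 1e-12))
        W += tau * y * x
    return W
```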
  40. Inferring feature-feature interaction: experiments on en.wiki • We also applied this model to the Wikipedia link network • A typical, openly available example of a semantic network
  41. Inferring feature-feature interaction: experiments on en.wiki • We also applied this model to the Wikipedia link network • A typical, openly available example of a semantic network • Problem: Wikipedia categories are a mess
  42. Inferring feature-feature interaction: experiments on en.wiki • We also applied this model to the Wikipedia link network • A typical, openly available example of a semantic network • Problem: Wikipedia categories are a mess • We developed a novel approach to cleanse this categorization[3] [3] Paolo Boldi, Corrado Monti. “Cleansing Wikipedia categories using centrality”. Proceedings of the 25th International Conference Companion on World Wide Web (WWW Companion), 2016.
  43. Inferring feature-feature interaction: experiments on en.wiki • We also applied this model to the Wikipedia link network • A typical, openly available example of a semantic network • Problem: Wikipedia categories are a mess • We developed a novel approach to cleanse this categorization[3] • Based on harmonic centrality, independent from our model (see the sketch after this slide) • Tunable: we can choose the k “best” categories • Tested against an expert-curated bibliographic classification • ROC AUC of 94% [3] Paolo Boldi, Corrado Monti. “Cleansing Wikipedia categories using centrality”. Proceedings of the 25th International Conference Companion on World Wide Web (WWW Companion), 2016.
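Harmonic centrality itself is easy to state: H(v) = Σ_{u ≠ v} 1/d(u, v). A BFS-based sketch on an adjacency-list graph, purely illustrative (on a directed graph, run it on the transposed graph to get the usual incoming-distance variant):

```python
from collections import deque

def harmonic_centrality(adj, v):
    """Sum of 1/d(v, u) over all nodes u reachable from v, via BFS."""
    dist, queue, score = {v: 0}, deque([v]), 0.0
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                score += 1.0 / dist[w]
                queue.append(w)
    return score
```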
  44. Inferring feature-feature interaction: experiments on en.wiki. [Figure: F1 score versus number of considered categories, from 10⁰ to 10⁶; the feature-matrix size grows accordingly from 400 bytes to 40 GB]
  45. Inferring feature-feature interaction: experiments on en.wiki • We settled for 20 000 categories
  46. Inferring feature-feature interaction: experiments on en.wiki • We settled for 20 000 categories • It can process the 1.1 · 10⁸ links of Wikipedia in 9 minutes • The resulting W • explains 84% of links • on random pairs, has a precision of 90%
  47. Inferring feature-feature interaction: experiments on en.wiki • W describes how Wikipedia categories interact with one another • We can view this matrix as a network itself • A “latent network” between categories that explains the links we observe between pages
  48. Inferring feature-feature interaction: experiments on en.wiki. [Figure: excerpt of the latent category network around science fiction, with nodes such as Science fiction by nationality, Science fiction book series, Science fiction by franchise, Robots, Science fiction novels, Spaceflight, Speculative fiction novels, Planets of the Solar System, Materials, Evolutionary biology, Predation, Technology systems, German culture, Science fiction films, Theory of relativity, Celestial mechanics, Production and manufacturing, Saurischians, Prehistoric reptiles]
  49. Inferring feature-feature interaction: experiments on en.wiki. [Figure: another excerpt of the latent category network, with nodes such as Music-related lists, Catholic pilgrimage sites, Place names, London boroughs, Artists, Multinational companies in the U.S., Keyboardists, Buildings and structures by American architects, Power metal albums, Human–machine interaction, Animation, British songs, Universities by country, Progressive rock albums by British artists, British awards, English writers, Music by nationality, Short films, Electronic albums by American artists]
  50. Finding unexpected relations. Finally, we found out that the links unexpected by this model are unexpected for a reason![4] [4] Paolo Boldi, Corrado Monti. “LlamaFur: Learning latent category matrix to find unexpected relations in Wikipedia”. Proceedings of the 8th ACM Conference on Web Science, 2016.
  51. Finding unexpected relations. Finally, we found out that the links unexpected by this model are unexpected for a reason![4] The most unexpected link in the page of Kim Jong-il... [4] Paolo Boldi, Corrado Monti. “LlamaFur: Learning latent category matrix to find unexpected relations in Wikipedia”. Proceedings of the 8th ACM Conference on Web Science, 2016.
  52. Finding unexpected relations. Finally, we found out that the links unexpected by this model are unexpected for a reason![4] The most unexpected link in the page of Kim Jong-il... ...is Elvis Presley. “Kim Jong-il was obsessed with Elvis Presley. His mansion was crammed with his idol’s records and his collection of 20,000 Hollywood movies.” The model described by W did not expect a link from a “North Korean communist” to a “Pioneer of music genres”. (A sketch of this unexpectedness ranking follows below.) [4] Paolo Boldi, Corrado Monti. “LlamaFur: Learning latent category matrix to find unexpected relations in Wikipedia”. Proceedings of the 8th ACM Conference on Web Science, 2016.
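Within this framework, an existing link is "unexpected" exactly when the learned W gives it a low score. A minimal sketch of such a ranking (the function name and scoring convention are ours, not the paper's):

```python
import numpy as np

def most_unexpected(links, Z, W, top=10):
    """Rank observed links by how little the feature-feature model expects them."""
    scored = [((i, j), -(Z[i] @ W @ Z[j])) for i, j in links]
    return sorted(scored, key=lambda t: -t[1])[:top]
```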
  53. Finding unexpected relations: results. [Figure: precision-recall curves comparing M2, M4, AA, Naive-LlamaFur, LlamaFur, and LlamaFur+AA]
  54. Predicting features
  55. Predicting features • So, predicting links from features and estimating W are intrinsically connected problems • What about the dual problem? • That is: we know all the links and some of the features: can we guess the others?
  56. Predicting features • This problem is known as label prediction • But here as well: • We want to go beyond homophily • We want to handle large-scale graphs
  57. Predicting features: a neural network. Supervised learning, where every node is an instance: • Input: a raw description of its in-neighborhood and its out-neighborhood: $r^+(i)_k := \frac{|N_k \cap N^+(i)|}{|N_k| \, |N^+(i)|}, \quad r^-(i)_k := \frac{|N_k \cap N^-(i)|}{|N_k| \, |N^-(i)|}$ (a sketch of these ratios follows below) • Output: likelihood of the features for the considered node
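A sketch of these input ratios, under our reading of the reconstructed formula above, with |N_k| and |N⁺(i)| as the normalizers:

```python
import numpy as np

def out_ratios(Z, out_neighbors, i):
    """r+(i)_k = |N_k ∩ N+(i)| / (|N_k| * |N+(i)|); swap in in-neighbors for r-."""
    nbrs = list(out_neighbors[i])
    if not nbrs:
        return np.zeros(Z.shape[1])
    sizes = Z.sum(axis=0)                     # |N_k| for each feature k
    overlap = Z[nbrs].sum(axis=0)             # |N_k ∩ N+(i)| for each k
    return overlap / (sizes * len(nbrs) + 1e-12)
```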
  58. Predicting features: a neural network. [Figure: architecture with an input layer (the in- and out-neighborhood descriptions), a hidden layer with weights W, a max-pooling step, and an output layer with weights Wᵀ]
  59. Predicting features: a neural network. Formally, if x is our output vector, the cost function is: $\ell(x, F_i) = \lambda \lVert W \rVert - \sum_{k \in F_i} \log \varphi(x_k) - \sum_{k \in \bar{F}_i} \log\big(1 - \varphi(x_k)\big)$ where: • λ is a regularization parameter • $\varphi(x) = (e^{-x} + 1)^{-1}$, a standard sigmoid • $F_i$ is the set of features of our node i, and $\bar{F}_i$ its complement. We can then express x as a function of W and optimize W with gradient descent (e.g., AdaGrad). (A sketch of this loss follows below.)
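A sketch of this per-node loss as reconstructed above, with a plain squared L2 norm standing in for λ‖W‖ (one common choice; the slides do not pin down the norm):

```python
import numpy as np

def node_loss(x, has_feature, W, lam=1e-3):
    """Regularized negative log-likelihood of one node's feature vector.

    x: raw output scores; has_feature: boolean mask of the node's features.
    """
    phi = 1.0 / (1.0 + np.exp(-x))
    nll = -np.sum(np.log(phi[has_feature] + 1e-12)) \
          - np.sum(np.log(1.0 - phi[~has_feature] + 1e-12))
    return nll + lam * np.sum(W * W)
```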
  60. Predicting features: synthetic data. Evaluation. [Figure: ROC curve of our method on synthetic data]
  61. Predicting features: synthetic data. Baseline. [Figure: ROC curve of the baseline on the same data]
  62. Conclusions. Take-home message: feature-rich complex network models can be used to design, test and analyze graph data mining algorithms.
  63. Conclusions. Take-home message: feature-rich complex network models can be used to design, test and analyze graph data mining algorithms. Specifically: • Link prediction • Label prediction • Anomaly detection and serendipity recommendations
  64. Conclusions. Take-home message: feature-rich complex network models can be used to design, test and analyze graph data mining algorithms. Specifically: • Link prediction • Label prediction • Anomaly detection and serendipity recommendations On the way, we also found out that: • Feature-rich models lead to realistic complex networks
  65. Conclusions. Take-home message: feature-rich complex network models can be used to design, test and analyze graph data mining algorithms. Specifically: • Link prediction • Label prediction • Anomaly detection and serendipity recommendations On the way, we also found out that: • Feature-rich models lead to realistic complex networks • We can measure the explainability of different sets of features
  66. Conclusions. Take-home message: feature-rich complex network models can be used to design, test and analyze graph data mining algorithms. Specifically: • Link prediction • Label prediction • Anomaly detection and serendipity recommendations On the way, we also found out that: • Feature-rich models lead to realistic complex networks • We can measure the explainability of different sets of features • A (clean) typing system can be a powerful tool in semantic networks
  67. Applications and future work. We have seen applications to: • Semantic networks: link prediction and serendipity finding • Scientific networks: citation analysis
  68. Applications and future work. We have seen applications to: • Semantic networks: link prediction and serendipity finding • Scientific networks: citation analysis However, the model is general and scalable. Possible applications: • Biological networks: gene interactome, gene ontologies • Content networks: web pages and topics • Social networks: link prediction and recommendations
  69. Thanks! email: monti@di.unimi.it
