
Inference and validation of networks


We develop a statistical methodology to validate the result of network inference algorithms, based on principles of statistical testing and machine learning. The comparison of results with reference networks, by means of similarity measures and null models, allows us to measure the significance of results, as well as their predictive power. The use of Generalised Linear Models allows us to explain the results in terms of available ground truth which we expect to be partially relevant. We present these methods for the case of inferring a network of News Outlets based on their preference of stories to cover. We compare three simple network inference methods and show how our technique can be used to choose between them. All the methods presented here can be directly applied to other domains where network inference is used.


Inference and validation of networks

  1. Inference and Validation of Networks. ECML/PKDD 2009. Ilias N. Flaounas (1), Marco Turchi (2), Tijl De Bie (2) and Nello Cristianini (1,2). (1) Department of Computer Science, Bristol University; (2) Department of Engineering Mathematics, Bristol University. September 10, 2009. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 1 / 31
  2. Some Network Inference Examples. Karate Club (Zachary, 1972); Emails (Adamic et al, 2003); Dating (Berman et al, 2004); Internet Nodes (Alvarez et al, 2007).
  3. Some more difficult examples. Gene pathways (Conti et al, 2007); Protein interactions (Jeong et al, 2001).
  4. Our problem: The News Media Network. But, is this network valid? How to validate a network inferred from data?
  5. Pattern Validation. In general, patterns found in data can be validated in two different ways: by assessing their significance; by measuring their predictive power.
  6. Network Validation. We argue that an inferred network needs to satisfy some key properties: it should be stable; it should be related to any available ground truth; its topology should be predictable. The first two properties can be verified by testing whether the inferred network is similar to a reference network (hypothesis testing). We verify the third property by using Generalised Linear Models.
  7. Hypothesis Testing. The key idea is to quantify the probability that the value of a test statistic evaluated on observed data could also have been found in random data (the p-value). With $G_I$ the inferred network, $G_R$ the reference network and $t_{G_R}(\cdot)$ the test statistic measured against $G_R$, the p-value is
     $$p = P_{G \sim H_0}\left(t_{G_R}(G) \le t_{G_R}(G_I)\right) \quad (1)$$
     and it is estimated from $K$ networks $G$ sampled from the null model as
     $$p \approx \frac{\#\{G : t_{G_R}(G) \le t_{G_R}(G_I)\} + 1}{K + 1} \quad (2)$$
     A significance threshold $\alpha$ is chosen, and the null hypothesis is rejected if $p < \alpha$.
  8. Test Statistic. As a test statistic we use the similarity or distance of the inferred network to a reference network. Several approaches have been proposed in the literature for measuring the similarity between networks. When comparing two networks $G_A$ and $G_B$ that have the same set of nodes, we can use the Jaccard distance between their edge sets $E_A$ and $E_B$:
     $$JD(E_A, E_B) = \frac{|E_A \cup E_B| - |E_A \cap E_B|}{|E_A \cup E_B|} \quad (3)$$
     A value of zero indicates identical networks, and a value of one indicates no shared edges.
  9. Null Models. In the case of validating network patterns, we need to specify a model of random network generation. Erdős-Rényi model: a random graph G(n, p) is comprised of n nodes, and any two nodes are connected with probability p. Switching randomisation: starts from a given graph and randomises it by switching edges (figure omitted).
  10. Prediction of Topology. If one or more ground truth components are available, they are expected to be able to 'explain' the existence of some edges of the network. The ground truth elements can be combined using Generalised Linear Models (GLMs). To assess the ability of a GLM to predict the network structure, we use a methodology similar to that found in supervised classification. Area Under Curve (AUC) was used as the performance measure.
  11. The News Outlets case. We analysed a dataset of: 1,017,348 news articles over a period of 12 consecutive weeks (from October 1st, 2008); 543 news outlets; 32 different locations; 7 different media types (e.g., newspapers, blogs); 22 different languages; 81,816 stories (clusters of articles).
  12. Comparison of three Inference methods. Method A: connect two outlets if they share a minimum number of stories. Method B: based on a TF-IDF scheme in which outlets correspond to documents and clusters to terms. Method C: use a weighting scheme for each cluster, based on how common it is among outlets.
  13. Validation of Outlets Network. The first property states that the network should be stable. The reference network is a network inferred from independent data. We use 12 different datasets, one for each of the 12 weeks, and K = 1000 random networks per null model.
  14. Stability. We measure the distance of the network inferred from data of Week 1 to the network inferred from data of the previous week (Week 0).
  15. Stability. We measure the distance of the network inferred from data of Week 1 to a random network constructed based on the Erdős Model.
  16. Stability. We measure the distance of the network inferred from data of Week 1 to a random network constructed based on the Switching Model.
  17. Stability. We measure the distance of the network inferred from data of Week 2 to the network inferred from data of the previous week (Week 1).
  18. Stability. We measure the distance of the network inferred from data of Week 2 to the networks inferred from the Null models.
  19. Stability. Each week has three comparisons: to the network of the previous week; to an Erdős Random Graph; to a Switching Randomized Graph.
  20. Stability. Each week has three comparisons: to the network of the previous week; to an Erdős Random Graph; to a Switching Randomized Graph.
  21. Stability. We compare the inferred networks of 12 sequential weeks. [Figure panels: Method A, Method B, Method C] The networks inferred by each method are significantly similar (p < 0.001) to those inferred for the previous week with the same method, under both null models.
  22. Comparison to ground truth components. We compare the inferred network to the networks of some available ground truth components. For the case of news outlets we compared to the networks of: Location; Language; Type. Example: the 'Location' network is comprised of cliques (figure omitted).
  23. Location. [Figure panels: Method A, Method B, Method C] Each week has three comparisons: to the real Location-Network; to the corresponding Erdős Random Graph; to the corresponding Switching Randomized Graph.
  24. Language. [Figure panels: Method A, Method B, Method C] Each week has three comparisons: to the real Language-Network; to the corresponding Erdős Random Graph; to the corresponding Switching Randomized Graph.
  25. Media Type. [Figure panels: Method A, Method B, Method C] Each week has three comparisons: to the real Media Type-Network; to the corresponding Erdős Random Graph; to the corresponding Switching Randomized Graph.
  26. Selecting Inference Method. The inference method is selected based on its ability to produce significant results.
     Table: The number of weeks in which each method presented significant results with p < 0.001 (ER: Erdős-Rényi null model; SR: switching randomisation null model).

                  Previous week    Location      Language      Media-Type
                   ER     SR      ER     SR     ER     SR     ER     SR
     Method A      11     11       1     11      0      9     11     11
     Method B      11     11      11     11     11     11      2     11
     Method C      11     11      11     11     11     11     11     11

     Only Method C passes all tests successfully.
  27. Prediction power. How well can we predict the existence of an edge given some ground truth network? We measure AUC under a 100-fold cross-validation scheme.
  28. Prediction power. Using all ground truth networks together and each one separately, we get: [Figure panels: Method A, Method B, Method C]
  29. Prediction power. Using pairs of ground truth networks we get: [Figure panels: Method A, Method B, Method C] Overall, Method C gives the best result, an AUC of 77.1%, using all three ground truth components.
  30. Conclusions. We showed how: to select the inference algorithm; to validate the network in terms of stability, relation to ground truth components and topology prediction.
  31. Thank you!
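
The significance test of slides 7 to 9 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the representation of edges as `frozenset` pairs, and the choice to set the Erdős-Rényi probability to the edge density of the inferred network are all assumptions made here. It computes the Jaccard distance of equation (3) and the Monte Carlo p-value estimate of equation (2).

```python
import random
from itertools import combinations


def jaccard_distance(edges_a, edges_b):
    """Jaccard distance between two edge sets over the same nodes (equation 3)."""
    union = edges_a | edges_b
    if not union:
        return 0.0  # assumption: two empty networks count as identical
    return (len(union) - len(edges_a & edges_b)) / len(union)


def erdos_renyi_edges(nodes, p, rng):
    """Sample G(n, p): every pair of nodes is connected with probability p."""
    return {frozenset(pair) for pair in combinations(nodes, 2) if rng.random() < p}


def empirical_p_value(inferred, reference, nodes, k=1000, seed=0):
    """Estimate the p-value of equation (2) from k Erdos-Renyi random networks."""
    rng = random.Random(seed)
    n_pairs = len(nodes) * (len(nodes) - 1) // 2
    # Assumption: the null model matches the edge density of the inferred network.
    density = len(inferred) / n_pairs
    observed = jaccard_distance(inferred, reference)
    hits = 0
    for _ in range(k):
        random_edges = erdos_renyi_edges(nodes, density, rng)
        # Count random networks at least as close to the reference as the inferred one.
        if jaccard_distance(random_edges, reference) <= observed:
            hits += 1
    return (hits + 1) / (k + 1)


# Toy data: a ring of 20 nodes as the reference ("previous week") and the
# same ring with one edge rewired as the inferred network ("this week").
nodes = list(range(20))
reference = {frozenset((i, (i + 1) % 20)) for i in nodes}
inferred = (reference - {frozenset((0, 1))}) | {frozenset((0, 10))}
p = empirical_p_value(inferred, reference, nodes, k=200)
print(jaccard_distance(inferred, reference), p)
```

With K random networks the smallest attainable estimate is 1 / (K + 1), which is why K = 1000 is enough to report p < 0.001.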
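
The switching randomisation null model of slide 9 is described only as "switching edges"; the slide's figure is not available. The sketch below assumes the common degree-preserving variant, in which two edges (a, b) and (c, d) are replaced by (a, d) and (c, b) whenever this creates no self-loop or duplicate edge. The function names and the retry limit are hypothetical choices for illustration.

```python
import random
from collections import Counter


def switching_randomise(edges, n_swaps, seed=0):
    """Randomise an undirected graph by edge switching, keeping node degrees fixed."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    if len(edge_list) < 2:
        return edge_set  # nothing to switch
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # assumption: give up after many rejected swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # the swap would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_set


def degrees(edges):
    """Degree of every node that appears in the edge set."""
    counts = Counter()
    for edge in edges:
        for node in edge:
            counts[node] += 1
    return counts


# Toy graph: a ring of 10 nodes plus two chords.
original = {frozenset((i, (i + 1) % 10)) for i in range(10)}
original |= {frozenset((0, 5)), frozenset((2, 7))}
randomised = switching_randomise(original, n_swaps=50, seed=1)
print(degrees(randomised) == degrees(original))
```

Because every node keeps its degree, this null model is stricter than the Erdős-Rényi one, which preserves only the expected number of edges.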
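
Slides 10 and 27 describe predicting edges from ground truth components with a GLM and scoring the predictions by AUC. The slides do not say which GLM or link function was used, so the sketch below assumes plain logistic regression fitted by gradient ascent, with one binary feature per ground truth component (for example "same location"). The toy data are made up, and the evaluation is in-sample for brevity rather than the 100-fold cross-validation of slide 27.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient ascent; returns weights, bias first."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    """Predicted probability that a node pair is connected."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    """AUC as the probability that a random edge outscores a random non-edge."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical node pairs: features are (same_location, same_language),
# and the label says whether the inferred network has an edge for that pair.
X = [[1, 1], [1, 0], [0, 1], [0, 0]] * 5
y = [1, 1, 0, 0] * 5  # in this toy data, edges follow location exactly
weights = fit_logistic(X, y)
scores = [predict(weights, xi) for xi in X]
print(auc(scores, y))
```

An AUC of 0.5 corresponds to chance-level prediction and 1.0 to a perfect ranking of edges above non-edges.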
