Inference and Validation of Networks
ECML/PKDD 2009
Ilias N. Flaounas (1), Marco Turchi (2),
Tijl De Bie (2) and Nello Cristianini (1,2)
(1) Department of Computer Science, Bristol University
(2) Department of Engineering Mathematics, Bristol University
September 10, 2009
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 1 / 31
Some Network Inference Examples
Karate Club (Zachary, 1972) Emails (Adamic et al, 2003)
Dating (Berman et al, 2004) Internet Nodes (Alvarez et al, 2007)
Some more difficult examples
Gene pathways (Conti et al, 2007)
Protein interactions (Jeong et al, 2001)
Our problem: The News Media Network
But is this network valid? How can a network inferred from data be validated?
Pattern Validation
In general, patterns found in data can be validated in two different ways:
by assessing their significance
by measuring their predictive power
Network Validation
We argue that an inferred network needs to satisfy some key properties:
It should be stable.
It should be related to any available ground truth.
The topology should be predictable.
The first two properties can be verified by testing if the inferred network is
similar to a reference network - Hypothesis Testing. We verify the third
property by using Generalised Linear Models.
Hypothesis Testing
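The key idea is to quantify the probability that the value of a test
statistic evaluated on observed data could also have been found in
random data (p-value).
The p-value is calculated by:
$$p = P_{G \sim H_0}\left( t_{G_R}(G) \le t_{G_R}(G_I) \right) \tag{1}$$
$$p \approx \frac{\#\{G : t_{G_R}(G) \le t_{G_R}(G_I)\} + 1}{K + 1} \tag{2}$$
Here G_I is the inferred network, G_R the reference network, t_{G_R}(G) the
test statistic of a network G with respect to G_R, and K the number of random
networks G sampled under the null hypothesis H_0.
A significance threshold α is chosen, and the null hypothesis is rejected if
p < α.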
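The estimate in Eq. (2) can be sketched in a few lines of Python. This is only an illustration: the function name and the toy distances are hypothetical, and the test statistic is assumed to be a distance, so smaller values mean "closer to the reference".

```python
def empirical_p_value(observed_stat, null_stats):
    """Estimate the p-value as in Eq. (2).

    observed_stat: statistic of the inferred network, t(G_I).
    null_stats:    statistics t(G) of K random networks drawn under H_0.
    """
    k = len(null_stats)
    # Count the random networks that are at least as close to the
    # reference as the inferred network is.
    count = sum(1 for t in null_stats if t <= observed_stat)
    return (count + 1) / (k + 1)


# Hypothetical toy values: five random networks, all far from the reference.
null_distances = [0.97, 0.99, 0.98, 0.96, 0.995]
p_close = empirical_p_value(0.60, null_distances)  # inferred network is closer
p_typical = empirical_p_value(0.98, null_distances)  # inferred network looks random
```

Because of the "+ 1" terms, the smallest p-value that K random networks can produce is 1 / (K + 1), so K bounds the significance level that can be claimed.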
Test Statistic
As a test statistic we will use the similarity or distance of the inferred
network to a reference network. In the literature, several approaches have
been proposed for measuring the similarity between networks.
When comparing two networks G_A and G_B that have the same set of nodes and
edge sets E_A and E_B, we can use the Jaccard Distance.
Jaccard Distance
$$JD(E_A, E_B) = \frac{|E_A \cup E_B| - |E_A \cap E_B|}{|E_A \cup E_B|} \tag{3}$$
A value of zero indicates identical networks, and a value of one indicates
no shared edges.
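Eq. (3) translates directly into set operations. The sketch below assumes undirected edges given as pairs of node labels; the function name and the example edges are illustrative only.

```python
def jaccard_distance(edges_a, edges_b):
    """Jaccard distance between two undirected edge sets (Eq. 3)."""
    # Store each edge as a frozenset so (u, v) and (v, u) are the same edge.
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    union = ea | eb
    if not union:
        # Convention chosen for this sketch: two empty networks are identical.
        return 0.0
    return (len(union) - len(ea & eb)) / len(union)


# Two toy networks on the same nodes that share two of four distinct edges.
net_a = [(1, 2), (2, 3), (3, 4)]
net_b = [(2, 1), (3, 4), (4, 5)]
distance = jaccard_distance(net_a, net_b)
```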
Null Models
In the case of validating network patterns, we need to specify a model of
random network generation.
Erdős-Rényi Model
A random graph G(n, p) is comprised of n nodes.
Each pair of nodes is connected independently with probability p.
Switching Randomisation
Starts from a given graph and randomises it by repeatedly switching the endpoints of pairs of edges, so that every node keeps its degree.
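Both null models can be sketched with the Python standard library. The slides do not give implementation details, so the function names, the rejection of self-loops and duplicate edges, and the number of switches are assumptions made for illustration.

```python
import itertools
import random


def erdos_renyi_edges(nodes, p, rng):
    """G(n, p): connect each pair of nodes independently with probability p."""
    return {frozenset(pair)
            for pair in itertools.combinations(nodes, 2)
            if rng.random() < p}


def switching_randomisation(edges, n_switches, rng):
    """Randomise a simple undirected graph while preserving node degrees.

    Repeatedly picks two edges (a, b) and (c, d) and replaces them with
    (a, d) and (c, b), skipping switches that would create a self-loop
    or a duplicate edge.
    """
    edge_set = {frozenset(e) for e in edges}
    edge_list = [tuple(e) for e in edge_set]
    if len(edge_list) < 2:
        return edge_set
    done, attempts = 0, 0
    while done < n_switches and attempts < 100 * n_switches:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2:
            continue  # would create a self-loop
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue  # would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_set


def degrees(edges):
    """Degree of every node that appears in at least one edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


rng = random.Random(0)
# Toy graph: a ring of 8 nodes plus two chords.
ring = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
shuffled = switching_randomisation(ring, n_switches=20, rng=rng)
complete = erdos_renyi_edges(range(5), 1.0, rng)
empty = erdos_renyi_edges(range(5), 0.0, rng)
```

The switching model is the stricter of the two: it keeps the degree of every node, whereas G(n, p) only controls the expected edge density.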
Prediction of Topology
If one or more ground truth components are available, they are expected
to be able to ‘explain’ the existence of some edges of the network.
The combination of ground truth elements can be made using
Generalized Linear Models (GLMs).
To assess the ability of a GLM to predict the network structure we use
a methodology similar to that used in supervised classification.
Area Under Curve (AUC) was used as the performance measure.
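As a rough illustration of this idea, the sketch below fits a logistic-link GLM to node pairs and scores it with AUC. Several things here are assumptions rather than the authors' setup: the logistic link, the plain gradient-descent fit, the three binary features (same location, same language, same media type) and the toy data. For brevity it also scores the training pairs, whereas the slides evaluate with cross-validation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Predicted probability that a node pair is connected."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic_glm(X, y, lr=0.5, epochs=2000):
    """Fit weights w and intercept b by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical node pairs: [same_location, same_language, same_type].
X = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 1],
     [0, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0]]
# 1 if the inferred network has an edge between the pair, else 0.
y = [1, 1, 0, 0, 0, 1, 0, 0]

w, b = fit_logistic_glm(X, y)
scores = [predict_proba(w, b, x) for x in X]
train_auc = auc(scores, y)
```

In this toy data the edges coincide with "same location", so the fitted model ranks every connected pair above every unconnected one.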
The News Outlets case
We analysed a dataset of
1,017,348 news articles
over a period of 12 consecutive weeks (from October 1st, 2008)
543 news outlets
32 different locations
7 different media types (e.g., newspapers, blogs, etc)
22 different languages
81,816 stories (clusters of articles)
Comparison of three Inference methods
Method A: Connection of two outlets if they share a minimum
number of stories.
Method B: Based on a TF-IDF scheme, where outlets correspond to
documents and clusters (stories) to terms.
Method C: Use a weighting scheme for each cluster, based on how
common it is among outlets.
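Method A is simple enough to sketch directly. The data layout (a dictionary from outlet name to the set of story ids it covered), the function name and the threshold value are assumptions for illustration. Methods B and C are not sketched because the slides do not specify their weighting formulas.

```python
from itertools import combinations


def infer_network_method_a(outlet_stories, min_shared):
    """Connect two outlets if they share at least `min_shared` stories."""
    edges = set()
    for a, b in combinations(sorted(outlet_stories), 2):
        shared = outlet_stories[a] & outlet_stories[b]
        if len(shared) >= min_shared:
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical outlets and story ids.
outlet_stories = {
    "outlet1": {1, 2, 3},
    "outlet2": {2, 3, 4},
    "outlet3": {4, 5},
}
network = infer_network_method_a(outlet_stories, min_shared=2)
```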
Validation of Outlets Network
The first property states that the network should be stable.
The reference network is a network inferred from
independent data.
We use 12 different datasets, one for each of the 12 weeks
K = 1000 random networks per null model
Stability
We measure the distance of the network inferred from data of Week 1, to
the network inferred from data of the previous week (Week 0).
Stability
We measure the distance of the network inferred from data of Week 1, to
a random network constructed based on the Erdős-Rényi Model.
Stability
We measure the distance of the network inferred from data of Week 1, to
a random network constructed based on the Switching Model.
Stability
We measure the distance of the network inferred from data of Week 2, to
the network inferred from data of the previous week (Week 1).
Stability
We measure the distance of the network inferred from data of Week 2, to
the random networks generated by the null models.
Stability
Each week has three comparisons:
to the network of the previous week.
to an Erdős-Rényi random graph
to a switching-randomised graph
Stability
We compare the inferred networks of 12 sequential weeks.
(Figure panels: Method A, Method B, Method C)
The networks inferred by each method are significantly similar (p < 0.001)
to those inferred for the previous week with the same method, under both null
models.
Comparison to ground truth components
We compare each inferred network to networks built from the available
ground truth. For the case of news outlets we compared to the networks
of:
Location
Language
Type
Example
The ‘Location’ network is comprised of cliques: outlets in the same location are all connected to one another.
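A ground truth network of this kind can be built by connecting every pair of outlets that carry the same label. The sketch below assumes a dictionary from outlet to label; the outlet names and location codes are made up.

```python
from collections import defaultdict
from itertools import combinations


def clique_network(labels):
    """Connect every pair of nodes that share a label (one clique per label)."""
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    edges = set()
    for members in groups.values():
        for a, b in combinations(members, 2):
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical outlets and locations.
locations = {"o1": "UK", "o2": "UK", "o3": "UK",
             "o4": "US", "o5": "US", "o6": "FR"}
location_network = clique_network(locations)
```

The same function would serve for the Language and Type networks by passing a different label dictionary.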
Location
(Figure panels: Method A, Method B, Method C)
Each week has three comparisons:
to the real Location network
to the corresponding Erdős-Rényi random graph
to the corresponding switching-randomised graph
Language
(Figure panels: Method A, Method B, Method C)
Each week has three comparisons:
to the real Language network
to the corresponding Erdős-Rényi random graph
to the corresponding switching-randomised graph
Media Type
(Figure panels: Method A, Method B, Method C)
Each week has three comparisons:
to the real Media Type network
to the corresponding Erdős-Rényi random graph
to the corresponding switching-randomised graph
Selecting Inference Method
The inference method is selected based on its ability to produce
significant results.
Table: The number of weeks in which each method gave significant results
(p < 0.001), under the Erdős-Rényi (ER) and switching randomisation (SR) null models.
            Previous week   Location    Language    Media-Type
            ER    SR        ER    SR    ER    SR    ER    SR
Method A    11    11         1    11     0     9    11    11
Method B    11    11        11    11    11    11     2    11
Method C    11    11        11    11    11    11    11    11
Only Method C passes all tests successfully.
Prediction power
How well can we predict the existence of an edge given some ground truth
network?
We measure AUC under a 100-fold cross-validation scheme.
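The fold construction behind such a scheme can be sketched as follows. The slides do not say what unit is split into folds; this sketch assumes the items are candidate node pairs, and the function name, the seed and the toy size of 1000 items are illustrative.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 1000 hypothetical node pairs split into 100 folds of 10 pairs each.
splits = list(k_fold_indices(1000, 100))
```

Each item appears in exactly one test fold, so a model is always scored on pairs it was not fitted on.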
Prediction power
Using all ground truth networks together and each one separately, we get:
(Figure panels: Method A, Method B, Method C)
Prediction power
Using pairs of ground truth networks we get:
(Figure panels: Method A, Method B, Method C)
Overall, Method C gives the best result, an AUC of 77.1%, using all three
ground truth components.
Conclusions
We showed how:
to select the inference algorithm
to validate the network in terms of stability, relation to ground truth
components and topology prediction
Thank you!
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 31 / 31

More Related Content

Similar to Inference and validation of networks

Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...CSCJournals
 
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)Crislanio Macedo
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn GraphAshwani kumar
 
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...Ajay Thampi
 
Brema tarigan 09030581721015
Brema tarigan 09030581721015Brema tarigan 09030581721015
Brema tarigan 09030581721015ferdiandersen08
 
nmat4448 (1) (1)
nmat4448 (1) (1)nmat4448 (1) (1)
nmat4448 (1) (1)Petra Scudo
 
modularity Modularity Modularity Modularity Modularity.ppt
modularity Modularity Modularity Modularity Modularity.pptmodularity Modularity Modularity Modularity Modularity.ppt
modularity Modularity Modularity Modularity Modularity.pptSanjarMadraximov
 
On the identifiability of phylogenetic networks under a pseudolikelihood model
On the identifiability of phylogenetic networks under a pseudolikelihood modelOn the identifiability of phylogenetic networks under a pseudolikelihood model
On the identifiability of phylogenetic networks under a pseudolikelihood modelArrigo Coen
 
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone Systems
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone SystemsVBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone Systems
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone SystemsM. Ilhan Akbas
 
2013 03-20 zhang
2013 03-20 zhang2013 03-20 zhang
2013 03-20 zhangSCEE Team
 
Satellite imageclassification
Satellite imageclassificationSatellite imageclassification
Satellite imageclassificationMariam Musavi
 
Network Models.pptx
Network  Models.pptxNetwork  Models.pptx
Network Models.pptxAhmadAbba6
 
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentationUNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentationTELKOMNIKA JOURNAL
 
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERINGINFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERINGcscpconf
 
Influence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clusteringInfluence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clusteringcsandit
 
Radio propagation models in wireless
Radio propagation models in wirelessRadio propagation models in wireless
Radio propagation models in wirelessIJCNCJournal
 

Similar to Inference and validation of networks (20)

Calculation of Torsion Capacity of the Reinforced Concrete Beams Using Artifi...
Calculation of Torsion Capacity of the Reinforced Concrete Beams Using Artifi...Calculation of Torsion Capacity of the Reinforced Concrete Beams Using Artifi...
Calculation of Torsion Capacity of the Reinforced Concrete Beams Using Artifi...
 
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
 
Node similarity
Node similarityNode similarity
Node similarity
 
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn Graph
 
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...
 
Brema tarigan 09030581721015
Brema tarigan 09030581721015Brema tarigan 09030581721015
Brema tarigan 09030581721015
 
nmat4448 (1) (1)
nmat4448 (1) (1)nmat4448 (1) (1)
nmat4448 (1) (1)
 
modularity Modularity Modularity Modularity Modularity.ppt
modularity Modularity Modularity Modularity Modularity.pptmodularity Modularity Modularity Modularity Modularity.ppt
modularity Modularity Modularity Modularity Modularity.ppt
 
On the identifiability of phylogenetic networks under a pseudolikelihood model
On the identifiability of phylogenetic networks under a pseudolikelihood modelOn the identifiability of phylogenetic networks under a pseudolikelihood model
On the identifiability of phylogenetic networks under a pseudolikelihood model
 
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone Systems
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone SystemsVBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone Systems
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone Systems
 
D05422835
D05422835D05422835
D05422835
 
2013 03-20 zhang
2013 03-20 zhang2013 03-20 zhang
2013 03-20 zhang
 
Satellite imageclassification
Satellite imageclassificationSatellite imageclassification
Satellite imageclassification
 
Network Models.pptx
Network  Models.pptxNetwork  Models.pptx
Network Models.pptx
 
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentationUNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
 
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERINGINFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
 
Influence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clusteringInfluence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clustering
 
Apt thomas kelly
Apt thomas kellyApt thomas kelly
Apt thomas kelly
 
Radio propagation models in wireless
Radio propagation models in wirelessRadio propagation models in wireless
Radio propagation models in wireless
 

More from Ilias Flaounas

The story of the Product Growth team at Atlassian
The story of the Product Growth team at AtlassianThe story of the Product Growth team at Atlassian
The story of the Product Growth team at AtlassianIlias Flaounas
 
Readability and Linguistic Subjectivity of News
Readability and Linguistic Subjectivity of NewsReadability and Linguistic Subjectivity of News
Readability and Linguistic Subjectivity of NewsIlias Flaounas
 
Detecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphereDetecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphereIlias Flaounas
 
Detecting Patterns in News Media Content
Detecting Patterns in News Media ContentDetecting Patterns in News Media Content
Detecting Patterns in News Media ContentIlias Flaounas
 
Celebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social IntelligenceCelebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social IntelligenceIlias Flaounas
 
ECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translationECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translationIlias Flaounas
 
Data Science at Atlassian: 
The transition towards a data-driven organisation
Data Science at Atlassian: 
The transition towards a data-driven organisationData Science at Atlassian: 
The transition towards a data-driven organisation
Data Science at Atlassian: 
The transition towards a data-driven organisationIlias Flaounas
 

More from Ilias Flaounas (8)

The story of the Product Growth team at Atlassian
The story of the Product Growth team at AtlassianThe story of the Product Growth team at Atlassian
The story of the Product Growth team at Atlassian
 
On Storing Big Data
On Storing Big DataOn Storing Big Data
On Storing Big Data
 
Readability and Linguistic Subjectivity of News
Readability and Linguistic Subjectivity of NewsReadability and Linguistic Subjectivity of News
Readability and Linguistic Subjectivity of News
 
Detecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphereDetecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphere
 
Detecting Patterns in News Media Content
Detecting Patterns in News Media ContentDetecting Patterns in News Media Content
Detecting Patterns in News Media Content
 
Celebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social IntelligenceCelebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social Intelligence
 
ECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translationECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translation
 
Data Science at Atlassian: 
The transition towards a data-driven organisation
Data Science at Atlassian: 
The transition towards a data-driven organisationData Science at Atlassian: 
The transition towards a data-driven organisation
Data Science at Atlassian: 
The transition towards a data-driven organisation
 

Recently uploaded

Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 

Recently uploaded (20)

Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 

Inference and validation of networks

  • 1. Inference and Validation of Networks ECML/PKDD 2009 Ilias N. Flaounas1, Marco Turchi2, Tijl De Bie2 and Nello Cristianini1,2 Department of Computer Science, Bristol University1 Department of Engineering Mathematics, Bristol University2 September 10, 2009 I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 1 / 31
  • 2. Some Network Inference Examples Karate Club (Zachary, 1972) Emails (Adamic et al, 2003) Dating (Berman et al, 2004) Internet Nodes (Alvarez et al, 2007) I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 2 / 31
  • 3. Some more difficult examples Gene pathways Protein interactions (Conti et al, 2007) (Jeong et al, 2001) I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 3 / 31
  • 4. Our problem: The News Media Network But, is this network valid? How to validate a network inferred from data? I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 4 / 31
  • 5. Pattern Validation In general, patterns found in data can be validated in two different ways: by assessing their significance by measuring their predictive power I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 5 / 31
  • 6. Network Validation We argue that inferred network need to satisfy some key properties: It should to be stable. It should be related to any available ground truth. The topology should be predictable. The first two properties can be verified by testing if the inferred network is similar to a reference network - Hypothesis Testing. We verify the third property by using Generalised Linear Models. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 6 / 31
  • 7. Hypothesis Testing The key idea is to quantify the probability that a the value of a test statistic evaluated on observed data could have been found also in random data (p-value). The p-value is calculated by: p = PG∼H0 (tGR (G) ≤ tGR (GI )) (1) p ≈ #{G : tGR (G) ≤ tGR (GI )} + 1 K + 1 (2) A significance threshold α is chosen, and the null hypothesis is rejected if p < α. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 7 / 31
  • 8. Test Statistic As a test statistic we use the similarity or distance of the inferred network to a reference network. Several approaches have been proposed in the literature for measuring similarity between networks. When comparing two networks $G_A$ and $G_B$ that have the same set of nodes, we can use the Jaccard distance: $JD(E_A, E_B) = \frac{|E_A \cup E_B| - |E_A \cap E_B|}{|E_A \cup E_B|}$ (3). A value of zero indicates identical networks, and a value of one indicates no shared edges.
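As an illustration of equation (3), here is a small Python sketch for undirected networks given as collections of node pairs. The function name and the convention for two empty edge sets are assumptions made for this example.

```python
def jaccard_distance(edges_a, edges_b):
    """Jaccard distance between two undirected edge sets, as in equation (3)."""
    # frozenset makes (u, v) and (v, u) the same undirected edge.
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    union = a | b
    if not union:
        # Assumed convention: two networks without edges count as identical.
        return 0.0
    return (len(union) - len(a & b)) / len(union)


# One shared edge out of three distinct edges gives (3 - 1) / 3.
print(jaccard_distance([(1, 2), (2, 3)], [(2, 1), (3, 4)]))
```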
  • 9. Null Models To validate network patterns, we need to specify a model of random network generation. Erdős-Rényi Model: a random graph G(n, p) is comprised of n nodes; any two nodes are connected with probability p. Switching Randomisation: starts from a given graph and randomises it by switching edges.
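A minimal Python sketch of the two null models, assuming undirected simple graphs stored as sets of sorted node pairs. The function names, the `n_attempts` parameter and the rule of rejecting switches that would create self-loops or duplicate edges are illustrative choices, not details taken from the slides.

```python
import itertools
import random
from collections import Counter


def erdos_renyi(n, p, rng):
    """G(n, p): each of the n*(n-1)/2 node pairs is an edge with probability p."""
    return {pair for pair in itertools.combinations(range(n), 2)
            if rng.random() < p}


def switching_randomise(edges, n_attempts, rng):
    """Randomise a graph by switching edge endpoints, preserving node degrees."""
    edge_list = sorted({tuple(sorted(e)) for e in edges})
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_set
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # the switch would create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # the switch would create a duplicate edge
        # Replace (a, b), (c, d) with (a, d), (c, b).
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
    return edge_set


def degrees(edges):
    """Degree of every node that appears in at least one edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


rng = random.Random(1)
g = erdos_renyi(20, 0.3, rng)
r = switching_randomise(g, 200, rng)
print(len(g), len(r), degrees(g) == degrees(r))
```

The switching model keeps the degree of every node fixed, so it is a stricter null model than G(n, p), which only matches the expected edge density.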
  • 10. Prediction of Topology If one or more ground truth components are available, they are expected to be able to ‘explain’ the existence of some edges of the network. The ground truth elements can be combined using Generalised Linear Models (GLMs). To assess the ability of a GLM to predict the network structure, we use a methodology similar to that used in supervised classification. Area Under Curve (AUC) was used as the performance measure.
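As a rough sketch of how ground truth components could be combined, the following Python fits a GLM with a logistic link by plain gradient descent. The slides do not state which link function or fitting procedure was used, so the logistic link, the learning rate, the feature encoding (one binary indicator per ground truth network for each pair of outlets) and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Predicted edge probability; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic_glm(X, y, lr=0.5, epochs=2000):
    """Fit a logistic-link GLM by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical pairs of outlets: features are [same_location, same_language],
# the label says whether the inferred network has an edge between the pair.
X = [[1, 1]] * 4 + [[1, 0]] * 4 + [[0, 1]] * 4 + [[0, 0]] * 4
y = [1, 1, 1, 0,  1, 1, 0, 0,  1, 0, 0, 0,  0, 0, 0, 0]
w = fit_logistic_glm(X, y)
for x in ([1, 1], [1, 0], [0, 1], [0, 0]):
    print(x, round(predict_proba(w, x), 2))
```

The predicted probabilities can then be scored against held-out edges, for example with AUC.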
  • 11. The News Outlets case We analysed a dataset of 1,017,348 news articles over a period of 12 consecutive weeks (from October 1st, 2008): 543 news outlets; 32 different locations; 7 different media types (e.g., newspapers, blogs, etc.); 22 different languages; 81,816 stories (clusters of articles).
  • 12. Comparison of three Inference methods Method A: Connect two outlets if they share a minimum number of stories. Method B: Based on a TF-IDF scheme where outlets correspond to documents and clusters to terms. Method C: Use a weighting scheme for each cluster, based on how common it is among outlets.
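Method A is simple enough to sketch directly. In the Python below, the data layout (a dictionary from outlet name to the set of story identifiers it covered), the function name and the threshold value are hypothetical; the slides do not give the minimum number of shared stories that was used.

```python
from itertools import combinations


def infer_network_method_a(outlet_stories, min_shared):
    """Connect two outlets if they share at least `min_shared` stories."""
    edges = set()
    for a, b in combinations(sorted(outlet_stories), 2):
        shared = outlet_stories[a] & outlet_stories[b]
        if len(shared) >= min_shared:
            edges.add((a, b))
    return edges


# Toy data: story identifiers are arbitrary integers.
outlet_stories = {
    "outlet_1": {1, 2, 3, 4},
    "outlet_2": {2, 3, 4, 5},
    "outlet_3": {4, 5, 6},
    "outlet_4": {7},
}
print(infer_network_method_a(outlet_stories, min_shared=2))
```

Methods B and C replace the raw count of shared stories with a weighted score, so that stories carried by almost every outlet contribute less to a connection.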
  • 13. Validation of Outlets Network The first property states that the network should be stable. The reference network is a network inferred from independent data. We use 12 different datasets, one for each of the 12 weeks, and K = 1000 random networks per null model.
  • 14. Stability We measure the distance of the network inferred from data of Week 1, to the network inferred from data of the previous week (Week 0).
  • 15. Stability We measure the distance of the network inferred from data of Week 1, to a random network constructed based on the Erdős-Rényi Model.
  • 16. Stability We measure the distance of the network inferred from data of Week 1, to a random network constructed based on the Switching Model.
  • 17. Stability We measure the distance of the network inferred from data of Week 2, to the network inferred from data of the previous week (Week 1).
  • 18. Stability We measure the distance of the network inferred from data of Week 2, to the random networks generated by the null models.
  • 19. Stability Each week has three comparisons: to the network of the previous week; to an Erdős-Rényi Random Graph; to a Switching Randomised Graph.
  • 20. Stability Each week has three comparisons: to the network of the previous week; to an Erdős-Rényi Random Graph; to a Switching Randomised Graph.
  • 21. Stability We compare the inferred networks of 12 sequential weeks. [Figure panels: Method A, Method B, Method C] The networks inferred by each method are significantly similar (p < 0.001) to those inferred for the previous week with the same method, under both null models.
  • 22. Comparison to ground truth components We compare the inferred network to networks built from the available ground truth. For the case of news outlets we compared to the networks of Location, Language and Type. Example: the ‘Location’ network is comprised of cliques.
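One way to read "comprised of cliques" is that every pair of outlets with the same attribute value is connected. The Python sketch below builds such a network; the function name, the outlet names and the location labels are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def attribute_clique_network(attribute_by_outlet):
    """Connect every pair of outlets that share the same attribute value."""
    groups = defaultdict(list)
    for outlet, value in attribute_by_outlet.items():
        groups[value].append(outlet)
    edges = set()
    for members in groups.values():
        # Each group of k outlets becomes a clique with k*(k-1)/2 edges.
        edges.update(combinations(sorted(members), 2))
    return edges


locations = {"a": "UK", "b": "UK", "c": "UK", "d": "US", "e": "US", "f": "FR"}
location_network = attribute_clique_network(locations)
print(sorted(location_network))
```

The same construction applies to the Language and Type attributes, giving one reference network per ground truth component.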
  • 23. Location [Figure panels: Method A, Method B, Method C] Each week has three comparisons: to the real Location-Network; to the corresponding Erdős-Rényi Random Graph; to the corresponding Switching Randomised Graph.
  • 24. Language [Figure panels: Method A, Method B, Method C] Each week has three comparisons: to the real Language-Network; to the corresponding Erdős-Rényi Random Graph; to the corresponding Switching Randomised Graph.
  • 25. Media Type [Figure panels: Method A, Method B, Method C] Each week has three comparisons: to the real Media Type-Network; to the corresponding Erdős-Rényi Random Graph; to the corresponding Switching Randomised Graph.
  • 26. Selecting Inference Method The inference method is selected based on its ability to produce significant results. Table: The number of weeks in which each method gave significant results (p < 0.001), under the Erdős-Rényi (ER) and Switching Randomisation (SR) null models.
               Previous week   Location    Language    Media-Type
               ER    SR        ER    SR    ER    SR    ER    SR
    Method A   11    11         1    11     0     9    11    11
    Method B   11    11        11    11    11    11     2    11
    Method C   11    11        11    11    11    11    11    11
    Only Method C passes all tests successfully.
  • 27. Prediction power How well can we predict the existence of an edge given some ground truth network? We measure AUC under a 100-fold cross-validation scheme.
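For reference, AUC can be computed directly from predicted scores and true edge labels as the fraction of (edge, non-edge) pairs that are ranked correctly, with ties counted as half. The Python below is a minimal sketch of that definition with made-up scores; it is quadratic in the number of pairs and is not meant as an efficient implementation.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # edge scored above non-edge
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Three of the four (edge, non-edge) pairs are ranked correctly.
print(auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

Under cross-validation, this score would be computed on each held-out fold and then averaged.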
  • 28. Prediction power Using all ground truth networks together and each one separately, we get: [Figure panels: Method A, Method B, Method C]
  • 29. Prediction power Using pairs of ground truth networks we get: [Figure panels: Method A, Method B, Method C] Overall, Method C gives the best result, an AUC of 77.1%, using all three ground truth components.
  • 30. Conclusions We showed how to select the inference algorithm, and how to validate the network in terms of stability, relation to ground truth components and topology prediction.
  • 31. Thank you!