Link Analysis of Life Science Linked Data
Wei Hu1, Honglei Qiu1, and Michel Dumontier2
1State Key Laboratory for Novel Software Technology, Nanjing University, China
2Center for Biomedical Informatics Research, Stanford University
@micheldumontier::ISWC 2015
Linked Data offers links between
datasets, but they are often
incomplete and may contain
errors.
Network Analysis
• Network analysis has long been
used to study link structures
– The structure of the Web
– Network medicine: cellular
networks and implications
Power law is scale free
A graph exhibits the small-world phenomenon if its
clustering coefficient is significantly higher than that
of a random graph on the same node set and its
average distance is shorter.
BTC2010
The clustering coefficient of a node quantifies how
close its neighbors are to being a clique. The average
distance is the average shortest path length
between all pairs of nodes in the graph.
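The two measures defined above can be sketched in a few lines of Python. This is a minimal illustration on a toy undirected graph, not the tooling used for the analysis; the function names and the `toy` graph are assumptions made for the example.

```python
from collections import deque

def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to check
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

def average_distance(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_clustering(toy), average_distance(toy))
```

Comparing these two numbers against the same measures on a random graph with the same node set is what the small-world test above calls for.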
Dataset link analysis
(using RDF data model)
Entity link analysis
(using cross-references)
Term link analysis
(using ontology matching)
Linked Data for the Life Sciences
Bio2RDF is an open source project to unify the
representation and interlinking of biological data using RDF.
chemicals/drugs/formulations,
genomes/genes/proteins, domains
Interactions, complexes & pathways
animal models and phenotypes
Disease, genetic markers, treatments
Terminologies & publications
• Release 3 (June 2014)
• 35 datasets
• 11B RDF triples
• 1B entities
• 2K classes
• 4K properties
Dataset Links
Network Properties
1. Well linked
2. Hubs and authorities
3. Small-world phenomenon
– Average distance = 2.77 vs 6
– Clustering coefficient = 0.22 vs 0.13
4. Robust on systematic removal of nodes
Entity Link Analysis
How well do entities link to each other?
• 76% of entity links involve a special kind of RDF triple
– e.g. <kegg:D03455, kegg:x-drugbank, drugbank:DB00002>
– x-relations have under-specified semantics
• May be truly identical, may refer to another related entity …
• Degree distribution
– Some do not follow power law
• Exponent is too large (close to 5)
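The slides do not say how the exponent was estimated. As one possible illustration, the sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)); the function name, the `x_min` parameter and the sample degrees are assumptions for the example.

```python
import math

def powerlaw_exponent(degrees, x_min=1):
    """Continuous MLE for the exponent of a power-law tail starting at x_min."""
    tail = [d for d in degrees if d >= x_min]
    log_sum = sum(math.log(d / x_min) for d in tail)
    if log_sum == 0:
        raise ValueError("all degrees equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum

# Hypothetical degree sequence, for illustration only.
print(powerlaw_exponent([1, 1, 2, 4]))
```

An estimate far above the range usually reported for Web-like graphs is one signal that a power law may be a poor description of the data.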
BTC2010
Symmetry of entity links varies
between different pairs of datasets
• Over 99% of links are reciprocated in DrugBank-PharmGKB and
OMIM-HGNC
– Suggests link sharing and synchronization
• Only 58% of links in DrugBank-KEGG and 51% of OMIM-Orphanet
links are reciprocal
– Suggests incomplete mapping
• 28% of OMIM-Orphanet links are malposed
– Suggests variation in model (omim:Phenotype to orphanet:Disorder)
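A reciprocity figure like those above could be computed along the following lines. This is a sketch under an assumption: the share is taken over all distinct linked pairs in either direction, which may differ from the denominator used in the study. The function name and the `a:`/`b:` identifiers are hypothetical.

```python
def reciprocity(links_ab, links_ba):
    """Share of linked (a, b) pairs that are asserted in both directions.

    links_ab: iterable of (a, b) links published by dataset A
    links_ba: iterable of (b, a) links published by dataset B
    """
    forward = set(links_ab)
    backward = {(a, b) for (b, a) in links_ba}  # flip into (a, b) order
    all_pairs = forward | backward
    if not all_pairs:
        return 0.0
    return len(forward & backward) / len(all_pairs)

# Hypothetical identifiers, for illustration only.
a_to_b = [("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:3")]
b_to_a = [("b:1", "a:1"), ("b:2", "a:2"), ("b:4", "a:4")]
print(reciprocity(a_to_b, b_to_a))  # 2 reciprocated pairs out of 4 distinct pairs
```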
Transitivity Analysis:
Find mismatches and discover new links
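One way to read this slide: if an entity in dataset A links to B, and B links to C, then a link from A to C is implied. The sketch below follows that reading and is an assumption, not the authors' procedure. An implied link is reported as a candidate new link when A has no link to C at all, and as a mismatch when A already links to a different entity in C. All names are hypothetical.

```python
def transitive_candidates(ab, bc, ac):
    """ab, bc, ac: dicts mapping an entity to the set of entities it links to.

    Returns (new_links, mismatches) as sets of (a, c) pairs implied by
    following a -> b -> c.
    """
    new_links, mismatches = set(), set()
    for a, bs in ab.items():
        known = ac.get(a, set())
        for b in bs:
            for c in bc.get(b, ()):
                if c in known:
                    continue  # implied link is already asserted
                if known:
                    mismatches.add((a, c))  # a links elsewhere in C
                else:
                    new_links.add((a, c))   # a has no link into C yet
    return new_links, mismatches

# Hypothetical identifiers, for illustration only.
ab = {"a1": {"b1"}, "a2": {"b2"}}
bc = {"b1": {"c1"}, "b2": {"c2"}}
ac = {"a2": {"c9"}}
print(transitive_candidates(ab, bc, ac))
```

Because x-relations have under-specified semantics, links produced this way are candidates to review rather than assertions of identity.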
Evaluation of Entity Matching
How accurate are current entity matching approaches?
• Built a benchmark from the reciprocal links between similarly-typed
entities
• Evaluated several entity matching approaches
– Label similarity: Levenshtein, Jaro-Winkler, N-gram, Jaccard
– Machine learning: Linear regression, logistic regression with 5 properties
• Many-to-one links are difficult to discover
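Two of the label similarity measures named above can be sketched as follows. This is a minimal illustration, assuming a length-normalized Levenshtein similarity and a Jaccard index over lowercase word tokens; the evaluated implementations may normalize or tokenize differently.

```python
def levenshtein(s, t):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute or match
        prev = cur
    return prev[-1]

def levenshtein_similarity(s, t):
    """Edit distance rescaled to [0, 1], where 1 means identical labels."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest

def jaccard(s, t):
    """Overlap of lowercase word tokens between two labels."""
    a, b = set(s.lower().split()), set(t.lower().split())
    return 1.0 if not (a | b) else len(a & b) / len(a | b)

print(levenshtein("kitten", "sitting"))
print(jaccard("breast cancer", "cancer of breast"))
```

A label-only matcher would accept a pair when such a score exceeds a chosen threshold, which is why it can fail for entities whose labels differ across datasets.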
Term Link Analysis
How similar are the topics in the data network?
• Use ontology matching to generate term link graph
– Falcon-AO (linguistic matchers + structural matcher + synonyms)
• Created 83K class mappings, 1.5K object property mappings, and 858 data
property mappings
– Similarity threshold = 0.9
– Top-5 popular labels for classes and properties
• Significant overlap in topics; does not follow a power law, unlike the broader Semantic Web
Correlation of Link Graphs
To what degree are the three link graphs correlated?
• Spearman’s rank correlation coefficient:
– Entity link graph → dataset pairs: entity links / entities
– Term link graph → dataset pairs: term mappings / terms
– Dataset link graph → dataset pairs: shortest path length
• All positively correlated
– Closer datasets in distance have more linked entities and terms
– Number of linked entities contributes little to overlap of topics
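Spearman's rank correlation is the Pearson correlation computed on ranks. The sketch below shows the calculation with average ranks for ties; the function names and the sample values are assumptions for the example, not data from the study.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Pearson correlation of the rank vectors; inputs must not be constant."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-dataset-pair scores from two link graphs.
entity_scores = [0.10, 0.40, 0.35, 0.80]
term_scores = [0.05, 0.30, 0.20, 0.60]
print(spearman(entity_scores, term_scores))
```

Each input vector holds one value per dataset pair, in the same pair order, so the coefficient compares how two link graphs rank the same pairs.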
Summary of Findings
• Dataset, entity and term link graphs do not necessarily share the same
characteristics with the Hypertext / Semantic Web
– Degree distribution of entity links does not follow power law
– Data hubs
• A significant number of entities have been linked using x-relations, but
their intended semantics differs
– Classes are identical or equivalent → entity links represent logical equivalence
• Symmetric and transitive entity links do exist, but their utility is weakened
due to their small number
– Meanings of entity links may shift during transitive closure
• Matching only the labels of entities may fail, while combining different
properties with simple learning algorithms achieves good accuracy
dumontierlab.com
michel.dumontier@stanford.edu
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier

More Related Content

What's hot

Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Michel Dumontier
 
Towards a gold standard and regarding quality in public domain chemistry data...
Towards a gold standard and regarding quality in public domain chemistry data...Towards a gold standard and regarding quality in public domain chemistry data...
Towards a gold standard and regarding quality in public domain chemistry data...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
CEDAR work bench for metadata management
CEDAR work bench for metadata managementCEDAR work bench for metadata management
CEDAR work bench for metadata management
Pistoia Alliance
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
FAIRDOM
 
Generating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web TechnologiesGenerating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web Technologies
Michel Dumontier
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Yasel Cruz
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)Gregor Hagedorn
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019
dkNET
 
OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...
Barry Hardy
 
BioNLPSADI
BioNLPSADIBioNLPSADI
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_PresentationYatpang Cheung
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
Dmitry Grapov
 
Nonadaptive mastermind algorithms for string and vector databases, with case ...
Nonadaptive mastermind algorithms for string and vector databases, with case ...Nonadaptive mastermind algorithms for string and vector databases, with case ...
Nonadaptive mastermind algorithms for string and vector databases, with case ...Ecway Technologies
 
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ... Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Syed Ahmad Chan Bukhari, PhD
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
SciBite Limited
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
CINECAProject
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
Paul Groth
 

What's hot (19)

Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
 
Towards a gold standard and regarding quality in public domain chemistry data...
Towards a gold standard and regarding quality in public domain chemistry data...Towards a gold standard and regarding quality in public domain chemistry data...
Towards a gold standard and regarding quality in public domain chemistry data...
 
CEDAR work bench for metadata management
CEDAR work bench for metadata managementCEDAR work bench for metadata management
CEDAR work bench for metadata management
 
Canadian health census to lod
Canadian health census to lodCanadian health census to lod
Canadian health census to lod
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
 
Generating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web TechnologiesGenerating Biomedical Hypotheses Using Semantic Web Technologies
Generating Biomedical Hypotheses Using Semantic Web Technologies
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019
 
OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...
 
BioNLPSADI
BioNLPSADIBioNLPSADI
BioNLPSADI
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_Presentation
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
 
Nonadaptive mastermind algorithms for string and vector databases, with case ...
Nonadaptive mastermind algorithms for string and vector databases, with case ...Nonadaptive mastermind algorithms for string and vector databases, with case ...
Nonadaptive mastermind algorithms for string and vector databases, with case ...
 
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ... Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 
Open PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future ChallengesOpen PHACTS : Linked Data Future Challenges
Open PHACTS : Linked Data Future Challenges
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 

Similar to Link Analysis of Life Sciences Linked Data

Hamalt genetics based peer to-peer network architecture to encourage the coo...
Hamalt  genetics based peer to-peer network architecture to encourage the coo...Hamalt  genetics based peer to-peer network architecture to encourage the coo...
Hamalt genetics based peer to-peer network architecture to encourage the coo...
csandit
 
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
cscpconf
 
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
csandit
 
Keynote at AImWD
Keynote at AImWDKeynote at AImWD
Keynote at AImWD
Stefan Schlobach
 
G5234552
G5234552G5234552
G5234552
IOSR-JEN
 
Distributed Link Prediction in Large Scale Graphs using Apache Spark
Distributed Link Prediction in Large Scale Graphs using Apache SparkDistributed Link Prediction in Large Scale Graphs using Apache Spark
Distributed Link Prediction in Large Scale Graphs using Apache Spark
Anastasios Theodosiou
 
An approach for transforming of relational databases to owl ontology
An approach for transforming of relational databases to owl ontologyAn approach for transforming of relational databases to owl ontology
An approach for transforming of relational databases to owl ontology
IJwest
 
IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction Techniques
IRJET Journal
 
A Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social NetworksA Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social Networks
April Smith
 
survey of different data dependence analysis techniques
 survey of different data dependence analysis techniques survey of different data dependence analysis techniques
survey of different data dependence analysis techniques
INFOGAIN PUBLICATION
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstractsbutest
 
Iaetsd similarity search in information networks using
Iaetsd similarity search in information networks usingIaetsd similarity search in information networks using
Iaetsd similarity search in information networks using
Iaetsd Iaetsd
 
Content-based link prediction
Content-based link predictionContent-based link prediction
Content-based link prediction
Carlos Castillo (ChaTo)
 
992 sms10 social_media_services
992 sms10 social_media_services992 sms10 social_media_services
992 sms10 social_media_services
siyaza
 
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer NetworksScale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
IOSR Journals
 
M033059064
M033059064M033059064
M033059064
ijceronline
 
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET Journal
 
01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures
dnac
 
01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)
Duke Network Analysis Center
 
IRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social NetworksIRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social Networks
IRJET Journal
 

Similar to Link Analysis of Life Sciences Linked Data (20)

Hamalt genetics based peer to-peer network architecture to encourage the coo...
Hamalt  genetics based peer to-peer network architecture to encourage the coo...Hamalt  genetics based peer to-peer network architecture to encourage the coo...
Hamalt genetics based peer to-peer network architecture to encourage the coo...
 
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
 
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
HAMALT : GENETICS BASED PEER-TOPEER NETWORK ARCHITECTURE TO ENCOURAGE THE COO...
 
Keynote at AImWD
Keynote at AImWDKeynote at AImWD
Keynote at AImWD
 
G5234552
G5234552G5234552
G5234552
 
Distributed Link Prediction in Large Scale Graphs using Apache Spark
Distributed Link Prediction in Large Scale Graphs using Apache SparkDistributed Link Prediction in Large Scale Graphs using Apache Spark
Distributed Link Prediction in Large Scale Graphs using Apache Spark
 
An approach for transforming of relational databases to owl ontology
An approach for transforming of relational databases to owl ontologyAn approach for transforming of relational databases to owl ontology
An approach for transforming of relational databases to owl ontology
 
IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction Techniques
 
A Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social NetworksA Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social Networks
 
survey of different data dependence analysis techniques
 survey of different data dependence analysis techniques survey of different data dependence analysis techniques
survey of different data dependence analysis techniques
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
 
Iaetsd similarity search in information networks using
Iaetsd similarity search in information networks usingIaetsd similarity search in information networks using
Iaetsd similarity search in information networks using
 
Content-based link prediction
Content-based link predictionContent-based link prediction
Content-based link prediction
 
992 sms10 social_media_services
992 sms10 social_media_services992 sms10 social_media_services
992 sms10 social_media_services
 
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer NetworksScale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
 
M033059064
M033059064M033059064
M033059064
 
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
 
01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures
 
01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)
 
IRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social NetworksIRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social Networks
 

More from Michel Dumontier

FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
Michel Dumontier
 
A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
Michel Dumontier
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge Graphs
Michel Dumontier
 
Evaluating FAIRness
Evaluating FAIRnessEvaluating FAIRness
Evaluating FAIRness
Michel Dumontier
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health System
Michel Dumontier
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
Michel Dumontier
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health System
Michel Dumontier
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
Michel Dumontier
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Michel Dumontier
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
Michel Dumontier
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
Michel Dumontier
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University Dinner
Michel Dumontier
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star Lecture
Michel Dumontier
 
Are we FAIR yet?
Are we FAIR yet?Are we FAIR yet?
Are we FAIR yet?
Michel Dumontier
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resources
Michel Dumontier
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIR
Michel Dumontier
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR Metrics
Michel Dumontier
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
Michel Dumontier
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
Michel Dumontier
 
Ontologies
OntologiesOntologies
Ontologies
Michel Dumontier
 

More from Michel Dumontier (20)

FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
 
A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge Graphs
 
Evaluating FAIRness
Evaluating FAIRnessEvaluating FAIRness
Evaluating FAIRness
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health System
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health System
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University Dinner
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star Lecture
 
Are we FAIR yet?
Are we FAIR yet?Are we FAIR yet?
Are we FAIR yet?
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resources
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIR
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR Metrics
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
 
Ontologies
OntologiesOntologies
Ontologies
 



Link Analysis of Life Sciences Linked Data

  • 1. Link Analysis of Life Science Linked Data. Wei Hu (1), Honglei Qiu (1), and Michel Dumontier (2). (1) State Key Laboratory for Novel Software Technology, Nanjing University, China; (2) Center for Biomedical Informatics Research, Stanford University. @micheldumontier::ISWC 2015
  • 2. Linked Data offers links between datasets, but they are often incomplete and may contain errors.
  • 3. Network Analysis • Network analysis has long been used to study link structures – The structure of the Web – Network medicine: cellular networks and implications • A power-law degree distribution is scale free (figure: BTC2010) • A graph demonstrates the small-world phenomenon if its clustering coefficient is significantly higher than that of a random graph on the same node set and if it also has a short average distance • The clustering coefficient of a node quantifies how close its neighbors are to being a clique. The average distance is the average shortest-path length between all pairs of nodes in the graph.
  • 4. Dataset link analysis (using RDF data model) • Entity link analysis (using cross-references) • Term link analysis (using ontology matching)
  • 5. Linked Data for the Life Sciences • Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF. • Chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; animal models and phenotypes; disease, genetic markers, treatments; terminologies & publications • Release 3 (June 2014) • 35 datasets • 11B RDF triples • 1B entities • 2K classes • 4K properties
  • 6. Dataset Links • Network Properties 1. Well linked 2. Hubs and authorities 3. Small-world phenomenon (average distance = 2.77 vs 6; clustering coefficient = 0.22 vs 0.13) 4. Robust on systematic removal of nodes
  • 7. Entity Link Analysis: How well do entities link to each other? • 76% of entity links involve a special kind of RDF triple – e.g. <kegg:D03455, kegg:x-drugbank, drugbank:DB00002> – x-relations have under-specified semantics • May be truly identical, may refer to another related entity … • Degree distribution – Some do not follow power law • Exponent is too large (close to 5) (figure: BTC2010)
  • 8. Symmetry of entity links varies between different pairs of datasets • Over 99% of links are reciprocated in DrugBank-PharmGKB and OMIM-HGNC – Suggests link sharing and synchronization • Only 58% of DrugBank-KEGG links and 51% of OMIM-Orphanet links are reciprocal – Suggests incomplete mapping • 28% of OMIM-Orphanet links are malposed – Suggests variation in model (omim:Phenotype to orphanet:Disorder)
  • 9. Transitivity Analysis: Find mismatches and discover new links
  • 10. Evaluation of Entity Matching: How accurate are current entity matching approaches? • Built a benchmark from the reciprocal links between similarly-typed entities • Evaluated several entity matching approaches – Label similarity: Levenshtein, Jaro-Winkler, N-gram, Jaccard – Machine learning: Linear regression, logistic regression with 5 properties • Many-to-one links are difficult to discover
  • 11. Term Link Analysis: How similar are the topics in the data network? • Use ontology matching to generate term link graph – Falcon-AO (linguistic matchers + structural matcher + synonyms) • Created 83K class mappings, 1.5K object property mappings, and 858 data property mappings – Similarity threshold = 0.9 – Top-5 popular labels for classes and properties • Significant overlap in topics; does not follow a power law as in the broader Semantic Web
  • 12. Correlation of Link Graphs: To what degree are the three link graphs correlated? • Spearman’s rank correlation coefficient: – Entity link graph → dataset pairs: entity links / entities – Term link graph → dataset pairs: term mappings / terms – Dataset link graph → dataset pairs: shortest path length • All positively correlated – Closer datasets in distance have more linked entities and terms – Number of linked entities contributes little to overlap of topics
  • 13. Summary of Findings • Dataset, entity and term link graphs do not necessarily share the same characteristics as the Hypertext / Semantic Web – Degree distribution of entity links does not follow power law – Data hubs • A significant number of entities have been linked using x-relations, but their intended semantics differ – Classes are identical or equivalent → entity links represent logical equivalence • Symmetric and transitive entity links do exist, but their utility is weakened by their small number – Meanings of entity links may shift during transitive closure • Matching only the labels of entities may fail, while combining different properties and using simple learning algorithms achieves good accuracy
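
Slide 3 defines the two quantities behind the small-world test: the clustering coefficient and the average distance. The sketch below shows one way they could be computed for a small graph. It assumes an undirected, unweighted graph stored as a dictionary of neighbor sets; the toy graph and all function names are hypothetical and are not taken from the slides.

```python
from collections import deque

def clustering_coefficient(adj, node):
    """Fraction of pairs of `node`'s neighbors that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to check
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(average_clustering(toy), average_distance(toy))
```

To apply the small-world test, the same two numbers would be computed for a random graph on the same node set and compared.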
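
Slide 8 reports the share of entity links that are reciprocated between two datasets. A minimal sketch of such a measure follows. It assumes each direction is available as a set of (source, target) identifier pairs and that the percentage is taken over the links of the first dataset; the slides do not state the exact denominator, and the identifiers below are hypothetical.

```python
def reciprocity(links_ab, links_ba):
    """Share of A->B links (a, b) for which B also links back with (b, a)."""
    if not links_ab:
        return 0.0
    reciprocated = sum(1 for (a, b) in links_ab if (b, a) in links_ba)
    return reciprocated / len(links_ab)

# Hypothetical cross-references between two datasets A and B.
a_to_b = {("A:1", "B:1"), ("A:2", "B:2"), ("A:3", "B:3"), ("A:4", "B:4")}
b_to_a = {("B:1", "A:1"), ("B:2", "A:2"), ("B:9", "A:3")}

print(reciprocity(a_to_b, b_to_a))  # 2 of the 4 A->B links are reciprocated
```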
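
Slide 9 names transitivity analysis as a way to discover new links. The slides do not describe the procedure, so the following is only an illustrative sketch under stated assumptions: links are treated as symmetric, and any two entities that share a linked neighbor but are not linked to each other are proposed as a candidate link. The function name and identifiers are hypothetical.

```python
from itertools import combinations

def candidate_links(links):
    """Pairs (x, z) that share a linked neighbor but are not linked themselves."""
    nbrs = {}
    for x, y in links:  # treat every link as symmetric
        nbrs.setdefault(x, set()).add(y)
        nbrs.setdefault(y, set()).add(x)
    candidates = set()
    for around in nbrs.values():
        for x, z in combinations(sorted(around), 2):
            if z not in nbrs[x]:
                candidates.add((x, z))
    return candidates

# Hypothetical links across three datasets K, D and P.
links = {
    ("K:1", "D:1"), ("D:1", "P:1"),                  # K:1 - D:1 - P:1, no K:1 - P:1
    ("K:2", "D:2"), ("D:2", "P:2"), ("K:2", "P:2"),  # triangle already closed
}
print(candidate_links(links))
```

Because the meaning of a link may shift along a chain, as the summary slide warns, candidates found this way would need checking before being asserted.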
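
Slide 10 lists Levenshtein and Jaccard among the label similarity measures that were evaluated. The sketch below gives plain implementations of both. Normalizing edit distance by the longer label and computing Jaccard over lowercase whitespace tokens are assumptions made here for illustration; the slides do not say how the measures were configured.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def levenshtein_similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1 means identical labels."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def jaccard_similarity(a, b):
    """Overlap of lowercase whitespace tokens: |intersection| / |union|."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

print(levenshtein("kitten", "sitting"))
print(jaccard_similarity("breast cancer", "cancer of breast"))
```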
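
Slide 12 compares the three link graphs with Spearman's rank correlation coefficient, using one score per dataset pair. A minimal sketch of the coefficient is given below: it ranks both score lists (ties receive their average rank) and takes the Pearson correlation of the ranks. The input values are hypothetical and do not come from the study.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)

# Hypothetical per-dataset-pair scores from two of the link graphs.
entity_scores = [0.10, 0.40, 0.25, 0.80]
term_scores = [0.05, 0.30, 0.20, 0.90]
print(spearman(entity_scores, term_scores))
```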