Link Prediction in Linked Data of
Interspecies Interactions using
Hybrid Recommendation Approach
Hideaki TAKEDA
Professor
Chiang Mai, Thailand JIST 2014 November 10th, 2014
Tsuyoshi HOSOYA
Mycologist
Rathachai CHAWUTHAI
rathachai.c@gmail.com
Linked Open Data for ACademia (LODAC)
[Figure: lodac:Salix_pierotii ("Salix pierotii") species:hasSuperTaxon lodac:Salix]
National Museum of Nature and Science
30,000 Interactions
4,000 Fungi
7,000 Hosts
Let’s find the Missing Links between species
LPII: Link Prediction on Interspecies Interactions
Objective:
To predict missing links between fungi and hosts
Agenda
•Dataset
•Introduction
•Hybrid Recommendation
• Collaborative Filtering
• Community Structure
• Biological Classification
•Evaluation
•Summary
•Future work
Dataset
# Fungus
lodac:Melampsora_yezoensis
    rdfs:label "Melampsora yezoensis"@la ;
    species:hasTaxonRank species:Species ;
    species:hasSuperTaxon lodac:Melampsora .
lodac:Melampsora species:hasTaxonRank species:Genus .
# Host
lodac:Salix_pierotii
    rdfs:label "Salix pierotii"@la ;
    rdf:type species:ScientificName ;
    species:hasSuperTaxon lodac:Salix .
lodac:Salix species:hasTaxonRank species:Genus .
# Link
lodac:Melampsora_yezoensis species:growsOn lodac:Salix_pierotii .
[Figure: fungus lodac:Melampsora_yezoensis (species:hasSuperTaxon lodac:Melampsora) linked by species:growsOn to host lodac:Salix_pierotii (species:hasSuperTaxon lodac:Salix)]
Selected: 903 Rust Fungi, 2,001 Hosts, 2,966 Links
• Biological Classification of Fungi
• Biological Classification of Hosts
Introduction
[Figure: LPII overview. DATA PREPARATION: the Fungus-Host Interaction Dataset is transformed using a Weight Function. LPII APPROACH: (1) Collaborative Filtering, (2) Community Structure and (3) Biological Classification each produce a score; (4) the scores are combined to find missing links. RESULT: a generated list of Fungus-Host interactions with predictive scores, which the biologist uses when making observations.]
Collaborative Filtering (step 1)
Fungi found on the same host are neighbors.
If some close neighbors of the fungus f are found on a host h,
the fungus f may be found on the host h.
[Figure: bipartite graph of Fungi f1-f5 and Hosts h1-h5]
Collaborative Filtering for Link Prediction
PCF( f1,h2 ) = ?
Sum of similarities between fungi with common hosts
[Figure: fungi f1, f2 and hosts h1, h2]
Jaccard Index
[Figure: bipartite graph of fungi f1-f5 and hosts h1-h5; the weight w between two fungi is their Jaccard index, here w = 0.50 and w = 0.33]
Predictive Score using Collaborative Filtering
PCF( f1,h2 ) = 0.50
PCF( f2,h3 ) = 0.33
PCF( f1,h3 ) = ???
PCF( f4,h3 ) = ???
PCF( f4,h5 ) = ???
etc.
(Dashed red lines are predicted links)
[Figure: bipartite graph of fungi f1-f5 and hosts h1-h5 with weights w = 0.50 and w = 0.33]
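The collaborative filtering score described on these slides (a sum of Jaccard similarities between fungi with common hosts) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `jaccard` and `p_cf`, the `hosts_of` dictionary and the three-fungus toy graph are all assumptions made for the example, and the toy graph is not the one drawn on the slides.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def p_cf(fungus: str, host: str, hosts_of: Dict[str, Set[str]]) -> float:
    """Sum of Jaccard similarities between `fungus` and every other
    fungus already observed on `host` (assumed reading of the slide)."""
    return sum(
        jaccard(hosts_of[fungus], their_hosts)
        for other, their_hosts in hosts_of.items()
        if other != fungus and host in their_hosts
    )


# Hypothetical toy graph: fungus -> set of hosts it grows on.
hosts_of = {
    "f1": {"h1"},
    "f2": {"h1", "h2"},
    "f3": {"h2", "h3"},
}

print(round(p_cf("f1", "h2", hosts_of), 2))  # 0.5
print(round(p_cf("f2", "h3", hosts_of), 2))  # 0.33
```

In this toy graph f1 and f2 share one of two hosts (w = 0.50), and f2 and f3 share one of three (w = 0.33), so the scores take the same values as the examples on the slide.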
Community Structure (step 2)
If a host h is commonly found
in the community of the fungus f,
the fungus f may be found on the
host h.
[Figure: Bipartite Graph of fungi f1-f5 and hosts h1-h5, and its Projection of Fungi with edge weights 0.50 and 0.33]
Community Structure of Rust Fungi
Using Modularity with Random Walk
Community Structure
[Figure: Projection of Fungi f1-f5 (edge weights 0.50 and 0.33) grouped into Community #1, Community #2 and Community #3, linked to hosts h1-h5]
\[ P_{CS}(f,h) = \frac{\text{number of links between the community of fungus } f \text{ and host } h}{\text{number of all links given by the community of fungus } f} \]
\[ P_{CS}(f_3,h_1) = \frac{2}{5} = 0.40 \]
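The community structure score is a ratio of link counts, which the sketch below computes. It assumes the community labels are already available from some community detection step, which is not reproduced here; `p_cs`, `community_of`, `links` and the toy data are hypothetical names and values chosen only so that the result equals the 2/5 example.

```python
from typing import Dict, List, Tuple


def p_cs(
    fungus: str,
    host: str,
    community_of: Dict[str, int],
    links: List[Tuple[str, str]],
) -> float:
    """Share of the links from the fungus's community that go to `host`."""
    target = community_of[fungus]
    # All known fungus-host links whose fungus is in the same community.
    community_links = [(f, h) for f, h in links if community_of.get(f) == target]
    if not community_links:
        return 0.0
    to_host = sum(1 for _, h in community_links if h == host)
    return to_host / len(community_links)


# Hypothetical community labels and fungus-host links.
community_of = {"f1": 1, "f2": 1, "f3": 1, "f4": 2, "f5": 2}
links = [
    ("f1", "h1"), ("f2", "h1"), ("f1", "h2"),
    ("f2", "h3"), ("f3", "h3"),
    ("f4", "h4"), ("f5", "h5"),
]

print(p_cs("f3", "h1", community_of, links))  # 0.4
```

Here the community of f3 gives five links in total, two of which reach h1.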
How to deal with many very small communities?
Biological Classification (step 3)
If a host h is commonly found
in the biological classification of
the fungus f,
the fungus f may be found on the
host h.
BIOLOGICAL CLASSIFICATION (TAXONOMY)
• Domain e.g. Eukaryota
• Kingdom e.g. Fungi
• Phylum e.g. Basidiomycota
• Class e.g. Urediniomycetes
• Order e.g. Uredinales
• Family e.g. Melampsoraceae
• Genus e.g. Melampsora
• Species e.g. Melampsora yezoensis
Classification Example
[Figure: fungi f1-f5 grouped by Biological Classification into genera G1 and G2, linked to hosts h1-h5]
\[ P_{BC}(f,h) = \frac{\text{number of links between the biological classification of fungus } f \text{ and host } h}{\text{number of all links given by the biological classification of fungus } f} \]
\[ P_{BC}(f_4,h_2) = \frac{1}{4} = 0.25 \]
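The biological classification score has the same ratio shape, with the fungus's taxonomic group in place of its detected community. The sketch below assumes the group is the genus given by a `species:hasSuperTaxon`-style mapping. The names `p_bc`, `super_taxon` and `links` and the toy data are hypothetical and were chosen only to reproduce the 1/4 example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def p_bc(
    fungus: str,
    host: str,
    super_taxon: Dict[str, str],
    links: List[Tuple[str, str]],
) -> float:
    """Share of the links from the fungus's genus that go to `host`."""
    genus = super_taxon[fungus]
    # Count, per host, the links given by all fungi of the same genus.
    host_counts = Counter(h for f, h in links if super_taxon.get(f) == genus)
    total = sum(host_counts.values())
    return host_counts[host] / total if total else 0.0


# Hypothetical genus assignment (fungus -> genus) and fungus-host links.
super_taxon = {"f1": "G1", "f2": "G1", "f3": "G2", "f4": "G2", "f5": "G2"}
links = [
    ("f1", "h1"), ("f2", "h1"),
    ("f3", "h3"), ("f4", "h4"), ("f5", "h2"), ("f5", "h5"),
]

print(p_bc("f4", "h2", super_taxon, links))  # 0.25
```

Because the grouping comes from the taxonomy rather than from the graph, this score stays defined even for a fungus with very few known hosts.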
Hybrid Recommender Approach
PII( f,h ): Combination of
• PCF( f,h ): Collaborative Filtering
• PCS( f,h ): Community Structure
• PBC( f,h ): Biological Classification
Evaluation
Learning and Testing
• Training set (2,500 links)
• Test set (500 links)
• Candidates (400,000 links)
[Figure: All Possible Links between fungi f1-f5 and hosts h1-h5, split into Existent Links and Missing Links, with a predictive score attached to each candidate link]
AUC: Area Under the receiver operating characteristic Curve
① PII( f1,h2 ) = 0.70
② PII( f2,h3 ) = 0.60
③ PII( f1,h3 ) = 0.50
④ PII( f4,h3 ) = 0.40
⑤ PII( f2,h2 ) = 0.30
⑥ PII( f3,h3 ) = 0.20
⑦ PII( f4,h3 ) = 0.10
① PII( f1,h2 ) = 0.70
② PII( f2,h2 ) = 0.60
③ PII( f3,h3 ) = 0.50
④ PII( f2,h3 ) = 0.50
⑤ PII( f1,h3 ) = 0.40
⑥ PII( f4,h3 ) = 0.30
⑦ PII( f4,h3 ) = 0.10
[Figure: Predicted List #1 and Predicted List #2, each sorted by predictive score, illustrating a high AUC and a low AUC; red scores are test links]
For n comparisons,
• n' is the number of times a test link has a higher
score than a missing link.
• n" is the number of times a test link has the same
score as a missing link.
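The slide defines n, n' and n" but the extracted text does not spell out how they are combined. The sketch below assumes the form commonly used in link prediction, AUC = (n' + 0.5 n") / n, and compares every test link with every missing link instead of sampling n pairs. The function name `auc` and the example scores are hypothetical.

```python
from typing import Sequence


def auc(test_scores: Sequence[float], missing_scores: Sequence[float]) -> float:
    """AUC = (n' + 0.5 * n'') / n over all test-vs-missing comparisons."""
    n = len(test_scores) * len(missing_scores)
    if n == 0:
        raise ValueError("need at least one test link and one missing link")
    higher = same = 0
    for t in test_scores:
        for m in missing_scores:
            if t > m:
                higher += 1  # counts toward n'
            elif t == m:
                same += 1    # counts toward n''
    return (higher + 0.5 * same) / n


# Hypothetical scores: two held-out test links, three non-observed links.
test_scores = [0.70, 0.60]
missing_scores = [0.50, 0.60, 0.10]

print(round(auc(test_scores, missing_scores), 3))  # 0.917
```

A value of 0.5 corresponds to scores that cannot tell test links from missing links, and values near 1.0 mean test links are almost always ranked higher.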
AUC: Area Under the receiver operating characteristic Curve
Combination                   Scoring Function(s)   AUC
Stand-alone function          PCF                   0.859
                              PCS                   0.823
                              PBC                   0.680
Summation of functions        PCF + PCS             0.867
                              PCF + PBC             0.876
                              PCS + PBC             0.865
                              PCF + PCS + PBC       0.892
Multiplication of functions   PCF × PCS             0.817
                              PCF × PBC             0.862
                              PCS × PBC             0.827
                              PCF × PCS × PBC       0.818
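The two ways of combining scores compared in the table can be sketched as below. The helper names `combine_sum` and `combine_product` and the example scores are hypothetical. The comment about zero scores is an arithmetic observation offered as one possible reading of the table, not an explanation given in the slides.

```python
import math


def combine_sum(*scores: float) -> float:
    """Summation of scoring functions, e.g. PCF + PCS + PBC."""
    return sum(scores)


def combine_product(*scores: float) -> float:
    """Multiplication of scoring functions, e.g. PCF x PCS x PBC."""
    return math.prod(scores)


# Hypothetical candidate link with no support from biological classification.
pcf, pcs, pbc = 0.50, 0.40, 0.0

print(combine_sum(pcf, pcs, pbc))      # 0.9
print(combine_product(pcf, pcs, pbc))  # 0.0
# With multiplication, a single zero score removes the evidence
# contributed by the other two functions; summation keeps it.
```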
Overall
[Figure: overall LPII workflow.
DATA PREPARATION: RDF data of Interspecies Interactions → SPARQL querying → Bipartite Graph → transform data using a Weight Function → Projection of Fungi.
LPII APPROACH: (1) Collaborative Filtering; (2) Community Structure (clustering using a Community Detection Method); (3) Biological Classification (select connected fungi, clustering using Biological Classification); (4) PII(f,h) combines PCF(f,h), PCS(f,h) and PBC(f,h) to find missing links.
RESULT: Predicted Missing Links of Fungus-Host together with prediction scores, ranked in decreasing order; the domain expert makes an observation and, if the link is found, updates the knowledge base, which is shared on the LOD Cloud.
NOTE (legend): Data, Process, Third party method, Scoring Function, Input argument, Linear Operation, Decision, Dataflow.]
Hybrid Recommender Approach
\[ P_{II}(f,h) = \alpha \, P_{CF}(f,h) + \beta \, P_{CS}(f,h) + \gamma \, P_{BC}(f,h) \]
γ should be very low, about 0.1 to 0.2.
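A weighted combination followed by ranking in decreasing order can be sketched as follows. The slide only states that γ should be low, so the α and β defaults are placeholders. The names `p_ii` and `candidates` and every score in the example are hypothetical.

```python
from typing import Dict, Tuple


def p_ii(
    p_cf: float,
    p_cs: float,
    p_bc: float,
    alpha: float = 0.4,   # assumed weight, not given in the slides
    beta: float = 0.4,    # assumed weight, not given in the slides
    gamma: float = 0.2,   # kept low, in the 0.1 to 0.2 range
) -> float:
    """Weighted linear combination of the three scoring functions."""
    return alpha * p_cf + beta * p_cs + gamma * p_bc


# Hypothetical candidate links with (PCF, PCS, PBC) scores.
candidates: Dict[Tuple[str, str], Tuple[float, float, float]] = {
    ("f4", "h2"): (0.00, 0.20, 0.25),
    ("f1", "h2"): (0.50, 0.40, 0.25),
    ("f2", "h3"): (0.33, 0.10, 0.00),
}

# Rank predicted missing links by decreasing predictive score.
ranked = sorted(candidates, key=lambda pair: p_ii(*candidates[pair]), reverse=True)
for fungus, host in ranked:
    print(fungus, host, round(p_ii(*candidates[(fungus, host)]), 3))
```

The top of such a ranked list is what would be handed to a domain expert for observation.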
Conclusion
Informatics
• RDF Model for Interspecies Interaction
• Improves the use of Collaborative Filtering
with a sparse dataset using
  • Community Structure
  • and Biological Classification
• It has been found that
  • In the general case, PCF + PCS is enough.
  • But when a node
    • has few common neighbors
    • and is located in a small community,
  • PBC becomes a key player for
making link predictions.
Biology
• This model supports the view that most
fungi under the same genus have similar
parasitic behavior.
• Some predicted links with a high
predictive score, such as
  • Phragmidium mucronatum → ハマナス (hamanasu, Rosa rugosa)
  • Phragmidium fusiforme → ハマナス (hamanasu, Rosa rugosa)
  • Phragmidium potentillae → イワキンバイ (iwakinbai, a Potentilla species)
have been found in other
literature.
• The next enhancement is to break fungal
species down into fungal spore types.
Future Work
[Figure: PII( f,h ) as a combination of PCF( f,h ), PCS( f,h ) and PBC( f,h ) with weights α, β, γ and inputs x1( f,h ), x2( f,h ), x3( f,h )]
Overall
[Figure: the overall LPII workflow again, annotated with notation: Bipartite Graph GBipt including existent links LExist; Missing Links LMiss; Projection of Fungi NFungi-Projection or GProjFungi; Community Structure obtained by clustering using a Community Detection Method; PII(f,h) combining PCF(f,h), PCS(f,h) and PBC(f,h) with weights α, β, γ]
Any idea for improvement?


  • 1. Link Prediction in Linked Data of Interspecies Interactions using Hybrid Recommendation Approach. Hideaki TAKEDA (Professor); Tsuyoshi HOSOYA (Mycologist); Rathachai CHAWUTHAI (rathachai.c@gmail.com). JIST 2014, Chiang Mai, Thailand, November 10th, 2014.
  • 2. LODAC: Linked Open Data for ACademia. Example: “Salix pierotii” is the resource lodac:Salix_pierotii, linked by species:hasSuperTaxon to lodac:Salix.
  • 3. National Museum of Nature and Science: 30,000 interactions, 4,000 fungi, 7,000 hosts.
  • 4. Let’s find the missing links between species. LPII: Link Prediction on Interspecies Interactions. Objective: to predict missing links between fungi and hosts.
  • 5. Agenda: Dataset; Introduction; Hybrid Recommendation (Collaborative Filtering, Community Structure, Biological Classification); Evaluation; Summary; Future work.
  • 6. Dataset (Fungus, Host, Link):
      lodac:Melampsora_yezoensis
          rdfs:label "Melampsora yezoensis"@la ;
          species:hasTaxonRank species:Species ;
          species:hasSuperTaxon lodac:Melampsora .
      lodac:Melampsora species:hasTaxonRank species:Genus .
      lodac:Salix_pierotii
          rdfs:label "Salix pierotii"@la ;
          rdf:type species:ScientificName ;
          species:hasSuperTaxon lodac:Salix .
      lodac:Salix species:hasTaxonRank species:Genus .
      lodac:Melampsora_yezoensis species:growsOn lodac:Salix_pierotii .
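A minimal sketch of how the triples on this slide could be turned into the fungus-host bipartite structure used by the later scoring functions. The triples are copied from the slide as plain strings; the function name `build_bipartite` and the dictionary layout are assumptions made for illustration and are not part of the deck.

```python
from collections import defaultdict

# Triples taken from the Dataset slide, written as (subject, predicate, object).
TRIPLES = [
    ("lodac:Melampsora_yezoensis", "species:hasSuperTaxon", "lodac:Melampsora"),
    ("lodac:Salix_pierotii", "species:hasSuperTaxon", "lodac:Salix"),
    ("lodac:Melampsora_yezoensis", "species:growsOn", "lodac:Salix_pierotii"),
]


def build_bipartite(triples):
    """Return (hosts_of, super_taxon_of) built from the triples.

    hosts_of maps each fungus to the set of hosts it grows on.
    super_taxon_of maps each taxon to its super taxon (here, the genus).
    """
    hosts_of = defaultdict(set)
    super_taxon_of = {}
    for subject, predicate, obj in triples:
        if predicate == "species:growsOn":
            hosts_of[subject].add(obj)
        elif predicate == "species:hasSuperTaxon":
            super_taxon_of[subject] = obj
    return dict(hosts_of), super_taxon_of


hosts_of, super_taxon_of = build_bipartite(TRIPLES)
```

In practice the triples would come from a SPARQL query over the LODAC data, as the overall workflow slide indicates; the hard-coded list here only keeps the sketch self-contained.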
  • 8. Selected dataset: 903 rust fungi, 2,001 hosts, 2,966 links, together with the biological classification of fungi and the biological classification of hosts.
  • 9. Introduction (workflow diagram). Data preparation: the fungus-host interaction dataset is transformed using a weight function. LPII approach: Collaborative Filtering, Community Structure, and Biological Classification each produce a score; the scores are combined to find missing links. Result: a generated list of fungus-host interactions with predictive scores. Biologist: making observations.
  • 10. Collaborative Filtering (scoring function 1): Fungi found on the same host are common neighbors. If some close neighbors of the fungus f are found on a host h, the fungus f may also be found on the host h.
  • 12. Collaborative Filtering for Link Prediction (fungi f1, f2; hosts h1, h2): PCF(f1,h2) = ? The score is the sum of similarities between fungi with common hosts.
  • 15. Predictive Score using Collaborative Filtering (fungi f1 to f5; hosts h1 to h5; projection weights w = 0.50 and w = 0.33): PCF(f1,h2) = 0.50; PCF(f2,h3) = 0.33; PCF(f1,h3) = ???; PCF(f4,h3) = ???; PCF(f4,h5) = ???; etc. (Dashed red lines are predicted links.)
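The collaborative filtering score described above (the sum of similarities between fungi with common hosts) can be sketched as follows. The deck does not say which weight function it uses, so Jaccard similarity is assumed here purely for illustration, and the toy graph `HOSTS_OF` is hypothetical: it was chosen so that the assumed weights happen to give the 0.50 and 0.33 shown on the slide.

```python
def jaccard(a, b):
    """Jaccard similarity of two host sets (an assumed weight function)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def p_cf(f, h, hosts_of):
    """Sum the similarity between fungus f and every other fungus found on host h."""
    return sum(
        jaccard(hosts_of[f], hosts_of[g])
        for g in hosts_of
        if g != f and h in hosts_of[g]
    )


# Hypothetical toy graph: fungus -> set of hosts it is known to grow on.
HOSTS_OF = {
    "f1": {"h1"},
    "f2": {"h1", "h2"},
    "f3": {"h2", "h3"},
}

score_f1_h2 = p_cf("f1", "h2", HOSTS_OF)  # only f2 contributes: 1/2
score_f2_h3 = p_cf("f2", "h3", HOSTS_OF)  # only f3 contributes: 1/3
```

Any other similarity on the projected fungus graph could be swapped in for `jaccard` without changing the structure of `p_cf`.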
  • 16. Community Structure (scoring function 2): If a host h is commonly found in the community of the fungus f, the fungus f may be found on the host h.
  • 19. Community Structure (projection of fungi f1 to f5 with weights 0.50 and 0.33; hosts h1 to h5; Community #1, Community #2, Community #3): PCS(f,h) = (number of links between the community of the fungus f and the host h) / (number of all links given by the community of the fungus f). Example: PCS(f3,h1) = 2/5 = 0.40.
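A minimal sketch of the community structure ratio defined above. The community assignment is taken as an input, since the deck attributes community detection to a third-party method; the toy graph and the assignment below are hypothetical and were chosen so that PCS(f3,h1) comes out as 2/5, matching the slide's example.

```python
def p_cs(f, h, hosts_of, community_of):
    """Share of the links from f's community that point at host h."""
    members = [g for g in hosts_of if community_of[g] == community_of[f]]
    total_links = sum(len(hosts_of[g]) for g in members)
    links_to_h = sum(1 for g in members if h in hosts_of[g])
    return links_to_h / total_links if total_links else 0.0


# Hypothetical toy data: fungus -> hosts, and fungus -> community id.
HOSTS_OF = {
    "f1": {"h1"},
    "f2": {"h1", "h2"},
    "f3": {"h2", "h3"},
    "f4": {"h4"},
    "f5": {"h5"},
}
COMMUNITY_OF = {"f1": 1, "f2": 1, "f3": 1, "f4": 2, "f5": 3}

score_f3_h1 = p_cs("f3", "h1", HOSTS_OF, COMMUNITY_OF)  # 2 of 5 links
```

With single-member communities such as those of f4 and f5, the score can only repeat links that are already known, which is one way to read the deck's question about very small communities.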
  • 20. How to deal with many very small communities?
  • 21. Biological Classification (scoring function 3): If a host h is commonly found in the biological classification of the fungus f, the fungus f may be found on the host h.
  • 22. Biological Classification (Taxonomy), with a classification example: Domain (e.g. Eukaryota); Kingdom (e.g. Fungi); Phylum (e.g. Basidiomycota); Class (e.g. Urediniomycetes); Order (e.g. Uredinales); Family (e.g. Melampsoraceae); Genus (e.g. Melampsora); Species (e.g. Melampsora yezoensis).
  • 23. Biological Classification (fungi f1 to f5 grouped into genera G1 and G2; hosts h1 to h5): PBC(f,h) = (number of links between the biological classification of the fungus f and the host h) / (number of all links given by the biological classification of the fungus f). Example: PBC(f4,h2) = 1/4 = 0.25.
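A minimal sketch of the biological classification ratio, grouping fungi by genus as the slide's G1 and G2 suggest. The toy graph and genus assignment are hypothetical, picked so that PBC(f4,h2) equals the 1/4 given on the slide; the function name `p_bc` is likewise an assumption.

```python
def p_bc(f, h, hosts_of, genus_of):
    """Share of the links from f's genus that point at host h."""
    same_genus = [g for g in hosts_of if genus_of[g] == genus_of[f]]
    total_links = sum(len(hosts_of[g]) for g in same_genus)
    links_to_h = sum(1 for g in same_genus if h in hosts_of[g])
    return links_to_h / total_links if total_links else 0.0


# Hypothetical toy data: fungus -> hosts, and fungus -> genus.
HOSTS_OF = {
    "f1": {"h1"},
    "f2": {"h1", "h2"},
    "f3": {"h2", "h3"},
    "f4": {"h4"},
    "f5": {"h5"},
}
GENUS_OF = {"f1": "G1", "f2": "G1", "f3": "G2", "f4": "G2", "f5": "G2"}

score_f4_h2 = p_bc("f4", "h2", HOSTS_OF, GENUS_OF)  # 1 of 4 links
```

In the LODAC data the genus would be read from species:hasSuperTaxon rather than from a hand-written dictionary.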
  • 24. Hybrid Recommender Approach: PII(f,h) is a combination of PCF(f,h) (Collaborative Filtering), PCS(f,h) (Community Structure), and PBC(f,h) (Biological Classification).
  • 26. Learning and Testing (fungi f1 to f5; hosts h1 to h5): training set (2,500 links), test set (500 links), candidates (400,000 links). Panels: All Possible Links, Existent Links, Missing Links. Example predictive scores shown on the slide: 0.421, 0.864, 0.466, 0.490, 0.366, 0.515, 0.313, 0.076, 0.362, 0.902, 0.069, 0.524, 0.876, 0.464, 0.839, 0.504.
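One way the learning and testing setup could be sketched: the existent links are split at random into a training set and a test set, and the missing links are all fungus-host pairs that are not existent links. The deck does not describe its exact splitting procedure, so the random hold-out, the seed, and the function names below are assumptions.

```python
import random
from itertools import product


def split_links(links, test_size, seed=0):
    """Randomly hold out test_size existent links; return (training, test) sets."""
    shuffled = sorted(links)  # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)
    return set(shuffled[test_size:]), set(shuffled[:test_size])


def missing_links(fungi, hosts, existent):
    """All possible fungus-host pairs that are not existent links."""
    return set(product(fungi, hosts)) - set(existent)


# Hypothetical toy data.
FUNGI = ["f1", "f2", "f3"]
HOSTS = ["h1", "h2", "h3"]
EXISTENT = {("f1", "h1"), ("f2", "h1"), ("f2", "h2"), ("f3", "h2"), ("f3", "h3")}

training, test = split_links(EXISTENT, test_size=2)
missing = missing_links(FUNGI, HOSTS, EXISTENT)
```

The scoring functions would then be computed from `training` only, and the scores of `test` links compared against those of `missing` links.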
  • 27. AUC (Area Under the receiver operating characteristic Curve). Two predicted lists, each sorted by predictive score, illustrate a high AUC and a low AUC (on the slide, red scores mark test links).
      Predicted List #1: ① PII(f1,h2) = 0.70; ② PII(f2,h3) = 0.60; ③ PII(f1,h3) = 0.50; ④ PII(f4,h3) = 0.40; ⑤ PII(f2,h2) = 0.30; ⑥ PII(f3,h3) = 0.20; ⑦ PII(f4,h3) = 0.10.
      Predicted List #2: ① PII(f1,h2) = 0.70; ② PII(f2,h2) = 0.60; ③ PII(f3,h3) = 0.50; ④ PII(f2,h3) = 0.50; ⑤ PII(f1,h3) = 0.40; ⑥ PII(f4,h3) = 0.30; ⑦ PII(f4,h3) = 0.10.
      For n comparisons, n' is the number of times a test link has a higher score than a missing link, and n'' is the number of times a test link has the same score as a missing link.
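The slide defines n' and n'' but the formula itself did not survive in the transcript. The sketch below assumes the form commonly used in link prediction, AUC = (n' + 0.5 n'') / n, and compares every test link with every missing link rather than sampling n comparisons; both choices are assumptions, as are the function name and the toy scores.

```python
def auc(test_scores, missing_scores):
    """AUC = (n' + 0.5 * n'') / n over all test-link vs missing-link comparisons."""
    n = len(test_scores) * len(missing_scores)
    if n == 0:
        raise ValueError("need at least one test score and one missing score")
    # n': comparisons where the test link scores strictly higher.
    higher = sum(1 for t in test_scores for m in missing_scores if t > m)
    # n'': comparisons where the two scores are equal.
    same = sum(1 for t in test_scores for m in missing_scores if t == m)
    return (higher + 0.5 * same) / n


# Hypothetical scores: test links ranked above all missing links.
perfect = auc([0.70, 0.60], [0.50, 0.40])
# Hypothetical scores: one test link above and one below the missing link.
mixed = auc([0.70, 0.30], [0.50])
```

Under this definition a value of 0.5 corresponds to scores that do not separate test links from missing links at all, and 1.0 to a ranking that places every test link above every missing link.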
  • 28. AUC (Area Under the receiver operating characteristic Curve) by combination of scoring functions:
      Stand-alone function: PCF 0.859; PCS 0.823; PBC 0.680.
      Summation of functions: PCF + PCS 0.867; PCF + PBC 0.876; PCS + PBC 0.865; PCF + PCS + PBC 0.892.
      Multiplication of functions: PCF × PCS 0.817; PCF × PBC 0.862; PCS × PBC 0.827; PCF × PCS × PBC 0.818.
  • 29. Overall workflow (diagram). Data preparation: RDF data of interspecies interactions; SPARQL querying; bipartite graph; select connected fungi; transform data using a weight function; projection of fungi. LPII approach: Collaborative Filtering; Community Structure (community detection method); Biological Classification (clustering using biological classification); these are the inputs of the scoring functions PCF(f,h), PCS(f,h), and PBC(f,h), which are added to give PII(f,h) and find missing links. Result: predicted missing fungus-host links together with prediction scores, ranked in decreasing order. Domain expert: make observation; if found, update the knowledge base and share it with the LOD Cloud. Legend: data, process, third-party method, scoring function, input argument, linear operation, decision, dataflow.
  • 30. Hybrid Recommender Approach: PII(f,h) combines PCF(f,h), PCS(f,h), and PBC(f,h) with the weights α, β, and γ respectively. γ should be very low, about 0.1 to 0.2.
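A minimal sketch of the weighted combination, assuming it is the linear sum PII = α·PCF + β·PCS + γ·PBC suggested by the "+" and "linear operation" labels in the workflow diagram. The default values of α and β are placeholders chosen for illustration; only the low γ (about 0.1 to 0.2) comes from the slide. The function names and candidate scores are hypothetical.

```python
def p_ii(p_cf, p_cs, p_bc, alpha=1.0, beta=1.0, gamma=0.1):
    """Weighted sum of the three scores; gamma is kept low as the slide advises."""
    return alpha * p_cf + beta * p_cs + gamma * p_bc


def rank_candidates(scored):
    """Sort (fungus, host) -> score pairs in decreasing order of score."""
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical component scores for two candidate links: (PCF, PCS, PBC).
COMPONENTS = {
    ("f1", "h2"): (0.50, 0.40, 0.25),
    ("f4", "h2"): (0.00, 0.00, 0.25),
}

combined = {link: p_ii(*scores) for link, scores in COMPONENTS.items()}
ranking = rank_candidates(combined)
```

In this toy case the second candidate has no collaborative filtering or community signal, so its rank rests entirely on the small γ-weighted PBC term, which mirrors the deck's remark that PBC matters most for nodes with few common neighbors in small communities.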
  • 31. Conclusion (Informatics and Biology):
      • RDF model for interspecies interaction.
      • Improves the use of collaborative filtering on a sparse dataset by using Community Structure and Biological Classification.
      • It has been found that, in the general case, PCF + PCS is enough. But when a node has few common neighbors and is located in a small community, PBC becomes a key player in link prediction.
      • This model supports the view that most fungi under the same genus have similar parasitic behavior.
      • Some predicted links with high predictive scores have been confirmed in other literature, such as Phragmidium mucronatum → ハマナス (hamanasu, Rosa rugosa), Phragmidium fusiforme → ハマナス (hamanasu, Rosa rugosa), and Phragmidium potentillae → イワキンバイ (iwakinbai, Potentilla dickinsii).
      • The next enhancement is to analyze fungal species by fungal spore type.
  • 32. Future Work (diagram): PII(f,h) with PCF(f,h), PCS(f,h), and PBC(f,h) weighted by α, β, and γ, plus x1(f,h), x2(f,h), and x3(f,h).
  • 33. Overall workflow with notation (diagram). Data preparation: RDF data of interspecies interactions; SPARQL querying; bipartite graph GBipt including LExist; select connected fungi; transform data using a weight function; NFungi-Projection or GProjFungi. LPII approach: Collaborative Filtering; Community Structure (clustering using a community detection method); Biological Classification (clustering using biological classification); these are the inputs of the scoring functions PCF(f,h), PCS(f,h), and PBC(f,h), which are weighted by α, β, and γ and added to give PII(f,h) and find missing links (LMiss). Result: predicted missing fungus-host links together with prediction scores, ranked in decreasing order. Domain expert: make observation; if found, update the knowledge base and share it with the LOD Cloud. Legend: data, process, third-party method, scoring function, input argument, linear operation, decision, dataflow.
  • 34. Any idea for improvement?