Link Prediction in Linked Data of Interspecies Interactions using Hybrid Recommendation Approach
Hideaki TAKEDA, Professor
Tsuyoshi HOSOYA, Mycologist
Rathachai CHAWUTHAI, rathachai.c@gmail.com
JIST 2014, Chiang Mai, Thailand, November 10th, 2014
LODAC: Linked Open Data for ACademia
lodac:Salix_pierotii (“Salix pierotii”) species:hasSuperTaxon lodac:Salix
National Museum of Nature and Science 
30,000 Interactions 
4,000 Fungi 
7,000 Hosts
Let’s find the Missing Links between species
LPII: Link Prediction on Interspecies Interactions
Objective: 
To predict missing links between fungi and hosts
Agenda 
• Dataset 
• Introduction 
• Hybrid Recommendation 
• Collaborative Filtering 
• Community Structure 
• Biological Classification 
• Evaluation 
• Summary 
• Future work
Dataset 
lodac:Melampsora_yezoensis
    rdfs:label "Melampsora yezoensis"@la ;
    species:hasTaxonRank species:Species ;
    species:hasSuperTaxon lodac:Melampsora .
lodac:Melampsora species:hasTaxonRank species:Genus .
lodac:Salix_pierotii
    rdfs:label "Salix pierotii"@la ;
    rdf:type species:ScientificName ;
    species:hasSuperTaxon lodac:Salix .
lodac:Salix species:hasTaxonRank species:Genus .
lodac:Melampsora_yezoensis species:growsOn lodac:Salix_pierotii .
In the last triple, lodac:Melampsora_yezoensis is the Fungus, lodac:Salix_pierotii is the Host, and species:growsOn is the Link.
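As a rough illustration of how triples like these could feed the rest of the approach, the sketch below collects the species:growsOn links into a fungus-to-hosts mapping and the species:hasSuperTaxon links into a taxon-to-genus mapping. The triples are written as plain Python tuples rather than parsed from RDF, because the slides do not give the namespace URIs; the function name build_bipartite is hypothetical.

```python
from collections import defaultdict

# Triples copied from the slide as (subject, predicate, object) strings.
TRIPLES = [
    ("lodac:Melampsora_yezoensis", "species:hasSuperTaxon", "lodac:Melampsora"),
    ("lodac:Salix_pierotii", "species:hasSuperTaxon", "lodac:Salix"),
    ("lodac:Melampsora_yezoensis", "species:growsOn", "lodac:Salix_pierotii"),
]


def build_bipartite(triples):
    """Return (hosts_of, genus_of) dictionaries derived from the triples."""
    hosts_of = defaultdict(set)  # fungus -> set of hosts it grows on
    genus_of = {}                # taxon -> its super taxon (the genus here)
    for subject, predicate, obj in triples:
        if predicate == "species:growsOn":
            hosts_of[subject].add(obj)
        elif predicate == "species:hasSuperTaxon":
            genus_of[subject] = obj
    return dict(hosts_of), genus_of


hosts_of, genus_of = build_bipartite(TRIPLES)
print(hosts_of)
print(genus_of)
```

The two dictionaries are one possible in-memory form of the bipartite graph and of the biological classification used by the scoring functions later in the talk.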
[Figure: RDF graph. lodac:Melampsora_yezoensis species:growsOn lodac:Salix_pierotii; lodac:Melampsora_yezoensis species:hasSuperTaxon lodac:Melampsora; lodac:Salix_pierotii species:hasSuperTaxon lodac:Salix]
Selected: 903 Rust Fungi, 2,001 Hosts, 2,966 Links
[Figure: Biological Classification of Fungi and Biological Classification of Hosts]
Introduction
DATA PREPARATION
• Fungus-Host Interaction Dataset
• transform data using a Weight Function
LPII APPROACH
• 1: Collaborative Filtering (Score)
• 2: Community Structure (Score)
• 3: Biological Classification (Score)
• 4: Combine the scores and Generate Result
RESULT
• List of Fungus-Host interactions with predictive scores
BIOLOGIST
• Making Observation
• Finding Missing Links
Collaborative Filtering (step 1)
Fungi found on the same host are common neighbors.
If some close neighbors of the fungus f are found on a host h,
the fungus f may also be found on the host h.
[Figure: bipartite graph linking Fungi f1 to f5 with Hosts h1 to h5]
Collaborative Filtering for Link Prediction
PCF( f1,h2 ) = ?
PCF( f,h ) is the sum of the similarities between the fungus f and the fungi found on the host h that share common hosts with f.
[Figure: bipartite graph of f1 to f5 and h1 to h5. The similarity weight w between two fungi is the Jaccard Index; in the example, w = 0.50 and w = 0.33]
Predictive Score using Collaborative Filtering
PCF( f1,h2 ) = 0.50
PCF( f2,h3 ) = 0.33
PCF( f1,h3 ) = ???
PCF( f4,h3 ) = ???
PCF( f4,h5 ) = ???
etc.
[Figure: bipartite graph of f1 to f5 and h1 to h5 with weights w = 0.50 and w = 0.33; dashed red lines are predicted links]
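A minimal sketch of this score, assuming PCF( f,h ) is the sum of Jaccard similarities between f and every other fungus already linked to h. The toy graph HOSTS_OF is hypothetical: it was chosen so that the two scores given on the slide are reproduced, and it is not necessarily the graph drawn there.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pcf(fungus, host, hosts_of):
    """Sum of Jaccard similarities between `fungus` and the other fungi on `host`."""
    return sum(
        jaccard(hosts_of[fungus], hosts)
        for other, hosts in hosts_of.items()
        if other != fungus and host in hosts
    )


# Hypothetical toy graph: fungus -> set of hosts it is known to grow on.
HOSTS_OF = {
    "f1": {"h1"},
    "f2": {"h1", "h2"},
    "f3": {"h2", "h3"},
    "f4": {"h4"},
    "f5": {"h5"},
}

print(round(pcf("f1", "h2", HOSTS_OF), 2))  # 0.5
print(round(pcf("f2", "h3", HOSTS_OF), 2))  # 0.33
```

In this toy graph f4 shares no host with any other fungus, so pcf("f4", "h3", HOSTS_OF) is 0.0; this is the sparse case that the community and classification scores are meant to help with.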
Community Structure (step 2)
If a host h is commonly found in the community of the fungus f,
the fungus f may also be found on the host h.
[Figure: the Bipartite Graph of f1 to f5 and h1 to h5, and its Projection of Fungi with edge weights 0.50 and 0.33]
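The projection step can be sketched as follows, assuming that two fungi are connected in the projection when they share at least one host and that the edge weight is their Jaccard index (the weights 0.50 and 0.33 on the slide are consistent with this, but the slide does not state it outright). The toy graph and the name project_fungi are hypothetical.

```python
from itertools import combinations


def project_fungi(hosts_of):
    """Project the bipartite graph onto fungi: {(f_a, f_b): Jaccard weight}."""
    edges = {}
    for a, b in combinations(sorted(hosts_of), 2):
        shared = hosts_of[a] & hosts_of[b]
        if shared:  # keep only pairs of fungi with a common host
            edges[(a, b)] = len(shared) / len(hosts_of[a] | hosts_of[b])
    return edges


# Hypothetical toy graph: fungus -> set of hosts.
HOSTS_OF = {
    "f1": {"h1"},
    "f2": {"h1", "h2"},
    "f3": {"h2", "h3"},
    "f4": {"h4"},
    "f5": {"h5"},
}

EDGES = project_fungi(HOSTS_OF)
for pair, weight in EDGES.items():
    print(pair, round(weight, 2))
```

Fungi without any shared host (f4 and f5 here) get no edges, so they end up isolated in the projection.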
[Figure: Community Structure of Rust Fungi, using Modularity with Random Walk]
[Figure: Projection of Fungi (f1 to f5, edge weights 0.50 and 0.33) grouped by Community Structure into Community #1, Community #2 and Community #3, with hosts h1 to h5]
PCS( f,h ) = (number of links between the community of the fungus f and the host h) / (number of all links given by the community of the fungus f)
PCS( f3,h1 ) = 2 / 5 = 0.40
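The ratio above can be sketched in a few lines. The community assignment is taken as given here (on the slides it comes from a separate community detection step, which this sketch does not implement), and both the toy graph and the assignment COMMUNITY_OF are hypothetical values chosen so that the slide's example of 2 / 5 is reproduced.

```python
def pcs(fungus, host, hosts_of, community_of):
    """Share of the links of the fungus's community that go to `host`."""
    members = [f for f, c in community_of.items() if c == community_of[fungus]]
    all_links = sum(len(hosts_of[f]) for f in members)
    links_to_host = sum(1 for f in members if host in hosts_of[f])
    return links_to_host / all_links if all_links else 0.0


# Hypothetical toy graph and community assignment.
HOSTS_OF = {
    "f1": {"h1"},
    "f2": {"h1", "h2"},
    "f3": {"h2", "h3"},
    "f4": {"h4"},
    "f5": {"h5"},
}
COMMUNITY_OF = {"f1": 1, "f2": 1, "f3": 1, "f4": 2, "f5": 3}

print(pcs("f3", "h1", HOSTS_OF, COMMUNITY_OF))  # 0.4
```

For a fungus alone in its community (f4 here), the score can only point at hosts the fungus already has, which is one way to see the problem of many very small communities.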
How to deal with many very small communities?
Biological Classification (step 3)
If a host h is commonly found in the biological classification of the fungus f,
the fungus f may also be found on the host h.
BIOLOGICAL CLASSIFICATION (TAXONOMY)
Classification: Example
• Domain: e.g. Eukaryota
• Kingdom: e.g. Fungi
• Phylum: e.g. Basidiomycota
• Class: e.g. Urediniomycetes
• Order: e.g. Uredinales
• Family: e.g. Melampsoraceae
• Genus: e.g. Melampsora
• Species: e.g. Melampsora yezoensis
Biological Classification
[Figure: bipartite graph of f1 to f5 and h1 to h5, with the fungi grouped into G1 and G2 by Biological Classification]
PBC( f,h ) = (number of links between the biological classification of the fungus f and the host h) / (number of all links given by the biological classification of the fungus f)
PBC( f4,h2 ) = 1 / 4 = 0.25
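A small sketch of this ratio, assuming the "biological classification" of a fungus is its genus and that G1 and G2 are two genera. The toy graph and the genus assignment GENUS_OF are hypothetical and were chosen so that the slide's example of 1 / 4 is reproduced.

```python
from collections import Counter, defaultdict


def genus_host_counts(hosts_of, genus_of):
    """For each genus, count how many links its fungi have to each host."""
    counts = defaultdict(Counter)
    for fungus, hosts in hosts_of.items():
        counts[genus_of[fungus]].update(hosts)
    return counts


def pbc(fungus, host, counts, genus_of):
    """Share of the links of the fungus's genus that go to `host`."""
    per_host = counts[genus_of[fungus]]
    all_links = sum(per_host.values())
    return per_host[host] / all_links if all_links else 0.0


# Hypothetical toy graph and genus assignment.
HOSTS_OF = {
    "f1": {"h1"},
    "f2": {"h1", "h2"},
    "f3": {"h2", "h3"},
    "f4": {"h4"},
    "f5": {"h5"},
}
GENUS_OF = {"f1": "G1", "f2": "G1", "f3": "G2", "f4": "G2", "f5": "G2"}

COUNTS = genus_host_counts(HOSTS_OF, GENUS_OF)
print(pbc("f4", "h2", COUNTS, GENUS_OF))  # 0.25
```

Unlike the collaborative filtering score, this gives f4 a non-zero score for h2 even though f4 shares no host with any other fungus.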
Hybrid Recommender Approach
PII( f,h ) = Combination of
• PCF( f,h ): Collaborative Filtering
• PCS( f,h ): Community Structure
• PBC( f,h ): Biological Classification
Evaluation 
Learning and Testing
• Training set (2,500 links)
• Test set (500 links)
• Candidates (400,000 links)
[Figure: three bipartite graphs of f1 to f5 and h1 to h5 showing All Possible Links, Existent Links and Missing Links, annotated with example predictive scores ranging from 0.069 to 0.902]
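One way such a split could be set up is sketched below: some existent links are held out at random as the test set, and the missing links are all fungus-host pairs without an existent link. The slides do not say how the split was drawn, so the random hold-out, the seed, the toy links and the function names are all assumptions.

```python
import random
from itertools import product


def split_links(links, test_size, seed=0):
    """Randomly hold out `test_size` links; return (training_set, test_set)."""
    ordered = sorted(links)  # fixed order so a given seed is reproducible
    random.Random(seed).shuffle(ordered)
    return set(ordered[test_size:]), set(ordered[:test_size])


def missing_links(fungi, hosts, existent):
    """All fungus-host pairs that are not among the existent links."""
    return set(product(fungi, hosts)) - set(existent)


# Hypothetical toy data: 5 fungi, 5 hosts, 7 existent links.
FUNGI = ["f1", "f2", "f3", "f4", "f5"]
HOSTS = ["h1", "h2", "h3", "h4", "h5"]
LINKS = {
    ("f1", "h1"), ("f2", "h1"), ("f2", "h2"), ("f3", "h2"),
    ("f3", "h3"), ("f4", "h4"), ("f5", "h5"),
}

train, test = split_links(LINKS, test_size=2)
missing = missing_links(FUNGI, HOSTS, LINKS)
print(len(train), len(test), len(missing))  # 5 2 18
```

The scoring functions would then be computed from the training set only, so that the held-out test links can be compared against the missing links.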
AUC: Area Under the receiver operating characteristic Curve
Predicted List #1 (sorted by predictive score)
① PII( f1,h2 ) = 0.70
② PII( f2,h3 ) = 0.60
③ PII( f1,h3 ) = 0.50
④ PII( f4,h3 ) = 0.40
⑤ PII( f2,h2 ) = 0.30
⑥ PII( f3,h3 ) = 0.20
⑦ PII( f4,h3 ) = 0.10
Predicted List #2 (sorted by predictive score)
① PII( f1,h2 ) = 0.70
② PII( f2,h2 ) = 0.60
③ PII( f3,h3 ) = 0.50
④ PII( f2,h3 ) = 0.50
⑤ PII( f1,h3 ) = 0.40
⑥ PII( f4,h3 ) = 0.30
⑦ PII( f4,h3 ) = 0.10
The two lists illustrate a High AUC and a Low AUC ranking (red scores are test links).
For n comparisons,
• n' is the number of times the test links have a higher score than the missing links.
• n" is the number of times the test links have the same score as the missing links.
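With n' and n" defined this way, the usual link-prediction form of the measure is AUC = (n' + 0.5 n") / n. The sketch below assumes that formula and, instead of sampling n comparisons, compares every test score with every missing score. The scores are hypothetical, since which entries of the lists are test links is not recoverable from the text.

```python
def auc(test_scores, missing_scores):
    """AUC = (n' + 0.5 * n'') / n over all test-vs-missing comparisons."""
    n = len(test_scores) * len(missing_scores)
    if n == 0:
        raise ValueError("need at least one test score and one missing score")
    higher = sum(1 for t in test_scores for m in missing_scores if t > m)  # n'
    same = sum(1 for t in test_scores for m in missing_scores if t == m)   # n''
    return (higher + 0.5 * same) / n


# Hypothetical scores: test links ranked mostly above the missing links.
TEST_SCORES = [0.70, 0.60, 0.50]
MISSING_SCORES = [0.50, 0.40, 0.30, 0.10]

print(round(auc(TEST_SCORES, MISSING_SCORES), 3))  # 0.958
```

A value of 0.5 corresponds to scores that do not separate test links from missing links at all, and 1.0 to every test link outscoring every missing link.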
AUC: Area Under the receiver operating characteristic Curve
Combination                   Scoring Function(s)    AUC
Stand-alone function          PCF                    0.859
                              PCS                    0.823
                              PBC                    0.680
Summation of functions        PCF + PCS              0.867
                              PCF + PBC              0.876
                              PCS + PBC              0.865
                              PCF + PCS + PBC        0.892
Multiplication of functions   PCF × PCS              0.817
                              PCF × PBC              0.862
                              PCS × PBC              0.827
                              PCF × PCS × PBC        0.818
Overall
DATA PREPARATION
• LOD Cloud: SPARQL querying of the RDF data of Interspecies Interactions
• Bipartite Graph
• transform data using a Weight Function: Projection of Fungi
LPII APPROACH
• 1: Collaborative Filtering: select connected fungi and find missing sharing links; scoring function PCF( f,h )
• 2: Community Structure: clustering using a Community Detection Method; scoring function PCS( f,h )
• 3: Biological Classification: clustering using Biological Classification; scoring function PBC( f,h )
• 4: PII( f,h ) = PCF( f,h ) + PCS( f,h ) + PBC( f,h ), ranking predictions in decreasing order
RESULT
• Predicted Missing Links of Fungus-Host together with prediction scores
DOMAIN EXPERT
• make observation of the Missing Links; found? yes: NOTE and update knowledgebase
Legend: Data, Process, Third party method, Scoring Function, Input argument, Linear Operation, Decision, Dataflow
Hybrid Recommender Approach
PII( f,h ) = α PCF( f,h ) + β PCS( f,h ) + γ PBC( f,h )
γ should be very low, about 0.1 to 0.2.
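A minimal sketch of the weighted combination and the ranking that follows from it. Only the guidance that γ should be about 0.1 to 0.2 comes from the slide; the default values of alpha and beta, the candidate links and their component scores are hypothetical.

```python
def pii(pcf, pcs, pbc, alpha=1.0, beta=1.0, gamma=0.1):
    """Weighted sum of the three scores; gamma is kept low as the slide suggests."""
    return alpha * pcf + beta * pcs + gamma * pbc


# Hypothetical candidate links with (PCF, PCS, PBC) component scores.
SCORES = {
    ("f1", "h2"): (0.50, 0.40, 0.00),
    ("f2", "h3"): (0.33, 0.20, 0.00),
    ("f4", "h2"): (0.00, 0.00, 0.25),
}

# Rank the candidates by combined score in decreasing order.
ranked = sorted(SCORES, key=lambda link: pii(*SCORES[link]), reverse=True)
for link in ranked:
    print(link, round(pii(*SCORES[link]), 3))
```

With a low γ, PBC mainly orders the candidates for which the other two scores are zero, such as ("f4", "h2") in this toy data.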
Conclusion
Informatics
• RDF Model for Interspecies Interaction
• Improves the use of Collaborative Filtering with a sparse dataset using
  • Community Structure
  • and Biological Classification
• It has been found that
  • In the general case, PCF + PCS is enough.
  • But when a node has few common neighbors and is located in a small community, PBC becomes a key player in link prediction.
Biology
• This model supports the view that most fungi under the same genus have similar parasitic behavior.
• Some predicted links with high predictive scores, such as
  • Phragmidium mucronatum → ハマナス (hamanasu)
  • Phragmidium fusiforme → ハマナス (hamanasu)
  • Phragmidium potentillae → イワキンバイ (iwakinbai)
  have been found in other literature.
• The next enhancement is to analyze fungal species by their fungal spore types.
Future Work
Extend PII( f,h ), the combination of α PCF( f,h ), β PCS( f,h ) and γ PBC( f,h ), with further scoring functions x1( f,h ), x2( f,h ), x3( f,h ).
Overall (with notation)
The Overall workflow of DATA PREPARATION, LPII APPROACH, RESULT and DOMAIN EXPERT is shown again, with the weights α, β, γ applied to PCF( f,h ), PCS( f,h ), PBC( f,h ) and with this notation:
• GBipt: the Bipartite Graph, including LExist
• NFungi-Projection or GProjFungi: the projection of fungi, obtained by transforming data using a Weight Function
• LMiss: the Missing Links
Any ideas for improvement?
