How well can embeddings represent the biology of genes related to the complex pathophysiology of insulin resistance?
Identification of Insulin Resistance-Related Genes with Biomedical Knowledge Graphs Topology and Embeddings
M. Lisandra Zepeda Mendoza, Tankred Ott, Marc Boubnovski, Viktor Sandberg, Ramneek Gupta
Executive summary
Challenge
It is difficult to identify the entire set of genes associated with insulin resistance (IR) because of its complexity and multifactorial nature.
Knowledge graphs (KGs) can model the relevant biomedical entities (proteins, diseases, pathways, etc.) in many different ways, and the specific data model can affect the results.
Many different algorithms are available.
Question
How well can embeddings represent the biology of genes related to the complex pathophysiology of insulin resistance?
Goal
Understand the complexity of insulin resistance.
Identify genes related to insulin resistance with knowledge graph embeddings, using a data-driven approach.
Specialist in Biomedical Knowledge Representation, NNRCO
Page 2
Appendix
3
Novo Nordisk company presentation
Neetima Bhardwaj & Veleena Nisha Lobo, Product Supply, US
Mandy Marquardt, Team Novo Nordisk, professional track cyclist
Background
What is insulin resistance?
Page 4
The insulin signaling pathway
Picture from https://www.nature.com/articles/s41392-022-01073-0
No, wait… actually, it’s tissue-specific
Page 5
A unified concept of insulin resistance in humans
Picture from https://www.nature.com/articles/s41586-019-1797-8
Picture from https://www.nature.com/articles/s41392-022-01073-0
Insulin resistance-related diseases in humans
Developing a framework to explore the IR landscape using biomedical KGs
What to consider?
Page 6
o KG schema
o Information within the knowledge graph:
   o Quality
   o Amount
   o Relevance to the task
o Methods used to predict IR-related genes/proteins
Methods
Our heritage enables us to defeat diabetes and other serious chronic diseases
Otávio Domingos da Costa
Otávio has type 2 diabetes and obesity
Brazil
Which KGs to use? Enriched benchmarking KGs
Page 8
OpenBioLink
(IR node present as phenotype, 55 Gene-IR links)
Hetionet
(IR node absent)
Picture from https://doi.org/10.1093/bioinformatics/btaa274
Picture from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5640425/
• We use general-purpose biomedical knowledge graphs and update them with selected information.
• We added a link between the IR node and each gene predicted to be IR-related by the bioinformatics study of Gao et al. [PMID: 32651353].
• This added 624 Gene-IR links to the KGs, i.e., it expanded and improved our training set.
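The enrichment step above can be sketched as follows, assuming the KG is held as a plain list of (head, relation, tail) triples. The relation label `GENE_ASSOCIATED_WITH_IR`, the node identifier `IR`, and the gene symbols in the test are hypothetical placeholders, not the identifiers used in OpenBioLink or Hetionet. For a KG that has no IR node (Hetionet), adding the first triple implicitly introduces it.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def enrich_with_ir_links(
    triples: List[Triple],
    predicted_genes: Iterable[str],
    ir_node: str = "IR",                          # hypothetical node id
    relation: str = "GENE_ASSOCIATED_WITH_IR",    # hypothetical relation label
) -> Tuple[List[Triple], int]:
    """Return a copy of the KG with one Gene-IR triple per predicted gene.

    Triples already present in the KG are not duplicated; the second
    return value counts how many new links were actually added.
    """
    existing: Set[Triple] = set(triples)
    enriched = list(triples)
    added = 0
    for gene in sorted(set(predicted_genes)):  # sorted for a deterministic order
        triple = (gene, relation, ir_node)
        if triple not in existing:
            enriched.append(triple)
            existing.add(triple)
            added += 1
    return enriched, added
```

Counting only the links that are genuinely new is what lets a figure such as "624 added Gene-IR links" be reported separately from links the KG already contained.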
Zoomed-in details of the framework
Train:test:val data splits
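A seeded train:test:val split of the positive Gene-IR links might look like the sketch below. The 80:10:10 ratios and the function name are assumptions for illustration; the slides do not state the proportions used. Changing `seed` gives one split per replicate.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_val_split(
    items: Sequence[T],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed proportions
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle reproducibly and cut into disjoint train, test and validation lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: no global state touched
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # remainder absorbs rounding
    return train, test, val
```

For link prediction, the test and validation links would normally also be removed from the graph used to compute features, so that the model cannot read the answer directly from the topology.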
Page 9
Methods overview
The developed framework involves graph data modeling and feature engineering.
1. Biomedical KG
• OpenBioLink
• Hetionet
2. Feature engineering
• Topological features
• Embeddings
3. Models
• Link prediction
• Outlier detection & positive-unlabeled (PU) learning
• Random forest (RF) ensemble model
4. Biological context
• Gene set enrichment analysis (GSEA)
• Euclidean distance for clustering
• Mechanisms of action (MoAs) of CMD drugs from the Multiscale Interactome (MSI) KG
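The "topological features" step can be illustrated with classic neighbourhood-based link-prediction scores for a (gene, IR node) pair. This is a minimal sketch assuming the KG is flattened into an undirected graph that ignores relation types; the exact feature set used in the framework is not listed on the slides, and the node names in the test are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; relation types and self-loops are dropped."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def pair_features(adj: Dict[str, Set[str]], u: str, v: str) -> Dict[str, float]:
    """Neighbourhood-based link-prediction features for the node pair (u, v)."""
    if u == v:
        raise ValueError("features are defined for two distinct nodes")
    nu = adj.get(u, set())
    nv = adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "degree_u": float(len(nu)),
        "degree_v": float(len(nv)),
        "common_neighbors": float(len(common)),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: each shared neighbour weighted by 1 / log(its degree).
        # A shared neighbour of two distinct nodes has degree >= 2, so log > 0.
        "adamic_adar": sum(1.0 / math.log(len(adj[w])) for w in common),
        "preferential_attachment": float(len(nu) * len(nv)),
    }
```

Feature vectors of this kind, one per candidate gene paired with the IR node, are the sort of input a random forest ensemble can consume directly.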
Page 11
Exploring IR
Diffusion profiles: potential drug MoAs from the public Multiscale Interactome KG
Page 12
https://doi.org/10.1038/s41467-021-21770-8
• Diffusion profile: describes how a drug's (or disease's) effect propagates through the network, highlighting the most relevant paths connecting a drug and a disease. This gives insight into the drug's possible MoA.
• Implement the MSI KG and the diffusion-profile methodology in-house to calculate the diffusion profiles of CMD-related drugs.
• Identify which genes, and which biological functions of those genes, score significantly high in the diffusion profiles of CMD drugs.
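Diffusion profiles are built from random walks that restart at a drug or disease node. The sketch below is a plain, unweighted random walk with restart (personalized PageRank) computed by power iteration; it is a simplification, since the published MSI method uses biased walks with edge-type weights that this code does not reproduce. The restart probability of 0.15 and all node names are assumptions.

```python
from typing import Dict, Iterable, Set


def random_walk_with_restart(
    adj: Dict[str, Set[str]],
    seeds: Iterable[str],
    restart_prob: float = 0.15,  # assumed value
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Stationary visit probabilities of a walk that restarts at `seeds`.

    `adj` must contain every node as a key, including all neighbours.
    The returned values sum to 1 and form the node's diffusion-style profile.
    """
    seed_list = sorted(set(seeds))
    if not seed_list:
        raise ValueError("at least one seed node is required")
    missing = [s for s in seed_list if s not in adj]
    if missing:
        raise KeyError(f"seed nodes not in graph: {missing}")
    nodes = list(adj)
    restart = {n: 0.0 for n in nodes}
    for s in seed_list:
        restart[s] = 1.0 / len(seed_list)
    p = dict(restart)
    for _ in range(max_iter):
        nxt = {n: restart_prob * restart[n] for n in nodes}
        for n in nodes:
            mass = (1.0 - restart_prob) * p[n]
            if adj[n]:
                share = mass / len(adj[n])  # uniform step to each neighbour
                for m in adj[n]:
                    nxt[m] += share
            else:
                # Dangling node: send its walking mass back to the seeds.
                for s in seed_list:
                    nxt[s] += mass * restart[s]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

Genes or functions that receive unusually high probability in a drug's profile, relative to some background, are the candidates that the "significantly high" criterion above would flag.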
Results
David Lozano and Peter Kusztor
David and Peter have type 1 diabetes and are
professional Team Novo Nordisk riders.
They are racing with 100 on their jersey to
celebrate the 100-year anniversary of the
discovery of insulin.
Novo Nordisk
OpenBioLink: @100 predictions
Top performers:
• Topology-based approaches trained on large training sets outperformed the other models on both the enriched and the non-enriched OpenBioLink datasets.
Close contenders:
• The Elkanoto (Elkan-Noto PU learning) model with an XGBoost classifier, applied to OpenBioLink with large training sets and using embeddings from RotatE link prediction on the same biomedical knowledge graph (biomedKG), nearly matched the top topology-based models.
Underperformers:
• Models based on the Local Outlier Factor (LOF) were among the least effective.
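The slides do not define the "@100" metric; one common reading is precision among the top 100 ranked candidate genes, after removing genes already used as training positives. A minimal sketch under that assumption (function names and toy scores are hypothetical):

```python
from typing import Collection, Dict, Iterable, List


def rank_candidates(scores: Dict[str, float], exclude: Iterable[str] = ()) -> List[str]:
    """Genes sorted by decreasing score (ties broken by name), minus `exclude`."""
    excluded = set(exclude)
    return sorted((g for g in scores if g not in excluded), key=lambda g: (-scores[g], g))


def precision_at_k(ranked: List[str], positives: Collection[str], k: int = 100) -> float:
    """Fraction of the top-k ranked genes that are held-out positives."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for g in ranked[:k] if g in positives)
    return hits / k  # divides by k even if fewer than k genes are ranked
```

Excluding training positives before ranking keeps the metric from rewarding a model for simply re-finding the genes it was trained on.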
Page 14
Models vary in the consistency of their top predictions
• Consistency of the @100 predictions across 10 replicates of each modelling approach
• The topology model is very precise, with small variance
• All other models show significantly more variance; as expected, the worst-performing model has the most variance
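One way to quantify the consistency of top-100 predictions across replicates is the pairwise Jaccard overlap of the top-k sets, summarised by its mean and variance. The slides do not say which consistency measure was used, so treat this as an illustrative choice rather than the deck's method.

```python
from itertools import combinations
from statistics import mean, pvariance
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def topk_consistency(replicate_rankings: Sequence[List[str]], k: int = 100) -> Tuple[float, float]:
    """Mean and population variance of pairwise Jaccard overlaps of top-k sets."""
    tops = [set(ranking[:k]) for ranking in replicate_rankings]
    if len(tops) < 2:
        raise ValueError("need at least two replicates")
    overlaps = [jaccard(a, b) for a, b in combinations(tops, 2)]
    return mean(overlaps), pvariance(overlaps)
```

A model with a high mean overlap and low variance, as described for the topology model, returns nearly the same top genes regardless of the replicate split.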
Page 15
Which features are most relevant in the topology modelling approach?
Page 16
Euclidean distances of known vs. unknown IR-related genes to the IR node
(Figure panels: Small, Large)
Embedding quality:
• TransE produced the lowest-quality embeddings: IR-related genes from the positive training set were furthest from the IR node.
Training set impact:
• In most models, enriched training sets decrease the distances of both known and unknown IR-related genes to the IR node.
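The distance comparison above can be sketched as the mean Euclidean distance from each gene group to the IR node embedding. This assumes real-valued embedding vectors keyed by node name; complex embeddings such as RotatE's would first need flattening (e.g. concatenating real and imaginary parts), which is an assumption here. Names and vectors in the test are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Iterable, Sequence, Tuple

Vector = Sequence[float]


def distances_to_node(embeddings: Dict[str, Vector], genes: Iterable[str], target: str) -> Dict[str, float]:
    """Euclidean distance from each embedded gene to the `target` node."""
    anchor = embeddings[target]
    return {g: math.dist(embeddings[g], anchor) for g in genes if g in embeddings}


def mean_distances(
    embeddings: Dict[str, Vector],
    known: Iterable[str],
    unknown: Iterable[str],
    target: str = "IR",  # hypothetical node id
) -> Tuple[float, float]:
    """Mean distance to `target` for known and for unknown IR-related genes."""
    d_known = distances_to_node(embeddings, known, target)
    d_unknown = distances_to_node(embeddings, unknown, target)
    if not d_known or not d_unknown:
        raise ValueError("both gene groups need at least one embedded gene")
    return mean(d_known.values()), mean(d_unknown.values())
```

Under this reading, a good embedding should place known IR-related genes close to the IR node, which is the property TransE failed to show.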
Page 17
GSEA: top 100 predicted genes and the positive set
• The best models matched known IR pathways and revealed new aspects.
• The worst models identified broad or organ-specific pathways that are not IR-related.
• The training set showed the known IR pathways, as well as unexpected links to the Chagas disease pathway and to cancer.
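For reference, the core GSEA statistic is a running-sum enrichment score over a ranked gene list. The sketch below implements that standard enrichment score (hits weighted by |score|^p, misses by 1/(N − N_H)) without the permutation-based significance test; the slides do not state which GSEA tool, ranking metric, or gene-set library was used, so all inputs here are hypothetical.

```python
from typing import Sequence, Set, Tuple


def enrichment_score(ranked: Sequence[Tuple[str, float]], gene_set: Set[str], p: float = 1.0) -> float:
    """GSEA running-sum enrichment score.

    `ranked` holds (gene, score) pairs sorted by decreasing score. The returned
    value is the running sum's maximum deviation from zero (sign preserved).
    p = 0 gives the unweighted, Kolmogorov-Smirnov-like variant.
    """
    in_set = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(in_set)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    weights = [abs(score) ** p if hit else 0.0 for (_, score), hit in zip(ranked, in_set)]
    n_r = sum(weights)
    if n_r == 0:
        raise ValueError("all gene-set members have zero score")
    running = 0.0
    best = 0.0
    for hit, w in zip(in_set, weights):
        running += w / n_r if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the gene set concentrates near the top of the ranking; a negative one means it concentrates near the bottom.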
Page 18
Biological context: MSI
• The best-performing link prediction method (RotatE on the enriched OpenBioLink) found more genes associated with impaired glucose tolerance, and generalized better than the other high-scoring topology-based and PU learning (PUL)-based models.
• The models could also identify obesity-related diseases.
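RotatE, the embedding model behind the best link predictor, treats each relation as a rotation in complex space: a triple (h, r, t) is plausible when h ∘ r ≈ t, with every r_i = exp(iθ_i) of modulus 1. TransE, the model with the weakest embeddings above, uses a translation h + r ≈ t instead. The sketch below computes both distances (lower means more plausible). Summing complex moduli for RotatE and using the L2 norm for TransE are assumptions, since implementations differ in the norm and margin they use.

```python
import cmath
import math
from typing import Sequence


def rotate_distance(head: Sequence[complex], phases: Sequence[float], tail: Sequence[complex]) -> float:
    """RotatE distance ||h o r - t|| with r_i = exp(i * theta_i).

    Sums the per-dimension complex moduli (an L1-style norm over complex dims).
    """
    if not len(head) == len(phases) == len(tail):
        raise ValueError("dimension mismatch")
    return sum(abs(h * cmath.exp(1j * theta) - t) for h, theta, t in zip(head, phases, tail))


def transe_distance(head: Sequence[float], relation: Sequence[float], tail: Sequence[float]) -> float:
    """TransE distance ||h + r - t|| (L2 norm here; L1 is also common)."""
    if not len(head) == len(relation) == len(tail):
        raise ValueError("dimension mismatch")
    return math.dist([h + r for h, r in zip(head, relation)], tail)
```

Because rotations can compose and invert, RotatE can model relation patterns (symmetry, inversion, composition) that a pure translation cannot, which is one plausible reason for the quality gap between the two.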
Page 19
Perspectives
We transform scientific ideas into life-saving medicines for patients
Perspectives
Tissue-specific KG
What would the embeddings look like if we explored the disease in a tissue-specific manner rather than a systemic one (all diseases, all tissues, and all genes in a single schema)?
Foundation models
Explore the possibility of using foundation models on KGs to perform few-/zero-shot inductive inference.
Complex queries
Use more complex KG-reasoning query approaches to identify the relations between each gene found and the IR node, to facilitate interpretability.
Validation of results
In-house in vitro validation of the results
Page 21
Thank you for your attention
Page 22

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Identification of insulin-resistance genes with Knowledge Graphs topology and embeddings

  • 1. How well can embeddings represent the biology of genes related to the complex pathophysiology of insulin resistance? Identification of Insulin Resistance-Related Genes with Biomedical Knowledge Graphs Topology and Embeddings. M. Lisandra Zepeda Mendoza, Tankred Ott, Marc Boubnovski, Viktor Sandberg, Ramneek Gupta.
  • 2. Executive summary. Identification of Insulin Resistance-Related Genes with Biomedical Knowledge Graphs Topology and Embeddings (M. Lisandra Zepeda M., Specialist in Biomedical Knowledge Representation, NNRCO). Challenge: It is difficult to identify the entire set of genes associated with insulin resistance (IR) because of its complexity and multifactorial nature. Knowledge graphs (KGs) can model the relevant biomedical entities (proteins, diseases, pathways, etc.) in many different ways, and the specific data model can affect the results. Many different algorithms are also available. Question: How well can embeddings represent the biology of genes related to the complex pathophysiology of insulin resistance? Goal: Understand the complexity of insulin resistance, and identify genes related to it with a data-driven approach based on knowledge graph embeddings.
  • 3. Background. Novo Nordisk company presentation, Appendix. Photo captions: Neetima Bhardwaj & Veleena Nisha Lobo, Product Supply US; Mandy Marquardt, Team Novo Nordisk professional track cyclist.
  • 4. What is insulin resistance? The insulin signaling pathway (picture from https://www.nature.com/articles/s41392-022-01073-0).
  • 5. No, wait… actually, it’s tissue-specific. Panels: "A unified concept of insulin resistance in humans" and "Insulin resistance-related diseases in humans" (pictures from https://www.nature.com/articles/s41586-019-1797-8 and https://www.nature.com/articles/s41392-022-01073-0).
  • 6. Developing a framework to explore the IR landscape using a biomedical KG. What to consider? (a) the KG schema; (b) the information within the KG: its quality, amount, and relevance for the task; (c) the methods used to predict IR-related genes/proteins.
  • 7. Methods. "Our heritage enables us to defeat diabetes and other serious chronic diseases." Photo caption: Otávio Domingos da Costa, Brazil. Otávio has type 2 diabetes and obesity.
  • 8. Which KGs to use? Enriched benchmarking KGs: OpenBioLink (IR node present as a phenotype, 55 Gene-IR links) and Hetionet (IR node absent) (pictures from https://doi.org/10.1093/bioinformatics/btaa274 and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5640425/). • We use general-purpose biomedical KGs and update them with selected information. • We added a link between IR and each gene predicted to be IR-related by the bioinformatics study of Gao et al. [PMID: 32651353]. • This added 624 Gene-IR links to the KGs, i.e. it enlarged our training set.
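A minimal sketch of the enrichment step, assuming the KG is held as a set of (head, relation, tail) triples. The relation name `GENE_IR`, the node identifier `IR`, the helper `enrich_kg`, and the gene symbols are hypothetical placeholders, not the identifiers used by OpenBioLink, Hetionet, or Gao et al.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

IR_NODE = "IR"            # hypothetical identifier of the insulin-resistance node
GENE_IR_REL = "GENE_IR"   # hypothetical relation name for Gene-IR links


def enrich_kg(kg: Set[Triple], predicted_genes: Iterable[str]) -> Tuple[Set[Triple], int]:
    """Return a copy of `kg` with a Gene-IR link for every predicted gene,
    plus the number of links that were genuinely new (duplicates are skipped)."""
    enriched = set(kg)
    added = 0
    for gene in predicted_genes:
        triple = (gene, GENE_IR_REL, IR_NODE)
        if triple not in enriched:
            enriched.add(triple)
            added += 1
    return enriched, added


kg = {("INSR", GENE_IR_REL, IR_NODE), ("INSR", "INTERACTS", "IRS1")}
enriched, n_new = enrich_kg(kg, ["INSR", "IRS1", "PIK3R1", "IRS1"])
print(n_new)  # 2: INSR was already linked and IRS1 appears twice in the input
```

Because nodes are implicit in a triple set, the same function also covers a KG where the IR node is absent: the first added triple introduces it.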
  • 9. Zoomed-in details of the framework: train:test:val data splits.
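The slide names a train:test:val split but not its proportions. A minimal sketch of a reproducible split of positive Gene-IR links, assuming an 80:10:10 ratio and a fixed seed per replicate; both choices and the helper `split_links` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_links(links: Sequence[str], ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle positive links with a fixed seed and cut them into train/test/val."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)  # seeded RNG: same seed, same split
    n_train = int(len(shuffled) * ratios[0])
    n_test = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, val


genes = [f"gene_{i}" for i in range(100)]  # toy stand-ins for Gene-IR positives
train, test, val = split_links(genes, seed=1)
print(len(train), len(test), len(val))
```

Changing only `seed` gives independent replicates, which is one way to obtain repeated runs such as the 10 replicates mentioned later.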
  • 10. The developed framework involves graph data modeling and feature engineering
  • 11. Methods overview, exploring IR: (1) Biomedical KG: OpenBioLink, Hetionet. (2) Feature engineering: topological features, embeddings. (3) Models: link prediction, outlier detection & positive-unlabeled (PU) learning, random forest (RF) ensemble model. (4) Biological context: gene set enrichment analysis (GSEA), Euclidean distance for clustering, mechanisms of action (MoAs) of CMD drugs from the Multiscale Interactome (MSI).
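The slide lists "topological features" without naming them. A minimal sketch of classic neighbourhood-based link-prediction features for a candidate (gene, IR) pair, assuming an undirected adjacency-set view of the KG; whether the authors used these particular features is an assumption, and the toy graph is hypothetical.

```python
import math
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def add_edge(g: Graph, a: str, b: str) -> None:
    """Insert an undirected edge a-b."""
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)


def pair_features(g: Graph, u: str, v: str) -> Dict[str, float]:
    """Neighbourhood-based topological features for the node pair (u, v)."""
    nu, nv = g.get(u, set()), g.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbours": float(len(common)),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar down-weights shared neighbours that are hubs.
        "adamic_adar": sum(1.0 / math.log(len(g[w])) for w in common if len(g[w]) > 1),
        "preferential_attachment": float(len(nu) * len(nv)),
    }


g: Graph = {}
for a, b in [("INSR", "IR"), ("IRS1", "IR"), ("INSR", "IRS1"),
             ("PIK3R1", "IRS1"), ("PIK3R1", "INSR")]:
    add_edge(g, a, b)
feats = pair_features(g, "PIK3R1", "IR")
print(feats)
```

Vectors like `feats`, computed for every gene paired with the IR node, are the kind of input a random forest ensemble can be trained on.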
  • 12. Diffusion profiles: potential drug MoAs from the public Multiscale Interactome (MSI) KG (https://doi.org/10.1038/s41467-021-21770-8). • Diffusion profile: captures the most relevant paths connecting a drug and a disease, giving insight into the drug's possible mechanism of action (MoA). • Implement the MSI KG and the methodology for calculating diffusion profiles of CMD-related drugs in-house. • Identify which genes, and which biological functions of those genes, score significantly high in the diffusion profiles of CMD drugs.
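The MSI paper builds diffusion profiles from biased random walks with restart. A minimal sketch of the unbiased core, a random walk with restart (personalised PageRank) solved by power iteration on a toy graph; the MSI's per-edge-type weights are omitted, and the restart probability 0.15, the helper `diffusion_profile`, and the node names are hypothetical.

```python
from typing import Dict, List


def diffusion_profile(adj: Dict[str, List[str]], source: str, restart: float = 0.15,
                      tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Stationary visit probabilities of a random walk that restarts at `source`.

    At each step the walker returns to `source` with probability `restart`,
    otherwise it moves to a uniformly chosen neighbour of its current node.
    """
    nodes = list(adj)
    p = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += restart
        for n in nodes:
            nbrs = adj[n]
            if not nbrs:  # dangling node: send its walk mass back to the source
                nxt[source] += (1 - restart) * p[n]
                continue
            share = (1 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


adj = {"drug": ["P1", "P2"], "P1": ["drug", "F1"], "P2": ["drug"],
       "F1": ["P1", "disease"], "disease": ["F1"]}
profile = diffusion_profile(adj, "drug")
print(sorted(profile, key=profile.get, reverse=True))
```

The MSI paper then compares drug and disease profiles to relate the two; that comparison step is not shown here.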
  • 13. Results. Photo caption: David Lozano and Peter Kusztor have type 1 diabetes and are professional Team Novo Nordisk riders. They are racing with 100 on their jerseys to celebrate the 100-year anniversary of the discovery of insulin.
  • 14. OpenBioLink @100 predictions. Top performers: • Topology-based approaches on both the enriched and the non-enriched OpenBioLink datasets, with large training sets, outperformed the other models. Close contenders: • The Elkanoto PU model with an XGBoost classifier, applied to OpenBioLink with large training sets and using embeddings from RotatE link prediction on the same biomedical knowledge graph (biomedKG), nearly matched the top topology-based models. Underperformers: • Models based on the Local Outlier Factor (LOF) were among the least effective.
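"Elkanoto" presumably refers to the positive-unlabeled learning method of Elkan and Noto (2008). A minimal sketch of its core correction, assuming a classifier g (XGBoost on the slide; here just precomputed scores) has already been trained to separate labelled positives from unlabelled genes: the constant c = P(labelled | positive) is estimated as the mean score on held-out labelled positives, and each score is rescaled to g(x)/c. The helper names and scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def estimate_c(scores_on_holdout_positives: Sequence[float]) -> float:
    """Elkan-Noto estimator: c ~ mean of g(x) over held-out labelled positives."""
    return mean(scores_on_holdout_positives)


def pu_probabilities(scores: Dict[str, float], c: float) -> Dict[str, float]:
    """Rescale 'labelled vs unlabelled' scores into p(y=1|x) = g(x) / c, capped at 1."""
    return {k: min(1.0, s / c) for k, s in scores.items()}


def top_k(probs: Dict[str, float], k: int = 100) -> List[str]:
    """Candidates ranked by estimated probability, as in an @100 evaluation."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


c = estimate_c([0.6, 0.5, 0.4])  # c = 0.5
probs = pu_probabilities({"GENE_A": 0.45, "GENE_B": 0.1, "GENE_C": 0.3}, c)
print(top_k(probs, k=2))  # ['GENE_A', 'GENE_C']
```

Dividing by a constant does not change the ranking, so the correction matters for calibrated probabilities and thresholds rather than for the @100 list itself.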
  • 15. Models vary in the consistency of their top predictions. • Consistency of the @100 predictions across 10 replicates of each modelling approach. • The topology model is very precise, with small variance. • All other models show significantly more variance; as expected, the worst-performing model has the highest variance.
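The slide does not say how consistency was quantified. One plausible measure, sketched here as an assumption, is the mean pairwise Jaccard overlap between the top-k gene sets of the replicates (1.0 means every replicate returned the same genes), with its spread as a variance indicator. The helper names and toy rankings are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets; two empty sets count as identical."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def topk_consistency(replicates: Sequence[Sequence[str]], k: int = 100) -> Tuple[float, float]:
    """Mean and population std of pairwise Jaccard overlaps between top-k sets."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    tops: List[Set[str]] = [set(r[:k]) for r in replicates]
    overlaps = [jaccard(a, b) for a, b in combinations(tops, 2)]
    return mean(overlaps), pstdev(overlaps)


stable = [["A", "B", "C"], ["A", "B", "D"], ["B", "A", "E"]]
noisy = [["A", "B"], ["A", "C"], ["A", "B"]]
print(topk_consistency(stable, k=2))  # (1.0, 0.0)
print(topk_consistency(noisy, k=2))
```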
  • 16. Which features are most relevant in the topology modelling approach? (Figure panels: Small, Large.)
  • 17. Euclidean distances of known vs unknown IR-related genes to the IR node. Embedding quality: • TransE produced the lowest-quality embeddings: the IR-related genes from the positive training set were furthest from the IR node. Training-set impact: • In most models, the enriched training sets decreased the distances of both known and unknown IR-related genes to the IR node.
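A minimal sketch of the distance comparison, assuming each node has a real-valued embedding vector (complex-valued embeddings such as RotatE's would first need their real and imaginary parts concatenated, another assumption). The helper `mean_distance_to` and the two-dimensional vectors are toy values, not results.

```python
import math
from statistics import mean
from typing import Dict, Iterable, List

Vector = List[float]


def mean_distance_to(target: Vector, genes: Iterable[str], emb: Dict[str, Vector]) -> float:
    """Average Euclidean distance between `target` and the embeddings of `genes`."""
    return mean(math.dist(emb[g], target) for g in genes)


emb = {
    "IR": [0.0, 0.0],
    "KNOWN_1": [1.0, 0.0], "KNOWN_2": [0.0, 1.0],      # in the positive training set
    "UNKNOWN_1": [3.0, 4.0], "UNKNOWN_2": [0.0, 2.0],  # IR-related but not in training
}
known = mean_distance_to(emb["IR"], ["KNOWN_1", "KNOWN_2"], emb)
unknown = mean_distance_to(emb["IR"], ["UNKNOWN_1", "UNKNOWN_2"], emb)
print(known, unknown)  # 1.0 3.5
```

Under this reading, a good embedding places known positives close to the IR node, which is why known positives sitting furthest away (as reported for TransE) signals low embedding quality.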
  • 18. GSEA of the top 100 predicted genes and of the positive set. • The best models matched known IR pathways and uncovered new aspects. • The worst models identified broad or organ-specific pathways that are not IR-related. • The training set recovered the known IR pathways, but also showed unexpected links to the Chagas disease pathway and to cancer.
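For readers unfamiliar with GSEA, a minimal sketch of its core statistic: the unweighted (Kolmogorov-Smirnov-like) running-sum enrichment score of a gene set over a ranked gene list. This is the classic unweighted variant only; the weighted score, permutation-based significance, and the tool the authors actually used are not reproduced, and the gene names are toy values.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted GSEA running-sum enrichment score (maximum signed deviation from 0)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the gene set must overlap the ranking only partially")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise; the walk ends at 0.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranking = ["A", "B", "C", "D"]
print(enrichment_score(ranking, {"A", "B"}))  # 1.0: set members sit at the top
print(enrichment_score(ranking, {"C", "D"}))  # -1.0: set members sit at the bottom
```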
  • 19. Biological context: MSI. • The best-performing link prediction method (RotatE on the enriched OpenBioLink) found more genes associated with impaired glucose tolerance, suggesting that it generalizes better than the other well-scoring topology-based and PU-learning (PUL)-based models. • The models could also identify obesity-related diseases.
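Since RotatE is the link-prediction method highlighted here, a minimal sketch of how it scores a candidate (gene, relation, IR) triple: entities are complex vectors, each relation is an element-wise rotation by a phase, and plausibility is a margin minus the distance ||h ∘ r - t||. The distance is taken here as the sum of element-wise moduli (one common choice), and the margin value, helper names, and toy embeddings are assumptions.

```python
import cmath
import math
from typing import Sequence


def rotate_distance(h: Sequence[complex], phases: Sequence[float],
                    t: Sequence[complex]) -> float:
    """Rotate each coordinate of h by the relation phase, then measure the gap to t."""
    return sum(abs(hi * cmath.exp(1j * th) - ti) for hi, th, ti in zip(h, phases, t))


def rotate_score(h: Sequence[complex], phases: Sequence[float],
                 t: Sequence[complex], gamma: float = 12.0) -> float:
    """Higher is more plausible; gamma is a margin hyperparameter (value assumed)."""
    return gamma - rotate_distance(h, phases, t)


gene = [1 + 0j, 0 + 1j]
rel = [math.pi / 2, math.pi / 2]  # rotate every coordinate by 90 degrees
ir_node = [0 + 1j, -1 + 0j]       # exactly the rotated gene embedding
other = [1 + 0j, 1 + 0j]
print(rotate_score(gene, rel, ir_node) > rotate_score(gene, rel, other))  # True
```

Because every relation coordinate has modulus 1, RotatE can express symmetric, inverse, and composed relations, which is one common explanation for its strength on heterogeneous biomedical KGs.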
  • 20. Perspectives. "We transform scientific ideas into life-saving medicines for patients."
  • 21. Perspectives. • Tissue-specific KG: what would the embeddings look like if we explored the disease in a tissue-specific manner rather than a systemic one (all diseases, all tissues, and all genes in a single schema)? • Foundation models: explore the possibility of using foundation models on KGs to perform few-/zero-shot inductive inference. • Complex queries: use more complex KG-querying and reasoning approaches to identify the relations/connections between each identified gene and the IR node, to facilitate interpretability. • Validation of results: in-house in vitro validation of the results.
  • 22. Thank you for your attention.