CINF – ACS National Meeting – 20 March 2022
Dr Frederik van den Broek – Elsevier Professional Services
De-siloing data and building knowledge graphs outside of drug discovery: Opportunities and challenges
• A Quest for the Holy Grail?
Image: https://www.defense.gov/observe/photo-gallery/igphoto/2002024337/
“There are […]
known knowns, […]
known unknowns, […] and
unknown unknowns”
Also see:
https://en.wikipedia.org/wiki/There_are_known_knowns
https://www.youtube.com/watch?v=REWeBzGuzCc
General problem with data
https://sgnm.nl/wp-content/uploads/2019/11/datasilos.jpg
Knowledge or information / data silos
https://www.gene.com/scientists/our-scientists/dana-caulder
Linking data to retrieve insights
• Getting data sets out of a silo into a single data warehouse / data lake /
knowledgebase is not enough
• As long as the sets are not connected and/or normalised, you will have
only turned data silos into data islands
• So the challenge will be to build bridges between the islands
Image: https://commons.wikimedia.org/wiki/File:ORESUNDBRIDGE_WIDE.jpg
Example (knowledge) project question:
• Consumers associate a cool sensation in the mouth with a “fresh” feeling
• How can we have that cool sensation from a toothpaste or mouthwash?
“Cooling sensation knowledge”, held in three separate knowledge bases:
• Consumer perception knowledge base: Fresh feeling [Is associated with] Cooling sensation
• Biology knowledge base: TRPM8 protein [Increases] Cooling sensation
• Chemical & Bioactivity knowledge base: compound [Binds to] TRPM8 protein; [Has CAS number] 1374760-96-9; [Has chemical name] N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide
Linking will only be possible if concepts are mapped across the silos
“Cooling sensation knowledge graph”
• Fresh feeling [Is associated with] Cooling sensation
• TRPM8 protein [Increases] Cooling sensation
• Compound [Binds to] TRPM8 protein
• Compound [Has CAS number] 1374760-96-9
• Compound [Has chemical name] N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide
“Traversing” the graph allows logical inference
for retrieving implicit knowledge from data that
could have come from various sources
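A minimal Python sketch of such a traversal, using the cooling-sensation example. The edge directions, the node label "CAS 1374760-96-9" and the helper names (build_graph, find_path) are assumptions made for illustration; the slides only show the node and relation labels.

```python
from collections import defaultdict, deque

# Each triple: (subject, relation, object, source knowledge base).
# Edge directions are an assumption; the slides only show the labels.
TRIPLES = [
    ("Cooling sensation", "is associated with", "Fresh feeling", "Consumer perception"),
    ("TRPM8 protein", "increases", "Cooling sensation", "Biology"),
    ("CAS 1374760-96-9", "binds to", "TRPM8 protein", "Chemical & Bioactivity"),
]


def build_graph(triples):
    """Index the triples by subject so outgoing edges can be followed."""
    graph = defaultdict(list)
    for subj, rel, obj, source in triples:
        graph[subj].append((rel, obj, source))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the list of hops from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt, source in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt, source)]))
    return None


graph = build_graph(TRIPLES)
path = find_path(graph, "CAS 1374760-96-9", "Fresh feeling")
for subj, rel, obj, source in path:
    print(f"{subj} --{rel}--> {obj}   [{source}]")
```

No single knowledge base states that the compound is linked to a fresh feeling; the link only appears once the three sources share the same node names, which is why the mapping step matters.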
Knowledge graphs can be created from all kinds
of data sources
From: https://www.stardog.com/blog/what-is-a-knowledge-graph/
So how do we get to a knowledge graph?
Internal Data
Search
AI/ML
Knowledge Graphs
Analytics
Enterprise search
Map and normalise concepts with / into ontologies
Named Entity Recognition
The need to map and normalise concepts
From: https://www.panadol.com/de-ch/products/adult-product/other-pain-n-fever/panadol-s-with-optizorb.html
How to represent this concept?
The need to map and normalise concepts
• Names: Paracetamol, Acetaminophen, N-acetyl-para-aminophenol
• Trade names: Tylenol, Panadol
• (Common) Identifiers: CAS: 103-90-2, DrugBank: DB00316
• Chemical structures: InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10) and CC(=O)Nc1ccc(cc1)O
• Formulations: Ingredients. Active ingredient: Paracetamol 500 mg. Also contains: pregelatinised starch, calcium carbonate, alginic acid, crospovidone, povidone, magnesium stearate, colloidal anhydrous silica and sodium methyl (E 219), sodium ethyl (E 215), and sodium propyl (E 217) parahydroxybenzoates.
Trying to put concepts and synonyms in a structure
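One way to picture such a structure is a lookup table that maps every surface form to a single concept. The sketch below is illustrative only: using the DrugBank ID as the canonical key, the category names and the helper functions (build_lookup, normalise) are assumptions, not something the slides prescribe.

```python
# Hypothetical choice: use the DrugBank ID as the canonical concept key.
CANONICAL_ID = "DrugBank:DB00316"

# Surface forms taken from the paracetamol slide, grouped by category.
SYNONYMS = {
    "names": ["Paracetamol", "Acetaminophen", "N-acetyl-para-aminophenol"],
    "trade_names": ["Tylenol", "Panadol"],
    "identifiers": ["103-90-2", "DB00316"],
    "structures": ["CC(=O)Nc1ccc(cc1)O"],
}


def build_lookup(concept_id, synonyms):
    """Map every surface form to (concept_id, category)."""
    lookup = {}
    for category, terms in synonyms.items():
        for term in terms:
            # SMILES strings are case-sensitive, so structures are stored as-is;
            # names and identifiers are matched case-insensitively.
            key = term if category == "structures" else term.casefold()
            lookup[key] = (concept_id, category)
    return lookup


LOOKUP = build_lookup(CANONICAL_ID, SYNONYMS)


def normalise(term):
    """Return (concept_id, category) for a known surface form, else None."""
    term = term.strip()
    return LOOKUP.get(term) or LOOKUP.get(term.casefold())


for query in ["TYLENOL", "acetaminophen", "103-90-2", "Aspirin"]:
    print(query, "->", normalise(query))
```

A real ontology would hold many concepts and richer relations; the point here is only that "Tylenol", "acetaminophen" and "103-90-2" resolve to the same node.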
Named Entity Recognition
From: https://research.zalando.com/welcome/mission/research-projects/flair-nlp/
The task of identifying and categorizing key information (entities) in text
The context also matters!
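As a rough illustration of "identifying and categorizing" entities, the sketch below uses plain dictionary (gazetteer) lookup with a regular expression. This is a simplification swapped in for illustration, not the statistical tagger shown on the slide, and the gazetteer entries, entity labels and example sentence are hypothetical.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "paracetamol": "DRUG",
    "acetaminophen": "DRUG",
    "TRPM8": "PROTEIN",
    "toothpaste": "PRODUCT",
}

# Longest terms first, so a longer term wins over a shorter overlapping one.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(GAZETTEER, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
_TYPES = {term.casefold(): label for term, label in GAZETTEER.items()}


def tag_entities(text):
    """Return (start, end, matched text, entity type) for each dictionary hit."""
    return [
        (m.start(), m.end(), m.group(0), _TYPES[m.group(0).casefold()])
        for m in _PATTERN.finditer(text)
    ]


sentence = "Search hits mention paracetamol, TRPM8 and toothpaste in one paragraph."
entities = tag_entities(sentence)
for start, end, surface, label in entities:
    print(f"{label:<8} {surface} ({start}-{end})")
```

Pure lookup ignores context, which is exactly the limitation the slide points at: the same string can denote different concepts in different settings, so production systems combine dictionaries with context-aware models.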
When to use which concept?
• Use different ontologies / taxonomies for different contexts
• But when the ontologies are mapped and linked, the knowledge graph will
follow….
Is this Knowledge Graph thing just another hype?
In 2019…
https://www.gartner.com/smarterwithgartner/top-trends-on-the-gartner-hype-cycle-for-artificial-intelligence-2019/
Is this Knowledge Graph thing just another hype?
In 2021…
Creating a knowledge graph builds on tools and output from Semantic Search
https://www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
Daily use of a knowledge graph
• Knowledge Panel next to the Google search results is powered by a knowledge graph
• Information gathered from a variety of sources
• Also powers Google Assistant and Google Home
For more details, see:
https://support.google.com/knowledgepanel/answer/9787176?hl=en
https://en.wikipedia.org/wiki/Google_Knowledge_Graph
So can I then do Machine Learning?
https://xkcd.com/1838
Special branch of Machine Learning for graphs
• Social networks analytics
• Traffic network prediction
• Recommender systems
• NLP, text classification
• Chemical reaction prediction
https://doi.org/10.1021/acsomega.1c04017
https://doi.org/10.1093/bib/bbab159
https://doi.org/10.3389/fgene.2021.690049
Main challenges outside of drug discovery
• For pharma and life sciences there are many established and publicly available ontologies (see https://www.ebi.ac.uk/ols/index)
• Directly adjacent fields might make use of some life science ontologies
• For other fields (e.g. polymers) ontologies/taxonomies will need to be
created, which will require time, effort and some FAIRness
FAIR: Findable, Accessible, Interoperable, Reusable
https://www.go-fair.org/fair-principles/
It can be done though…
• Created a small-ish taxonomy for the petroleum engineering subject of overpressure mechanisms
Open Access conference paper:
https://doi.org/10.3997/2214-4609.202113138
Take-home messages
• Getting data sets out of a silo into a single data
warehouse / data lake / knowledgebase is not enough
• The need to map and normalise concepts (ontologies / taxonomies)
• When the ontologies are mapped and linked, the knowledge graph will
follow….
• Use of Named Entity Recognition to extract from (semi-)structured data
• Biggest effort will be creating ontologies in fields adjacent to life sciences, but
Elsevier / SciBite have the expertise, technology and content to help build
knowledge graphs from literature and your internal data & documents
Photo by Alexander Schimmeck on Unsplash
Questions?
• Can also be asked later: f.broek@elsevier.com
• LinkedIn: https://www.linkedin.com/in/frederik-van-den-broek/
By Malis - https://commons.wikimedia.org/w/index.php?curid=2633354
Appendix
Taxonomy vs. Ontology
Ontologies specify; taxonomies classify
https://stangarfield.medium.com/whats-the-difference-between-an-ontology-and-a-taxonomy-c8da7c56fbea
Marrying the concepts
• By introducing parent-child relationships or class hierarchies, an ontology
can be transformed into a taxonomy for e.g. taxonomy-powered searches
Image: https://enterprise-knowledge.com/from-taxonomy-to-ontology/
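A taxonomy-powered search can be pictured as query expansion over parent-child links: a search for a broad term also retrieves its narrower terms. The hierarchy below is a hypothetical toy example and the function name expand is an assumption for illustration.

```python
# Hypothetical parent -> children hierarchy (toy example).
CHILDREN = {
    "Analgesic": ["Paracetamol", "Ibuprofen"],
    "Paracetamol": ["Panadol", "Tylenol"],
}


def expand(term):
    """Return the term followed by all of its descendants, depth-first."""
    terms = [term]
    for child in CHILDREN.get(term, []):
        terms.extend(expand(child))
    return terms


print(expand("Analgesic"))
print(expand("Paracetamol"))
```

A document that only mentions "Tylenol" would then still be found by a search for "Analgesic", provided the hierarchy contains no cycles.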

THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 

De-siloing data and building knowledge graphs outside of drug discovery: Opportunities and challenges (CINF 3667313, ACS National Meeting 2022-03-20)

  • 1. CINF – ACS National Meeting – 20 March 2022 Dr Frederik van den Broek – Elsevier Professional Services De-siloing data and building knowledge graphs outside of drug discovery: Opportunities and challenges • A Quest for the Holy Grail?
  • 3. “There are […] known knowns, […] known unknowns, […] and unknown unknowns” Also see: https://en.wikipedia.org/wiki/There_are_known_knowns https://www.youtube.com/watch?v=REWeBzGuzCc
  • 4. General problem with data https://sgnm.nl/wp-content/uploads/2019/11/datasilos.jpg
  • 5. Knowledge or information / data silos https://www.gene.com/scientists/our-scientists/dana-caulder
  • 6. Linking data to retrieve insights • Getting data sets out of a silo into a single data warehouse / data lake / knowledgebase is not enough • As long as the sets are not connected and/or normalised, you will have only turned data silos into data islands • So the challenge will be to build bridges between the islands Image: https://commons.wikimedia.org/wiki/File:ORESUNDBRIDGE_WIDE.jpg
  • 7. Example (knowledge) project question: • Consumers associate a cool sensation in the mouth with a “fresh” feeling • How can we have that cool sensation from a toothpaste or mouthwash?
  • 8. “Cooling sensation knowledge” Fresh feeling Cooling sensation Is associated with Consumer perception knowledge base
  • 10. “Cooling sensation knowledge” Fresh feeling Cooling sensation TRPM8 protein 1374760-96-9 N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide Binds to Has chemical name Has CAS number Chemical & Bioactivity knowledge base
  • 11. “Cooling sensation knowledge” Fresh feeling Cooling sensation TRPM8 protein 1374760-96-9 N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide Is associated with Increases Binds to Has chemical name Has CAS number Consumer perception knowledge base Biology knowledge base Chemical & Bioactivity knowledge base
  • 12. “Cooling sensation knowledge” Fresh feeling Cooling sensation TRPM8 protein 1374760-96-9 N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide Is associated with Increases Binds to Has chemical name Has CAS number Consumer perception knowledge base Biology knowledge base Chemical & Bioactivity knowledge base Linking will only be possible if concepts are mapped across the silos
  • 13. “Cooling sensation knowledge graph” Fresh feeling Cooling sensation TRPM8 protein 1374760-96-9 N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide Is associated with Increases Binds to Has chemical name Has CAS number “Traversing” the graph allows logical inference for retrieving implicit knowledge from data that could have come from various sources
  • 14. Knowledge graphs can be created from all kinds of data sources From: https://www.stardog.com/blog/what-is-a-knowledge-graph/
  • 15. So how do we get to a knowledge graph? Internal Data Search AI/ML Knowledge Graphs Analytics Enterprise search
  • 16. So how do we get to a knowledge graph? Internal Data Search AI/ML Knowledge Graphs Analytics Enterprise search Map and normalise concepts with / into ontologies Named Entity Recognition
  • 17. The need to map and normalise concepts From: https://www.panadol.com/de-ch/products/adult-product/other-pain-n-fever/panadol-s-with-optizorb.html How to represent this concept?
  • 18. The need to map and normalise concepts Paracetamol Acetaminophen N-acetyl-para-aminophenol Tylenol CAS: 103-90-2 Panadol InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10) CC(=O)Nc1ccc(cc1)O DrugBank: DB00316 Ingredients Active ingredient: Paracetamol 500 mg. Also contains: Pregelatinised starch, calcium carbonate, alginic acid, crospovidone, povidone, magnesium stearate, colloidal anhydrous silica and sodium methyl (E 219), sodium ethyl (E 215), and sodium propyl (E 217) parahydroxybenzoates.
  • 19. The need to map and normalise concepts Paracetamol Acetaminophen N-acetyl-para-aminophenol Tylenol CAS: 103-90-2 Panadol InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10) CC(=O)Nc1ccc(cc1)O DrugBank: DB00316 Ingredients Active ingredient: Paracetamol 500 mg. Also contains: Pregelatinised starch, calcium carbonate, alginic acid, crospovidone, povidone, magnesium stearate, colloidal anhydrous silica and sodium methyl (E 219), sodium ethyl (E 215), and sodium propyl (E 217) parahydroxybenzoates.
  • 20. The need to map and normalise concepts Paracetamol Acetaminophen N-acetyl-para-aminophenol Tylenol CAS: 103-90-2 Panadol InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10) CC(=O)Nc1ccc(cc1)O DrugBank: DB00316 Ingredients Active ingredient: Paracetamol 500 mg. Also contains: Pregelatinised starch, calcium carbonate, alginic acid, crospovidone, povidone, magnesium stearate, colloidal anhydrous silica and sodium methyl (E 219), sodium ethyl (E 215), and sodium propyl (E 217) parahydroxybenzoates. Names (Common) Identifiers Chemical structures Formulations Trade names
  • 21. Trying to put concepts and synonyms in a structure
  • 22. Named Entity Recognition From: https://research.zalando.com/welcome/mission/research-projects/flair-nlp/ The task of identifying and categorizing key information (entities) in text
  • 23. Named Entity Recognition From: https://research.zalando.com/welcome/mission/research-projects/flair-nlp/ The task of identifying and categorizing key information (entities) in text The context also matters!
  • 24. When to use which concept? • Use different ontologies / taxonomies for different contexts • But when the ontologies are mapped and linked, the knowledge graph will follow…
  • 25. Is this Knowledge Graph thing just another hype? In 2019… https://www.gartner.com/smarterwithgartner/top-trends-on-the-gartner-hype-cycle-for-artificial-intelligence-2019/
  • 26. Is this Knowledge Graph thing just another hype? In 2021… https://www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
  • 27. Is this Knowledge Graph thing just another hype? In 2021… Creating a knowledge graph builds on tools and output from Semantic Search https://www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
  • 28. Daily use of a knowledge graph
  • 29. Daily use of a knowledge graph • Knowledge Panel next to the Google search results is powered by a knowledge graph • Information gathered from a variety of sources • Also powers Google Assistant and Google Home For more details, see: https://support.google.com/knowledgepanel/answer/9787176?hl=en https://en.wikipedia.org/wiki/Google_Knowledge_Graph
  • 30. So can I then do Machine Learning? https://xkcd.com/1838
  • 31. Special branch of Machine Learning for graphs • Social networks analytics • Traffic network prediction • Recommender systems • NLP, text classification • Chemical reaction prediction https://doi.org/10.1021/acsomega.1c04017 https://doi.org/10.1093/bib/bbab159 https://doi.org/10.3389/fgene.2021.690049
  • 32. Main challenges outside of drug discovery • For pharma and life sciences there are many established and publicly available ontologies https://www.ebi.ac.uk/ols/index
  • 33. Main challenges outside of drug discovery • For pharma and life sciences there are many established and publicly available ontologies • Directly adjacent fields might make use of some life science ontologies • For other fields (e.g. polymers) ontologies/taxonomies will need to be created, which will require time, effort and some FAIRness https://www.go-fair.org/fair-principles/ Findable Accessible Interoperable Reusable
  • 34. It can be done though… • Created a small-ish taxonomy for petroleum engineering subject of overpressure mechanisms Open Access conference paper: https://doi.org/10.3997/2214-4609.202113138
  • 35. Take-home messages • Getting data sets out of a silo into a single data warehouse / data lake / knowledgebase is not enough • The need to map and normalise concepts (ontologies / taxonomies) • When the ontologies are mapped and linked, the knowledge graph will follow… • Use of Named Entity Recognition to extract from (semi-)structured data • Biggest effort will be creating ontologies in fields adjacent to life sciences, but Elsevier / SciBite have the expertise, technology and content to help build knowledge graphs from literature and your internal data & documents Photo by Alexander Schimmeck on Unsplash
  • 36. Questions? • Can also be asked later: f.broek@elsevier.com • LinkedIn: https://www.linkedin.com/in/frederik-van-den-broek/ By Malis - https://commons.wikimedia.org/w/index.php?curid=2633354
  • 38. Taxonomy vs. Ontology: Ontologies specify, taxonomies classify https://stangarfield.medium.com/whats-the-difference-between-an-ontology-and-a-taxonomy-c8da7c56fbea
  • 39. Marrying the concepts • By introducing parent-child relationships or class hierarchies, an ontology can be transformed into a taxonomy for e.g. taxonomy-powered searches Image: https://enterprise-knowledge.com/from-taxonomy-to-ontology/
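
The "cooling sensation" example from slides 8 to 13 can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the talk: the concept IDs, the alternative labels ("Cool sensation", "TRPM8"), the edge directions and the helper names `normalise`, `build_graph` and `find_path` are all assumptions made for the sketch. It shows the two steps the slides stress: map each silo's labels onto shared concepts first, then traverse the linked graph to retrieve a connection that no single silo states.

```python
from collections import deque

# Hypothetical label-to-concept mapping; the concept IDs are invented for this sketch.
CONCEPT_IDS = {
    "fresh feeling": "concept:fresh_feeling",
    "cooling sensation": "concept:cooling_sensation",
    "cool sensation": "concept:cooling_sensation",
    "trpm8 protein": "concept:trpm8",
    "trpm8": "concept:trpm8",
    "1374760-96-9": "concept:cooling_compound",
    "n-ethyl-n-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide": "concept:cooling_compound",
}


def normalise(label):
    """Map a free-text label to its shared concept ID (KeyError if unmapped)."""
    key = " ".join(label.lower().split())
    return CONCEPT_IDS[key]


# One (subject, predicate, object) triple per silo; edge directions are assumed.
SILOS = {
    "consumer perception": [("Cool sensation", "is associated with", "Fresh feeling")],
    "biology": [("TRPM8", "increases", "Cooling sensation")],
    "chemical & bioactivity": [("1374760-96-9", "binds to", "TRPM8 protein")],
}


def build_graph(silos):
    """Merge all silos into one adjacency list keyed by normalised concept ID."""
    graph = {}
    for source, triples in silos.items():
        for subj, pred, obj in triples:
            graph.setdefault(normalise(subj), []).append((pred, normalise(obj), source))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the list of edges from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, nxt, source in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, pred, nxt, source)]))
    return None


graph = build_graph(SILOS)
path = find_path(graph, normalise("1374760-96-9"), normalise("fresh feeling"))
for subj, pred, obj, source in path:
    print(f"{subj} --{pred}--> {obj}  [{source}]")
```

Without the `normalise` step the three silos would stay "data islands": "Cool sensation" and "Cooling sensation", or "TRPM8" and "TRPM8 protein", would be treated as unrelated nodes and no path from the compound to the fresh feeling would be found.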