Slides from my talk at the ACS CINF Symposium on Expanding cheminformatics to adjacent industries on 20 March 2022 in San Diego & online.
Abstract:
Extracting structured data from literature, patents and documents using various text mining techniques and technologies is an established part of the process of building knowledge repositories for drug discovery. There is an increasing interest from research areas outside of drug discovery to make use of such techniques to de-silo data and build knowledge graphs, e.g. for chemical manufacturing, consumer products, and other fields. This talk will give an overview of the opportunities and challenges faced in building up systems to extract the data to build semantic searches and knowledge graphs.
De-siloing data and building knowledge graphs outside of drug discovery: Opportunities and challenges (CINF 3667313, ACS National Meeting 2022-03-20)
1. CINF – ACS National Meeting – 20 March 2022
Dr Frederik van den Broek – Elsevier Professional Services
De-siloing data and building knowledge graphs outside of drug discovery: Opportunities and challenges
• A Quest for the Holy Grail?
3. “There are […] known knowns, […] known unknowns, […] and unknown unknowns”
Also see:
https://en.wikipedia.org/wiki/There_are_known_knowns
https://www.youtube.com/watch?v=REWeBzGuzCc
4. General problem with data
https://sgnm.nl/wp-content/uploads/2019/11/datasilos.jpg
5. Knowledge or information / data silos
https://www.gene.com/scientists/our-scientists/dana-caulder
6. Linking data to retrieve insights
• Getting data sets out of a silo into a single data warehouse / data lake / knowledgebase is not enough
• As long as the sets are not connected and/or normalised, you will have only turned data silos into data islands
• So the challenge will be to build bridges between the islands
Image: https://commons.wikimedia.org/wiki/File:ORESUNDBRIDGE_WIDE.jpg
7. Example (knowledge) project question:
• Consumers associate a cool sensation in the mouth with a “fresh” feeling
• How can we have that cool sensation from a toothpaste or mouthwash?
12. “Cooling sensation knowledge”
Concepts: Fresh feeling; Cooling sensation; TRPM8 protein; 1374760-96-9; N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide
Relationships: Is associated with; Increases; Binds to; Has chemical name; Has CAS number
Sources: Consumer perception knowledge base; Biology knowledge base; Chemical & Bioactivity knowledge base
Linking will only be possible if concepts are mapped across the silos
13. “Cooling sensation knowledge graph”
Concepts: Fresh feeling; Cooling sensation; TRPM8 protein; 1374760-96-9; N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide
Relationships: Is associated with; Increases; Binds to; Has chemical name; Has CAS number
“Traversing” the graph allows logical inference for retrieving implicit knowledge from data that could have come from various sources
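As a rough illustration of what “traversing” means here, the sketch below stores the slide's concepts as subject-predicate-object triples and runs a breadth-first search over them. The edge directions and the placeholder node name "Compound" are assumptions: the slide text does not show arrowheads, and the helper names `build_graph` and `find_path` are hypothetical.

```python
from collections import deque

# Triples transcribed from the slide. Edge directions are an assumption,
# and "Compound" is a placeholder node for the chemical entity that the
# CAS number and the chemical name both describe.
TRIPLES = [
    ("Cooling sensation", "is associated with", "Fresh feeling"),
    ("TRPM8 protein", "increases", "Cooling sensation"),
    ("Compound", "binds to", "TRPM8 protein"),
    ("Compound", "has CAS number", "1374760-96-9"),
    ("Compound", "has chemical name",
     "N-ethyl-N-(thiophen-2-ylmethyl)-2-(p-tolyloxy)acetamide"),
]


def build_graph(triples):
    """Index the triples as an adjacency list: subject -> [(predicate, object)]."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
        graph.setdefault(obj, [])
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the list of triples linking start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, predicate, neighbour)]))
    return None


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for hop in find_path(graph, "Compound", "Fresh feeling"):
        print(" -> ".join(hop))
```

No single triple links the compound to a fresh feeling; that link only appears once the three hops from the chemistry, biology and consumer-perception sources are chained together.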
14. Knowledge graphs can be created from all kinds of data sources
From: https://www.stardog.com/blog/what-is-a-knowledge-graph/
15. So how do we get to a knowledge graph?
Internal Data
Search
AI/ML
Knowledge Graphs
Analytics
Enterprise search
16. So how do we get to a knowledge graph?
Internal Data
Search
AI/ML
Knowledge Graphs
Analytics
Enterprise search
Map and normalise concepts with / into ontologies
Named Entity Recognition
17. The need to map and normalise concepts
From: https://www.panadol.com/de-ch/products/adult-product/other-pain-n-fever/panadol-s-with-optizorb.html
How to represent this concept?
18. The need to map and normalise concepts
Paracetamol
Acetaminophen
N-acetyl-para-aminophenol
Tylenol
CAS: 103-90-2
Panadol
InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)
CC(=O)Nc1ccc(cc1)O
DrugBank: DB00316
Ingredients
Active ingredient: Paracetamol 500 mg.
Also contains: Pregelatinised starch, calcium carbonate, alginic acid, crospovidone, povidone, magnesium stearate, colloidal anhydrous silica and sodium methyl (E 219), sodium ethyl (E 215), and sodium propyl (E 217) parahydroxybenzoates.
20. The need to map and normalise concepts
Names: Paracetamol; Acetaminophen; N-acetyl-para-aminophenol
Trade names: Tylenol; Panadol
(Common) Identifiers: CAS: 103-90-2; DrugBank: DB00316
Chemical structures:
InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)
CC(=O)Nc1ccc(cc1)O
Formulations:
Ingredients
Active ingredient: Paracetamol 500 mg.
Also contains: Pregelatinised starch, calcium carbonate, alginic acid, crospovidone, povidone, magnesium stearate, colloidal anhydrous silica and sodium methyl (E 219), sodium ethyl (E 215), and sodium propyl (E 217) parahydroxybenzoates.
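One simple way to picture this mapping is a lookup table that sends every surface form to a single concept key. The sketch below is a minimal illustration under stated assumptions: using the DrugBank ID as the canonical key, the `SYNONYMS` table and the `normalise` helper are all hypothetical choices, and mapping trade names straight to the active ingredient deliberately glosses over the product-versus-substance distinction the slide raises.

```python
# Canonical key for the concept; choosing the DrugBank ID is an assumption,
# any stable identifier from the slide would serve equally well.
CONCEPT_ID = "DrugBank:DB00316"

# Hypothetical synonym table built from the names, trade names and
# CAS number shown on the slide. Keys are stored case-folded.
SYNONYMS = {
    "paracetamol": CONCEPT_ID,
    "acetaminophen": CONCEPT_ID,
    "n-acetyl-para-aminophenol": CONCEPT_ID,
    "tylenol": CONCEPT_ID,   # trade name, simplified to its active ingredient
    "panadol": CONCEPT_ID,   # trade name, simplified to its active ingredient
    "103-90-2": CONCEPT_ID,  # CAS number
}


def normalise(term):
    """Return the canonical concept key for a term, or None if it is unmapped."""
    key = " ".join(term.split()).casefold()  # collapse whitespace, ignore case
    return SYNONYMS.get(key)


if __name__ == "__main__":
    for term in ["Tylenol", "  ACETAMINOPHEN ", "103-90-2", "ibuprofen"]:
        print(repr(term), "->", normalise(term))
```

Once two data sets have both been normalised to the same key, they can be joined on it, which is the "bridge between the islands" described earlier in the deck.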
22. Named Entity Recognition
From: https://research.zalando.com/welcome/mission/research-projects/flair-nlp/
The task of identifying and categorizing key information (entities) in text
23. Named Entity Recognition
From: https://research.zalando.com/welcome/mission/research-projects/flair-nlp/
The task of identifying and categorizing key information (entities) in text
The context also matters!
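To make the task concrete, here is a deliberately simple dictionary-based tagger. It is only a sketch: the `GAZETTEER` entries, the entity labels and the `tag_entities` helper are hypothetical, and it is not how a trained tagger such as the Flair framework linked above works. A pure lookup like this ignores the surrounding words, which is exactly the gap that context-aware models are meant to close.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "paracetamol": "CHEMICAL",
    "acetaminophen": "CHEMICAL",
    "TRPM8": "PROTEIN",
    "toothpaste": "PRODUCT",
}

# Case-insensitive lookup table and one combined pattern; longer terms are
# tried first so that a short term never shadows a longer one.
LOOKUP = {term.casefold(): label for term, label in GAZETTEER.items()}
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(GAZETTEER, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def tag_entities(text):
    """Return (matched text, entity type, start, end) for every gazetteer hit."""
    return [
        (m.group(0), LOOKUP[m.group(0).casefold()], m.start(), m.end())
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag_entities("Is TRPM8 relevant for a toothpaste?"):
        print(hit)
```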
24. When to use which concept?
• Use different ontologies / taxonomies for different contexts
• But when the ontologies are mapped and linked, the knowledge graph will follow…
25. Is this Knowledge Graph thing just another hype?
In 2019…
https://www.gartner.com/smarterwithgartner/top-trends-on-the-gartner-hype-cycle-for-artificial-intelligence-2019/
26. Is this Knowledge Graph thing just another hype?
In 2021…
https://www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
27. Is this Knowledge Graph thing just another hype?
In 2021…
Creating a knowledge graph builds on tools and output from Semantic Search
https://www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
29. Daily use of a knowledge graph
• Knowledge Panel next to the Google search results is powered by a knowledge graph
• Information gathered from a variety of sources
• Also powers Google Assistant and Google Home
For more details, see:
https://support.google.com/knowledgepanel/answer/9787176?hl=en
https://en.wikipedia.org/wiki/Google_Knowledge_Graph
30. So can I then do Machine Learning?
https://xkcd.com/1838
31. Special branch of Machine Learning for graphs
• Social networks analytics
• Traffic network prediction
• Recommender systems
• NLP, text classification
• Chemical reaction prediction
https://doi.org/10.1021/acsomega.1c04017
https://doi.org/10.1093/bib/bbab159
https://doi.org/10.3389/fgene.2021.690049
32. Main challenges outside of drug discovery
• For pharma and life sciences there are many established and publicly available ontologies
https://www.ebi.ac.uk/ols/index
33. Main challenges outside of drug discovery
• For pharma and life sciences there are many established and publicly available ontologies
• Directly adjacent fields might make use of some life science ontologies
• For other fields (e.g. polymers) ontologies/taxonomies will need to be created, which will require time, effort and some FAIRness
https://www.go-fair.org/fair-principles/
Findable
Accessible
Interoperable
Reusable
34. It can be done though…
• Created a small-ish taxonomy for the petroleum engineering subject of overpressure mechanisms
Open Access conference paper:
https://doi.org/10.3997/2214-4609.202113138
35. Take-home messages
• Getting data sets out of a silo into a single data warehouse / data lake / knowledgebase is not enough
• The need to map and normalise concepts (ontologies / taxonomies)
• When the ontologies are mapped and linked, the knowledge graph will follow…
• Use of Named Entity Recognition to extract from (semi-)structured data
• Biggest effort will be creating ontologies in fields adjacent to life sciences, but Elsevier / SciBite have the expertise, technology and content to help build knowledge graphs from literature and your internal data & documents
Photo by Alexander Schimmeck on Unsplash
36. Questions?
• Can also be asked later: f.broek@elsevier.com
• LinkedIn: https://www.linkedin.com/in/frederik-van-den-broek/
By Malis - https://commons.wikimedia.org/w/index.php?curid=2633354
38. Taxonomy vs. Ontology
Ontologies specify; taxonomies classify
https://stangarfield.medium.com/whats-the-difference-between-an-ontology-and-a-taxonomy-c8da7c56fbea
39. Marrying the concepts
• By introducing parent-child relationships or class hierarchies, an ontology can be transformed into a taxonomy for e.g. taxonomy-powered searches
Image: https://enterprise-knowledge.com/from-taxonomy-to-ontology/