SlideShare a Scribd company logo
KNOWLEDGE GRAPH CONSTRUCTION
FOR RESEARCH & MEDICINE
Paul Groth (@pgroth)
pgroth.com
Disruptive Technology Director
Elsevier Labs (@elsevierlabs)
Connected Data London 2017
Contributions: Brad Allen, Pascal Coupet, Sujit Pal, Craig Stanley, Ron Daniel, Alex de Jong
Our customers are facing challenges in
science and health
1. Industrial Research Institute 2. The Lancet 3. Tufts 4. World Health Organization
Elsevier is in a unique position to make a contribution
towards solving these challenges
Life-saving drugs are expensive to develop.3
Global research spend is growing every year.1
3.4%
from 2015
Predicted spend
$1.9TN
research in 2016
Studies:
70-80% of
research asks the
wrong questions
or cannot be
reproduced
Researchers lack the tools they need to be
effective.2
Preventable medicalerrors:
Third largest cause of death in theUS
Health providers cannot save lives without the best
information.4
$2.5BN
median pharmaceutical
spend per drug
1/20
successrate
of drugs
Heart
Disease
611k
Cancer
585k
Medical
Error
225k 149k
Respiratory
Illness
ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR
RESEARCHERS, DOCTORS AND NURSES
My work is moving towards a new field; what should I know?
• Journal articles, reference works, profiles of researchers, funders &
institutions
• Recommendations of people to connect with, reading lists, topic pages
How should I treat my patient given her condition & history?
• Journal articles, reference works, medical guidelines, electronic health
records
• Treatment plan with alternatives personalized for the patient
How can I master the subject matter of the course I am taking?
• Course syllabus, reference works, course objectives, student history
• Quiz plan based on the student’s history and course objectives
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
ANSWERS ARE ABOUT THINGS, NOT JUST WORKS
Why shouldn’t a search on an author return
information about the author, including the
author’s works? Where was the author born,
when did she live, what is she known for? … All of
this is possible, but only if we can make some
fundamental changes in our approach to
bibliographic description. ... The challenge for us
lies in transforming what we can of our data into
interrelated “things” without overindulging that
metaphor.
Coyle, K. (2016). FRBR, before and after: a look at our
bibliographical models. Chicago: ALA Editions.
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
KNOWLEDGE GRAPHS DEFINED
• Knowledge graphs are "graph structured knowledge bases (KBs) which store factual
information in form of relationships between entities” (Nickel, M., Murphy, K., Tresp, V. and
Gabrilovich, E. (2015). A review of relational machine learning for knowledge graphs.
arXiv:1503.00759v3)
• Knowledge graphs are metadata evolved beyond the focus on the work, linking people, concepts,
things and events
• Knowledge Graphs are focused on things to provide answers
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
ELSEVIER’S KNOWLEDGE PLATFORM
Products
Data & Content
Sources
Knowledge
Graphs
Platforms &
Shared Services
Entity Hubs
Usage logs Pathways EHRsArticles Authors Institutions
SyllabiCitations ChemicalsBooks DrugsFunders
Funder Hub Article HubProfile Hub Journal Hub Institution Hub
Research HealthcareLife Sciences
Content Life Sciences Search IdentityResearch
Reaxys CK SherpathScopus SD ROS
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
THE GROWTH OF SCIENCE COMPLICATES OUR EFFORTS
MORE DOMAINS & MORE SPECIFICITY
Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A.,
& Wyatt, S. (2017). Searching Data: A Review of
Observational Data Retrieval Practices. arXiv
preprint arXiv:1707.06937.
Some observations from @gregory_km
survey:
1. The needs and behaviours of specific user groups
(e.g. early career researchers, policy makers,
students) are not well documented.
2. Background uses of observational data are better
documented than foreground uses.
3. Reconstructing data tables from journal articles,
using general search engines, and making direct data
requests are common.
MACHINE READING
TOPIC PAGES
Definition
Related
terms
Relevant
ranked
snippets
HOW - OVERVIEW
Content
Books, Articles, Ontologies ...
• Identification of concepts
• Disambiguation
• Domain/sub-domain
identification
• Abbreviations,
variants
• Gazeteering
• Identification and
classification of text snippets
around concepts
• Features building for
concept/snippet pairs
• Lexical, syntactic,
semantic, doc
structure …
• Ranking concept snippet pairs
• Machine learning
• Hand made rules
• Similarities
• Deduplication
Technologies
NLP, ML
• Curation
• White list driven
• Black list
• Corrections/improve
ments
• Evaluation
• Gold set by domain
• Random set by
domain
• By SMEs (Subject
Matter Experts)
• Automation
• Content Enrichment
Framework
• Taxonomy coverage
extension
Knowledge Graph
Concepts, snippets, meta data, …
| 16
OmniScience
Neuros
cience
Extension vocabularies by domains to provide coverage
Number of Concepts Number of Labels
OmniScience 01.16.11 45969 47421
OmniScience Neuroscience branch 21/11/2016 2356 2455
OmniScience Extension Neuroscience branch 21/11/2016 23932 101276
| 17
Concept Bad Good
Inferior Colliculus
By comparing activation obtained in an equivalent
standard ( non-cardiac-gated ) fMRI experiment ,
Guimaraes and colleagues found that cardiac-
gated activation maps yielded much greater
activation in subcortical nuclei , such as the
inferior colliculus .
The inferior colliculus (IC) is part of the tectum of the midbrain (mesencephalon) comprising the quadrigeminal
plate (Lamina quadrigemina). It is located caudal to the superior colliculus on the dorsal surface of the
mesencephalon ( Figure 36.7 FIGURE 36.7Overview of the human brainstem; view from dorsal. The superior and
inferior colliculi form the quadrigeminal plate. Parts of the cerebellum are removed.). The ventral border is
formed by the lateral lemniscus. The inferior colliculus is the largest nucleus of the human auditory system. …
Purkinje cells
It is felt that the aminopyridines are likely to
increase the excitability of the potassium channel-
rich cerebellar Purkinje cells in the flocculus (
Etzion and Grossman , 2001 ) .
Purkinje cells are the most salient cellular elements of the cerebellar cortex. They are arranged in a single row
throughout the entire cerebellar cortex between the molecular (outer) layer and the granular (inner) layer. They
are among the largest neurons and have a round perikaryon, classically described as shaped “like a chianti
bottle,” with a highly branched dendritic tree shaped like a candelabrum and extending into the molecular layer
where they are contacted by incoming systems of afferent fibers from granule neurons and the brainstem…
Olfactory Bulb
The most common sites used for induction of
kindling include the amygdala, perforant path ,
dorsal hippocampus , olfactory bulb , and
perirhinal cortex.
The olfactory bulb is the first relay station of the central olfactory system in the vertebrate brain and contains in
its superficial layer a few thousand glomeruli, spherical neuropils with sharp borders ( Figure 1 Figure 1Axonal
projection pattern of olfactory sensory neurons to the glomeruli of the rodent olfactory bulb. The olfactory
epithelium in rats and mice is divided into four zones (zones 1–4). A given odorant receptor is expressed by
sensory neurons located within one zone of the epithelium. Individual olfactory sensory neurons express a single
odorant receptor…
Examples of good and bad snippets
19
One Weird Trick from Natural Language Processing (NLP)
• Knowledge bases are populated by scanning text and doing Information Extraction
• Most information extraction systems are looking for very specific things, like drug-drug interactions
• Best accuracy for that one kind of data, but misses out on all the other concepts and relations in the text
• For broad knowledge base, use Open Information Extraction that only uses some knowledge of grammar
• The weird trick for open information extraction … a simple algorithm, known as ReVerb*:
1. Find “relation phrases” starting with a verb and ending with a verb or preposition
2. Find noun phrases before and after the relation phrase
3. Discard relation phrases not used with multiple combinations of arguments.
In addition, brain scans were performed to exclude
other causes of dementia.
* Fader et al. Identifying Relations for Open Information Extraction
20
ReVerb output
# SD Documents Scanned 14,000,000
Extracted ReVerb Triples 473,350,566
21
Universal schemas – Predict ‘missing‘ KG facts
• Make a matrix:
• columns for the relation phrases
from ReVerb or the semantic
relations from EMMeT
• rows are the pairs of concepts
linked by a relation
• A ‘1.0’ in a cell if those concepts
were linked by that relation
• Outlined cells in diagram
are the ones initialized to
1.
• Factorize matrix to ExK and KxR, then
recombine.
• “Learns” the correlations between text
relations and EMMeT relations, in the
context of pairs of objects.
• Cells going from 0 to > 0
indicates potential.
• Find new triples to go into EMMeT e.g.,
(glaucoma, has_alternativeProcedure,
biofeedback)
BUILDING
KNOWLEDGE GRAPHS
FROM DATA
23
Medical Graph – Statistical correlations at scale
I65
Occlusion and stenosis
of precerebral arteries
G40
Epilepsy
has_successor
I61
C71
Malignant neoplasm
of brain
odds ratio: 1.12
intracerebral
hemorrhage has_successor criteria1:
• Correlation selected by
preditive modeling
algorithmus
• No. of relations is higher
than in mirrored relation
• p-value < 0,05
• Odds ratios balanced over
all covariates.
1 Criteria based on: Jensen et.al.: Temporal disease trajectories condensed from population-wide registry data
covering 6.2 million patients. Nature Communications, 2014 Jun 24 ;5:4022. doi: 10.1038/ncomms5022.
Other
covariates
Primary care
Secondary care
Drug prescriptions
5m patients
each 6 years longitudinality
24
Medical Graph in practice, patient 35: risk of depression
• 49 year old man
• Dx: overweight,
diabetes,
hypertension,
anxiety disorder
 has an absolute
risk of 36% to
develop a
depression within
the next 4 years
25
… and rationale of why model thinks this
26
• Targets for prediction: ICD-coded diagnoses
• Only incident patients per diagnose considered, i.e. diagnosis-free 2009 – 2010
• if these patients remain diagnosis-free 2011 - 2014 (observation period), then 0 else 1
• Covariates: all ICD-/ATC-codes, age and sex measured in 2010
Example: Model to predict „I50 – Heart Failure“
26
Analysis Design
Predict 4 year long-term effects, balanced for all co-variables
I50 -
I50 free
patients
2009 2010
time
I50 -
(coded
as 0)
I50 +
(coded
as1)
2011 2014
Covariates
Remaining I50 free patients/ newly I50 diagnosed patients
27
27
(A) integrate & clean
Research on anonymized claims data
Primary care
Secondary care
Drug prescriptions
Other data
Visits & diagnoses
Visits, diagnoses &
procedures
Drug presciptions
Further cooperations just started
Will enable analysis of vital and laboratory parameters
Data integration
& cleaning
• Data cleaning
• Longitudinally linked &
integrated for analytics
• Anonymized
6 Mio patients
6 years
> 1.5b events
Billing data flow
60+ sickness funds
28
Technology
stack
feature
extraction
For 3.8m patients:
• age, gender
• all diagnoses: ICD10-coded, 3 digits, i.e. 2054 codes
• all medications: ATC-coded, 5 digits, i.e. 906 codes
• death, hospitalization
Results in: 6277 features
• 1623 targets, 2011-2014
• 2320 covariates, 2010
• 2334 filter-columns, 2009-2010
data mining Calculate prevalence, incidence, mean age for all covariates (i.e. diseases
and medications)
machine
learning
Predictive modelling for ~1600 targets
• Linear classification model, resulting in odds ratios
• Calculation of p-values
(B) mine & learn
Calculate statistics & build prediction models for ~1600 targets
KNOWLEDGE GRAPHS
AS A BACKBONE
29
• Total concepts = 540,632
• 100+ person years of clinical
expert knowledge
EMMeT Ontology
EXPANDING TO INCLUDE IMAGE SNIPPETS
33
Enrichment though integration using linked data
CONCLUSION
• Knowledge graphs are critical components for delivering customer value
• AI techniques such as machine learning and predictive modelling from data are key parts of
knowledge graph construction
• This is particularly the case as the amount, speed and specificity of data and requirements
accelerates
• Leveraging existing assets such as ontologies, data, and controlled (i.e. connected data) have
been key assets for Elsevier in the build out of knowledge graphs
• Another talk is how all this enables intelligent based solutions
• Oh and we are hiring 
• Paul Groth p.groth@elsevier.com

More Related Content

What's hot

Hackolade Tutorial - part 13 - Leverage a Polyglot data model
Hackolade Tutorial - part 13 - Leverage a Polyglot data modelHackolade Tutorial - part 13 - Leverage a Polyglot data model
Hackolade Tutorial - part 13 - Leverage a Polyglot data model
PascalDesmarets1
 
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
Neo4j
 
Introduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AIIntroduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AI
Semantic Web Company
 
Migration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLMigration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQL
PGConf APAC
 
Data Analytics Strategy
Data Analytics StrategyData Analytics Strategy
Data Analytics StrategyeHealthCareers
 
Data Migration Strategies PowerPoint Presentation Slides
Data Migration Strategies PowerPoint Presentation SlidesData Migration Strategies PowerPoint Presentation Slides
Data Migration Strategies PowerPoint Presentation Slides
SlideTeam
 
Building a Data Strategy Your C-Suite Will Support
Building a Data Strategy Your C-Suite Will SupportBuilding a Data Strategy Your C-Suite Will Support
Building a Data Strategy Your C-Suite Will Support
Reid Colson
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
LibbySchulze
 
Introduction to Knowledge Graphs
Introduction to Knowledge GraphsIntroduction to Knowledge Graphs
Introduction to Knowledge Graphs
mukuljoshi
 
Exadata Cloud Service Overview(v2)
Exadata Cloud Service Overview(v2) Exadata Cloud Service Overview(v2)
Exadata Cloud Service Overview(v2)
Okcan Yasin Saygılı
 
Successful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata DesignSuccessful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata Design
sarakirsten
 
Graph-Based Customer Journey Analytics with Neo4j
Graph-Based Customer Journey Analytics with Neo4jGraph-Based Customer Journey Analytics with Neo4j
Graph-Based Customer Journey Analytics with Neo4j
Neo4j
 
Alteryx investor presentation
Alteryx investor presentationAlteryx investor presentation
Alteryx investor presentation
alteryxinvestor
 
Machine Learning and AI at Oracle
Machine Learning and AI at OracleMachine Learning and AI at Oracle
Machine Learning and AI at Oracle
Sandesh Rao
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Driven Possibilities with Qlik
Data Driven Possibilities with QlikData Driven Possibilities with Qlik
Data Driven Possibilities with Qlik
Mischa van Werkhoven
 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property Graphs
DATAVERSITY
 
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
Equnix Business Solutions
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Denodo
 
Migrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQLMigrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQL
Umair Mansoob
 

What's hot (20)

Hackolade Tutorial - part 13 - Leverage a Polyglot data model
Hackolade Tutorial - part 13 - Leverage a Polyglot data modelHackolade Tutorial - part 13 - Leverage a Polyglot data model
Hackolade Tutorial - part 13 - Leverage a Polyglot data model
 
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
 
Introduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AIIntroduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AI
 
Migration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLMigration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQL
 
Data Analytics Strategy
Data Analytics StrategyData Analytics Strategy
Data Analytics Strategy
 
Data Migration Strategies PowerPoint Presentation Slides
Data Migration Strategies PowerPoint Presentation SlidesData Migration Strategies PowerPoint Presentation Slides
Data Migration Strategies PowerPoint Presentation Slides
 
Building a Data Strategy Your C-Suite Will Support
Building a Data Strategy Your C-Suite Will SupportBuilding a Data Strategy Your C-Suite Will Support
Building a Data Strategy Your C-Suite Will Support
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Introduction to Knowledge Graphs
Introduction to Knowledge GraphsIntroduction to Knowledge Graphs
Introduction to Knowledge Graphs
 
Exadata Cloud Service Overview(v2)
Exadata Cloud Service Overview(v2) Exadata Cloud Service Overview(v2)
Exadata Cloud Service Overview(v2)
 
Successful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata DesignSuccessful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata Design
 
Graph-Based Customer Journey Analytics with Neo4j
Graph-Based Customer Journey Analytics with Neo4jGraph-Based Customer Journey Analytics with Neo4j
Graph-Based Customer Journey Analytics with Neo4j
 
Alteryx investor presentation
Alteryx investor presentationAlteryx investor presentation
Alteryx investor presentation
 
Machine Learning and AI at Oracle
Machine Learning and AI at OracleMachine Learning and AI at Oracle
Machine Learning and AI at Oracle
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Driven Possibilities with Qlik
Data Driven Possibilities with QlikData Driven Possibilities with Qlik
Data Driven Possibilities with Qlik
 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property Graphs
 
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
[EPPG] Oracle to PostgreSQL, Challenges to Opportunity
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Migrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQLMigrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQL
 

Similar to Knowledge graph construction for research & medicine

Connected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul GrothConnected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul Groth
Connected Data World
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
Paul Groth
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domain
Angelo Salatino
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
Maryann Martone
 
The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...
Neuroscience Information Framework
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Paul Groth
 
The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework
Neuroscience Information Framework
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
Paul Groth
 
How do we know what we don’t know: Using the Neuroscience Information Framew...
How do we know what we don’t know:  Using the Neuroscience Information Framew...How do we know what we don’t know:  Using the Neuroscience Information Framew...
How do we know what we don’t know: Using the Neuroscience Information Framew...
Maryann Martone
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 
A knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsA knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systems
ramakanz
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...
BaoTramDuong2
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...
Maryann Martone
 
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
Artificial Intelligence Institute at UofSC
 
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
University of Michigan Taubman Health Sciences Library
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014
Mark Wilkinson
 
Neuroscience as networked science
Neuroscience as networked scienceNeuroscience as networked science
Neuroscience as networked science
Neuroscience Information Framework
 
Effective search of bibliographic databases
Effective search of bibliographic databasesEffective search of bibliographic databases
Effective search of bibliographic databasesTarek Tawfik Amin
 
Navigating the Neuroscience Data Landscape
Navigating the Neuroscience Data LandscapeNavigating the Neuroscience Data Landscape
Navigating the Neuroscience Data Landscape
Neuroscience Information Framework
 

Similar to Knowledge graph construction for research & medicine (20)

Connected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul GrothConnected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul Groth
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domain
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
 
The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
How do we know what we don’t know: Using the Neuroscience Information Framew...
How do we know what we don’t know:  Using the Neuroscience Information Framew...How do we know what we don’t know:  Using the Neuroscience Information Framew...
How do we know what we don’t know: Using the Neuroscience Information Framew...
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
A knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsA knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systems
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...
 
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
 
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014
 
Neuroscience as networked science
Neuroscience as networked scienceNeuroscience as networked science
Neuroscience as networked science
 
Effective search of bibliographic databases
Effective search of bibliographic databasesEffective search of bibliographic databases
Effective search of bibliographic databases
 
Navigating the Neuroscience Data Landscape
Navigating the Neuroscience Data LandscapeNavigating the Neuroscience Data Landscape
Navigating the Neuroscience Data Landscape
 

More from Paul Groth

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
Paul Groth
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
Paul Groth
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
Paul Groth
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
Paul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
Paul Groth
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
Paul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
Paul Groth
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
Paul Groth
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
Paul Groth
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
Paul Groth
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
Paul Groth
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
Paul Groth
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
Paul Groth
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
Paul Groth
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computation
Paul Groth
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
Paul Groth
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
Paul Groth
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
Paul Groth
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
Paul Groth
 

More from Paul Groth (20)

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computation
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 

Knowledge graph construction for research & medicine

  • 1. KNOWLEDGE GRAPH CONSTRUCTION FOR RESEARCH & MEDICINE Paul Groth (@pgroth) pgroth.com Disruptive Technology Director Elsevier Labs (@elsevierlabs) Connected Data London 2017 Contributions: Brad Allen, Pascal Coupet, Sujit Pal, Craig Stanley, Ron Daniel, Alex de Jong
  • 2. Our customers are facing challenges in science and health 1. Industrial Research Institute 2. The Lancet 3. Tufts 4. World Health Organization Elsevier is in a unique position to make a contribution towards solving these challenges Life-saving drugs are expensive to develop.3 Global research spend is growing every year.1 3.4% from 2015 Predicted spend $1.9TN research in 2016 Studies: 70-80% of research asks the wrong questions or cannot be reproduced Researchers lack the tools they need to be effective.2 Preventable medicalerrors: Third largest cause of death in theUS Health providers cannot save lives without the best information.4 $2.5BN median pharmaceutical spend per drug 1/20 successrate of drugs Heart Disease 611k Cancer 585k Medical Error 225k 149k Respiratory Illness
  • 3. ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR RESEARCHERS, DOCTORS AND NURSES My work is moving towards a new field; what should I know? • Journal articles, reference works, profiles of researchers, funders & institutions • Recommendations of people to connect with, reading lists, topic pages How should I treat my patient given her condition & history? • Journal articles, reference works, medical guidelines, electronic health records • Treatment plan with alternatives personalized for the patient How can I master the subject matter of the course I am taking? • Course syllabus, reference works, course objectives, student history • Quiz plan based on the student’s history and course objectives
  • 4. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER ANSWERS ARE ABOUT THINGS, NOT JUST WORKS Why shouldn’t a search on an author return information about the author, including the author’s works? Where was the author born, when did she live, what is she known for? … All of this is possible, but only if we can make some fundamental changes in our approach to bibliographic description. ... The challenge for us lies in transforming what we can of our data into interrelated “things” without overindulging that metaphor. Coyle, K. (2016). FRBR, before and after: a look at our bibliographical models. Chicago: ALA Editions.
  • 5. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER KNOWLEDGE GRAPHS DEFINED • Knowledge graphs are "graph structured knowledge bases (KBs) which store factual information in form of relationships between entities” (Nickel, M., Murphy, K., Tresp, V. and Gabrilovich, E. (2015). A review of relational machine learning for knowledge graphs. arXiv:1503.00759v3) • Knowledge graphs are metadata evolved beyond the focus on the work, linking people, concepts, things and events • Knowledge Graphs are focused on things to provide answers
  • 6.
  • 7.
  • 8.
  • 9. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER ELSEVIER’S KNOWLEDGE PLATFORM Products Data & Content Sources Knowledge Graphs Platforms & Shared Services Entity Hubs Usage logs Pathways EHRsArticles Authors Institutions SyllabiCitations ChemicalsBooks DrugsFunders Funder Hub Article HubProfile Hub Journal Hub Institution Hub Research HealthcareLife Sciences Content Life Sciences Search IdentityResearch Reaxys CK SherpathScopus SD ROS
  • 10. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER THE GROWTH OF SCIENCE COMPLICATES OUR EFFORTS
  • 11. MORE DOMAINS & MORE SPECIFICITY Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937. Some observations from @gregory_km survey: 1. The needs and behaviours of specific user groups (e.g. early career researchers, policy makers, students) are not well documented. 2. Background uses of observational data are better documented than foreground uses. 3. Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common.
  • 14.
  • 15. HOW - OVERVIEW Content Books, Articles, Ontologies ... • Identification of concepts • Disambiguation • Domain/sub-domain identification • Abbreviations, variants • Gazeteering • Identification and classification of text snippets around concepts • Features building for concept/snippet pairs • Lexical, syntactic, semantic, doc structure … • Ranking concept snippet pairs • Machine learning • Hand made rules • Similarities • Deduplication Technologies NLP, ML • Curation • White list driven • Black list • Corrections/improve ments • Evaluation • Gold set by domain • Random set by domain • By SMEs (Subject Matter Experts) • Automation • Content Enrichment Framework • Taxonomy coverage extension Knowledge Graph Concepts, snippets, meta data, …
  • 16. | 16 OmniScience Neuros cience Extension vocabularies by domains to provide coverage Number of Concepts Number of Labels OmniScience 01.16.11 45969 47421 OmniScience Neuroscience branch 21/11/2016 2356 2455 OmniScience Extension Neuroscience branch 21/11/2016 23932 101276
  • 17. | 17 Concept Bad Good Inferior Colliculus By comparing activation obtained in an equivalent standard ( non-cardiac-gated ) fMRI experiment , Guimaraes and colleagues found that cardiac- gated activation maps yielded much greater activation in subcortical nuclei , such as the inferior colliculus . The inferior colliculus (IC) is part of the tectum of the midbrain (mesencephalon) comprising the quadrigeminal plate (Lamina quadrigemina). It is located caudal to the superior colliculus on the dorsal surface of the mesencephalon ( Figure 36.7 FIGURE 36.7Overview of the human brainstem; view from dorsal. The superior and inferior colliculi form the quadrigeminal plate. Parts of the cerebellum are removed.). The ventral border is formed by the lateral lemniscus. The inferior colliculus is the largest nucleus of the human auditory system. … Purkinje cells It is felt that the aminopyridines are likely to increase the excitability of the potassium channel- rich cerebellar Purkinje cells in the flocculus ( Etzion and Grossman , 2001 ) . Purkinje cells are the most salient cellular elements of the cerebellar cortex. They are arranged in a single row throughout the entire cerebellar cortex between the molecular (outer) layer and the granular (inner) layer. They are among the largest neurons and have a round perikaryon, classically described as shaped “like a chianti bottle,” with a highly branched dendritic tree shaped like a candelabrum and extending into the molecular layer where they are contacted by incoming systems of afferent fibers from granule neurons and the brainstem… Olfactory Bulb The most common sites used for induction of kindling include the amygdala, perforant path , dorsal hippocampus , olfactory bulb , and perirhinal cortex. The olfactory bulb is the first relay station of the central olfactory system in the vertebrate brain and contains in its superficial layer a few thousand glomeruli, spherical neuropils with sharp borders ( Figure 1 Figure 1Axonal projection pattern of olfactory sensory neurons to the glomeruli of the rodent olfactory bulb. The olfactory epithelium in rats and mice is divided into four zones (zones 1–4). A given odorant receptor is expressed by sensory neurons located within one zone of the epithelium. Individual olfactory sensory neurons express a single odorant receptor… Examples of good and bad snippets
  • 18.
  • 19. 19 One Weird Trick from Natural Language Processing (NLP) • Knowledge bases are populated by scanning text and doing Information Extraction • Most information extraction systems are looking for very specific things, like drug-drug interactions • Best accuracy for that one kind of data, but misses out on all the other concepts and relations in the text • For broad knowledge base, use Open Information Extraction that only uses some knowledge of grammar • The weird trick for open information extraction … a simple algorithm, known as ReVerb*: 1. Find “relation phrases” starting with a verb and ending with a verb or preposition 2. Find noun phrases before and after the relation phrase 3. Discard relation phrases not used with multiple combinations of arguments. In addition, brain scans were performed to exclude other causes of dementia. * Fader et al. Identifying Relations for Open Information Extraction
  • 20. 20 ReVerb output # SD Documents Scanned 14,000,000 Extracted ReVerb Triples 473,350,566
  • 21. 21 Universal schemas – Predict ‘missing‘ KG facts • Make a matrix: • columns for the relation phrases from ReVerb or the semantic relations from EMMeT • rows are the pairs of concepts linked by a relation • A ‘1.0’ in a cell if those concepts were linked by that relation • Outlined cells in diagram are the ones initialized to 1. • Factorize matrix to ExK and KxR, then recombine. • “Learns” the correlations between text relations and EMMeT relations, in the context of pairs of objects. • Cells going from 0 to > 0 indicates potential. • Find new triples to go into EMMeT e.g., (glaucoma, has_alternativeProcedure, biofeedback)
  • 23. 23 Medical Graph – Statistical correlations at scale I65 Occlusion and stenosis of precerebral arteries G40 Epilepsy has_successor I61 C71 Malignant neoplasm of brain odds ratio: 1.12 intracerebral hemorrhage has_successor criteria1: • Correlation selected by preditive modeling algorithmus • No. of relations is higher than in mirrored relation • p-value < 0,05 • Odds ratios balanced over all covariates. 1 Criteria based on: Jensen et.al.: Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nature Communications, 2014 Jun 24 ;5:4022. doi: 10.1038/ncomms5022. Other covariates Primary care Secondary care Drug prescriptions 5m patients each 6 years longitudinality
  • 24. 24 Medical Graph in practice, patient 35: risk of depression • 49 year old man • Dx: overweight, diabetes, hypertension, anxiety disorder  has an absolute risk of 36% to develop a depression within the next 4 years
  • 25. 25 … and rationale of why model thinks this
  • 26. 26 • Targets for prediction: ICD-coded diagnoses • Only incident patients per diagnose considered, i.e. diagnosis-free 2009 – 2010 • if these patients remain diagnosis-free 2011 - 2014 (observation period), then 0 else 1 • Covariates: all ICD-/ATC-codes, age and sex measured in 2010 Example: Model to predict „I50 – Heart Failure“ 26 Analysis Design Predict 4 year long-term effects, balanced for all co-variables I50 - I50 free patients 2009 2010 time I50 - (coded as 0) I50 + (coded as1) 2011 2014 Covariates Remaining I50 free patients/ newly I50 diagnosed patients
  • 27. 27 27 (A) integrate & clean Research on anonymized claims data Primary care Secondary care Drug prescriptions Other data Visits & diagnoses Visits, diagnoses & procedures Drug presciptions Further cooperations just started Will enable analysis of vital and laboratory parameters Data integration & cleaning • Data cleaning • Longitudinally linked & integrated for analytics • Anonymized 6 Mio patients 6 years > 1.5b events Billing data flow 60+ sickness funds
  • 28. 28 Technology stack feature extraction For 3.8m patients: • age, gender • all diagnoses: ICD10-coded, 3 digits, i.e. 2054 codes • all medications: ATC-coded, 5 digits, i.e. 906 codes • death, hospitalization Results in: 6277 features • 1623 targets, 2011-2014 • 2320 covariates, 2010 • 2334 filter-columns, 2009-2010 data mining Calculate prevalence, incidence, mean age for all covariates (i.e. diseases and medications) machine learning Predictive modelling for ~1600 targets • Linear classification model, resulting in odds ratios • Calculation of p-values (B) mine & learn Calculate statistics & build prediction models for ~1600 targets
  • 29. KNOWLEDGE GRAPHS AS A BACKBONE 29
  • 30.
  • 31. • Total concepts = 540,632 • 100+ person years of clinical expert knowledge EMMeT Ontology
  • 32. EXPANDING TO INCLUDE IMAGE SNIPPETS
  • 33. 33 Enrichment though integration using linked data
  • 34. CONCLUSION • Knowledge graphs are critical components for delivering customer value • AI techniques such as machine learning and predictive modelling from data are key parts of knowledge graph construction • This is particularly the case as the amount, speed and specificity of data and requirements accelerates • Leveraging existing assets such as ontologies, data, and controlled (i.e. connected data) have been key assets for Elsevier in the build out of knowledge graphs • Another talk is how all this enables intelligent based solutions • Oh and we are hiring  • Paul Groth p.groth@elsevier.com

Editor's Notes

  1. Work with dans Reviewed 400 papers deep dive 114