1
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Paul Groth - @pgroth
Elsevier Labs
BigNet : WWW 2018
Thanks to Ron Daniel, Brad Allen & the Labs Team
2
Outline
Goal: to tell you our current thinking and to get your feedback
• Why we’re interested
• What we’ve tried
• What we’re missing
• Webby Data
• 2 Sources of Semantics
• State of the art
• What’s missing
Warning: The back half is a probably incomplete literature review, so think of it as a set of pointers
3
Knowledge Graphs
4
5
EMMeT (Elsevier Merged Medical Taxonomy)
EMMeT is a multilingual, concept-based clinical ontology
• Multilingual: English, French, Spanish
• Concept-based: All terms, synonyms, translations, mappings are
related to a unique identifier (“IMUI”)
• Ontology: Provides semantic relationships between concepts
(symptoms of a disease, treatment procedures of a disease,
complications of a disease or a procedure, etc…)
EMMeT is a controlled reference terminology
• Based on Unified Medical Language System (UMLS), standard clinical
terminologies as well as Elsevier proprietary vocabularies and lists of
acronyms
• Explicitly mapped to international medical standards (SNOMED-CT,
ICD-9-CM, ICD-10-CM, LOINC, RxNorm, CVX, etc.) and Elsevier’s
vocabularies (Gold Standard, EMTREE, etc.)
EMMeT is current
• Continuously updated, and released every 12 weeks for automatic
indexing
• Updated daily and available via an API for manual tagging access
• Maintained by a team of medical terminology experts,
6
Automated Tagging
Manual Tagging/
Data Structuring
Products and platforms using EMMeT
Clinical Solutions
ClinicalKey Global
ClinicalKey ANZ
ClinicalKey France
ClinicalKey Espanol
ClinicalKey Nursing
ClinicalKey German
ClinicalKey Nursing ANZ
ClinicalKey Brazil
Amirsys Decision Point
RP/STMJ
Health Advance
The Lancet
Cell
LexisNexis
MedMal Nav
LN Insight
Legend
In production
In Pilot
In Pipeline
Nursing Education
Mosby’s Dictionary
Clinical Solutions
PoC - Clinical Overviews
ClinicalKey HL7 API
Health Analytics
IDS FHIR API/Apps
Dorland’s Dictionary
Patient Engagement
Gold Standard CP
ERC
Content 2.0
Nursing Education
Sherpath
EMEALAAP
MedEnact
RP/STMJ (SCT)
Health Advance
The Lancet
Cell
7
EMMeT Clinical Knowledge Graph
8
Rankings of EMMeT’s ontological relationships
• Relationships are ranked according to a 5-tiered ranking model, chosen for simplicity and accessibility.
• 10: best option;
• 9: second option, used when the rank of 10 is not applicable;
• 8: given to concepts that are too general to be directly related to a specific disease;
• 7: used as an outlier;
• 6: default / non-validated.
Relationship (ranking criteria) | 10 | 9 | 8 | 7
has cause | most common | common | sometimes | rare
has clinical finding | most common | common | sometimes | rare
has_complication (severity, disease) | severe/death | high | moderate | low morbidity
has_complication (prevalence, disease) | strong occurrence/high prevalence | likely occurrence/commonly prevalent | sometimes occurs | rare occurrence
has_complication (severity, procedure) | critical/death | major | moderate | minor
has comorbidity | strongly associated | commonly associated | sometimes associated | rarely associated
has screening procedure | best choice | is done | sometimes done | rarely done
has risk factor | strongly associated | commonly associated | sometimes associated | rarely associated
has diagnostic procedure | best choice | commonly done | sometimes done | rarely done
has differential diagnosis | strong occurrence/high prevalence | likely occurrence/commonly prevalent | sometimes occurs/low prevalence | rare occurrence
has drug | best choice | 2nd line | 3rd line | rarely given
has contraindication drug | strongly avoid/black box | commonly avoid | sometimes avoid | rarely avoid
has treatment procedure | best choice | commonly done | sometimes done | rarely done
has prevention | best option | common option | sometimes advised | rarely advised
has physician specialty | specific specialty | general/specialty | broad | rare
has device | standard device | acceptable device | sometimes used | rarely used
9
From EMMeT to H-Graph
• Based on EMMeT
• Support more complex relations including patient context (Clinical Overview content + more)
• Flexible and extensible model to support links to content, model treatment strategies, numeric values, temporal data, etc. Age, sex, weight, … are very simple context.
Example of richer context: “In people with atrial fibrillation presenting acutely without life-threatening haemodynamic instability, offer rate or rhythm control if the onset of the arrhythmia is less than 48 hours, and start rate control if it is more than 48 hours or is uncertain.” (NICE Guideline, Atrial Fibrillation: Management)
• Continue to support existing indexing pipelines (e.g. ClinicalKey), and tagging use cases (e.g. Clinical Overviews)
From EMMeT… …To H-Graph
10
Universal Schemas
11
Universal schemas
• … are a specific technique from the Information Extraction and the Automatic Knowledge Base
Completion literature
• … are an unsupervised method to ‘learn’ by combining text extracts with existing knowledge base
assertions
• Applications:
• Extend a medical knowledge base
• scan incoming literature to suggest new additions to EMMeT and show the
underlying evidence to the taxonomy editor.
• scan literature backlog to find evidence for data already in EMMeT
• Literature Surveillance
• scan incoming literature to find existing facts even if expressed in very different ways
• find new concepts in the literature related to an existing EMMeT concept*. Let taxonomy
editor decide whether to add new concept and relation to EMMeT
12
Open Information Extraction
• Knowledge bases are populated by scanning text and doing Information Extraction
• Most information extraction systems are looking for very specific things, like drug-drug interactions
• Best accuracy for that one kind of data, but misses out on all the other concepts and relations in the text
• For a broad knowledge base, use Open Information Extraction, which relies only on some knowledge of grammar
• One weird trick for open information extraction …
• ReVerb*:
1. Find “relation phrases” starting with a verb and ending with a verb or preposition
2. Find noun phrases before and after the relation phrase
3. Discard relation phrases not used with multiple combinations of arguments.
In addition, brain scans were performed to exclude
other causes of dementia.
* Fader et al. Identifying Relations for Open Information Extraction
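The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not ReVerb itself: the coarse one-letter tags (N noun, V verb, P preposition or "to", A adjective, D determiner, R adverb) are assigned by hand rather than by a tagger, the patterns are simplified, and the function names `extract_triples` and `filter_by_support` are made up for this example.

```python
import re
from collections import defaultdict

# Step 1 (simplified): a relation phrase starts with a verb and ends with
# a verb or preposition; only verbs, adverbs and prepositions may occur inside.
REL = re.compile(r"V(?:[VRP]*[VP])?")
# Step 2 (simplified): a noun phrase is optional determiners/adjectives + nouns.
NP_BEFORE = re.compile(r"[DA]*N+$")
NP_AFTER = re.compile(r"[DA]*N+")


def extract_triples(tagged):
    """tagged: list of (word, coarse_tag) pairs; returns (arg1, relation, arg2) triples."""
    words = [w for w, _ in tagged]
    tags = "".join(t for _, t in tagged)  # one character per token
    triples = []
    for m in REL.finditer(tags):
        left = NP_BEFORE.search(tags[:m.start()])   # noun phrase ending right before
        right = NP_AFTER.match(tags, m.end())       # noun phrase starting right after
        if left and right:
            triples.append((
                " ".join(words[left.start():left.end()]),
                " ".join(words[m.start():m.end()]),
                " ".join(words[right.start():right.end()]),
            ))
    return triples


def filter_by_support(triples, min_pairs=2):
    """Step 3: keep relation phrases seen with at least min_pairs distinct argument pairs."""
    pairs = defaultdict(set)
    for arg1, rel, arg2 in triples:
        pairs[rel].add((arg1, arg2))
    return [t for t in triples if len(pairs[t[1]]) >= min_pairs]


# Hand-tagged clause from the example sentence on this slide.
SENTENCE = [("brain", "N"), ("scans", "N"), ("were", "V"), ("performed", "V"),
            ("to", "P"), ("exclude", "V"), ("other", "A"), ("causes", "N"),
            ("of", "P"), ("dementia", "N")]

if __name__ == "__main__":
    print(extract_triples(SENTENCE))
```

On the hand-tagged clause this yields the single triple (brain scans; were performed to exclude; other causes). With only one sentence, `filter_by_support` would discard it, which is why step 3 only makes sense over a large corpus.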
13
ReVerb output
After ReVerb pulls out noun phrases, match them up to EMMeT concepts
Discard rare concepts, relations, or relations that are not used with many different concepts
SD documents scanned: 14,000,000
Extracted ReVerb triples: 473,350,566
14
Universal schemas - Initialization
• Method to combine ‘facts’ found by
machine reading with stronger
assertions from ontology.
• Build ExR matrix with entity-pairs
as rows and relations as columns.
• Relation columns can come from
EMMeT, or from ReVerb
extractions.
• Cells contain 1.0 if that pair of
entities is connected by that
relation.
15
Universal schemas - Prediction
• Factorize matrix to ExK and KxR,
then recombine.
• “Learns” the correlations between
text relations and EMMeT relations,
in the context of pairs of objects.
• Find new triples to go into EMMeT
e.g., (glaucoma,
has_alternativeProcedure,
biofeedback)
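To make the initialization and prediction steps concrete, here is a toy sketch with NumPy. It is only an illustration under loud assumptions: the entity pairs and relation columns are invented placeholders (not EMMeT data), and a truncated SVD stands in for whatever factorization objective the production system uses. The names `factorize`, `predict` and `candidates` are hypothetical.

```python
import numpy as np

# Rows: entity pairs. Columns: relations from text (ReVerb-style) and from the ontology.
# All names below are made-up placeholders for illustration only.
ROWS = [("disease_1", "drug_1"), ("disease_2", "drug_2"), ("disease_3", "drug_3"),
        ("disease_4", "agent_1"), ("disease_5", "agent_2")]
COLS = ["text:is treated with", "emmet:has_drug", "text:is caused by", "emmet:has_cause"]

# Cell = 1.0 if that pair of entities is connected by that relation.
M = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],   # seen in text only: is the ontology relation missing?
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)


def factorize(matrix, k):
    """Return (E x K) pair factors and (K x R) relation factors via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k], vt[:k, :]


def predict(matrix, k=2):
    """Recombine the two factors into a dense E x R score matrix."""
    pair_factors, rel_factors = factorize(matrix, k)
    return pair_factors @ rel_factors


def candidates(matrix, scores, threshold=0.3):
    """Unobserved cells whose reconstructed score is at least the threshold."""
    out = []
    for i, j in zip(*np.where(matrix == 0)):
        if scores[i, j] >= threshold:
            out.append((ROWS[i], COLS[j], float(scores[i, j])))
    return out


if __name__ == "__main__":
    for pair, relation, score in candidates(M, predict(M)):
        print(pair, relation, round(score, 3))
```

Because "is treated with" co-occurs with "has_drug" for the first two pairs, the low-rank reconstruction assigns a non-trivial score to the missing has_drug cell of the third pair, while the unrelated has_cause cell stays near zero. Such candidates would then go to a taxonomy editor rather than straight into the knowledge graph.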
16
[Figure: universal schema pipeline. Labels in the diagram: Content; Taxonomy; Open Information Extraction; Triple Extraction; Entity Resolution; Concept Resolution; Surface form relations; Structured relations; Matrix Construction; Universal schema; Matrix Factorization; Factorization model; Matrix Completion; Predicted relations; Curation; Knowledge graph. Volume labels: 14M SD articles; 475 M triples; 3.3 million relations; 49 M relations; ~15k -> 1M entries.]
Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel
“Applying Universal Schemas for Domain Specific Ontology Expansion”
5th Workshop on Automated Knowledge Base Construction (AKBC) 2016
Michael Lauruhn, and Paul Groth. "Sources of Change for Modern
Knowledge Organization Systems." Knowledge Organization 43, no. 8
(2016).
ONTOLOGY MAINTENANCE
• Pretty good F measure, around 0.7
• Good enough with human in the loop
• But we want more!
17
Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation
methods." Semantic web 8.3 (2017): 489-508.
WHERE TO GO?
18
MORE THAN LINK PREDICTION
• Data has deep hierarchy; link prediction flattens this
• Data has hooks into specific content
• Schemas are increasingly richly defined – not just a
single type
• N-ary relations
19
OUR KGS SHARE PROPERTIES WITH WEB KGS
Ringler, Daniel, and Heiko Paulheim. "One knowledge graph to rule them all? Analyzing
the differences between DBpedia, YAGO, Wikidata & co." Joint German/Austrian
Conference on Artificial Intelligence (Künstliche Intelligenz). Springer, Cham, 2017.
20
The Web of Data
http://webdatacommons.org/structureddata/2017-12/stats/stats.html
http://lodlaundromat.org
21
Two sources of semantics
1. Dereferenceability
2. Rules
22
Dereferenceability
Looking up definitions: natural language and programmatic
23
WIKIDATA VOCABULARY
24
Pay attention to the underlying data
Paul Groth, Michael Lauruhn, Antony Scerri: “Open Information Extraction on Scientific Text: An
Evaluation”, 2018; arXiv:1802.05574, http://arxiv.org/abs/1802.05574
25
Embed more
Gupta, N., Singh, S., & Roth, D. (2017). Entity linking via joint encoding of types,
descriptions, and context. In Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing (pp. 2681-2690).
26
Embed more
Both, Fabian, Steffen Thoma, and Achim Rettinger. "Cross-modal Knowledge Transfer:
Improving the Word Embedding of Apple by Looking at Oranges." Proceedings of the
Knowledge Capture Conference. ACM, 2017.
27
Social Semantics?
de Rooij, S., Beek, W., Bloem, P., van Harmelen, F., & Schlobach, S. (2016, October).
Are Names Meaningful? Quantifying Social Meaning on the Semantic Web.
In International Semantic Web Conference (pp. 184-199). Springer, Cham.
• Distributional semantics for
identifiers (NTN)
• But uses the global network
• Could we use the discussion
space as well?
NTN - Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013).
Reasoning with neural tensor networks for knowledge base
completion. In Advances in neural information processing
systems (pp. 926-934).
28
schema:dateModified a rdf:Property ;
    rdfs:label "dateModified" ;
    schema:domainIncludes schema:CreativeWork,
        schema:DataFeedItem ;
    schema:rangeIncludes schema:Date,
        schema:DateTime ;
    rdfs:comment "The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed." .

schema:datePublished a rdf:Property ;
    rdfs:label "datePublished" ;
    schema:domainIncludes schema:CreativeWork ;
    schema:rangeIncludes schema:Date ;
    rdfs:comment "Date of first broadcast/publication." .

schema:disambiguatingDescription a rdf:Property ;
    rdfs:label "disambiguatingDescription" ;
    schema:domainIncludes schema:Thing ;
    schema:rangeIncludes schema:Text ;
    rdfs:comment "A sub property of description. A short description of the item used to disambiguate from other, similar items. Information from other properties (in particular, name) may be necessary for the description to be useful for disambiguation." ;
    rdfs:subPropertyOf schema:description .
https://www.w3.org/TR/rdf11-mt/
Rules
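As a small illustration of how such vocabulary statements act as rules, the sketch below applies two RDFS entailment patterns for rdfs:subPropertyOf (transitivity, and propagation of a statement to the super-property) over plain Python tuples. It is a toy forward-chainer, not an RDF library; the function name `rdfs_closure` and the `ex:talk` resource are invented for the example. schema:domainIncludes is deliberately not treated as rdfs:domain.

```python
SUBPROP = "rdfs:subPropertyOf"


def rdfs_closure(triples):
    """Apply two subPropertyOf rules until no new triples appear.

    Rule A: (p subPropertyOf q), (q subPropertyOf r) => (p subPropertyOf r)
    Rule B: (p subPropertyOf q), (s p o)             => (s q o)
    """
    inferred = set(triples)
    while True:
        new = set()
        for p, pred, q in inferred:
            if pred != SUBPROP:
                continue
            for s, p2, o in inferred:
                if p2 == SUBPROP and s == q:
                    new.add((p, SUBPROP, o))   # rule A
                if p2 == p:
                    new.add((s, q, o))         # rule B
        if new <= inferred:                    # fixpoint reached
            return inferred
        inferred |= new


DATA = {
    # From the schema.org snippet on this slide.
    ("schema:disambiguatingDescription", SUBPROP, "schema:description"),
    # Hypothetical instance data, for illustration only.
    ("ex:talk", "schema:disambiguatingDescription", "the BigNet 2018 workshop talk"),
}

if __name__ == "__main__":
    for triple in sorted(rdfs_closure(DATA) - DATA):
        print(triple)
```

Here the only derived statement is that ex:talk also has a schema:description with the same value, which is the kind of explicit semantics a purely latent model would have to learn from data.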
29
Injecting Background Knowledge as Constraints
Rocktäschel, T., Singh, S., & Riedel, S. (2015). Injecting logical background knowledge into embeddings for relation
extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies (pp. 1119-1129)
30
Learning Rules
Yang, Fan, Zhilin Yang, and William W. Cohen. "Differentiable learning of logical
rules for knowledge base reasoning." Advances in Neural Information Processing
Systems. 2017.
31
Combining Both – supporting complex reasoning with subsymbolic representations
Rocktäschel, T., & Riedel, S. (2017). End-to-end
differentiable proving. In Advances in Neural Information
Processing Systems (pp. 3791-3803).
32
Future
Welbl, J., Stenetorp, P., & Riedel, S. (2017). Constructing Datasets for
Multi-hop Reading Comprehension Across Documents. arXiv preprint
arXiv:1710.06481.
• Scale
• The knowledge base == text?
• Multi-hop reasoning
• Is everything end-to-end differentiable?
33
Conclusion
• In practice: data is webby data
• Messy
• Interconnected
• Constraints and rules associated
• Semantic Web: semantics can come from multiple different sources
• Explicit & implicit
• Take advantage of those sources
• Knowledge graphs benefit from inference
• Your thoughts?
• Thanks & We’re hiring!
p.groth@elsevier.com | pgroth.com
labs.elsevier.com
34
Backup
35
INTEGRATION OF LARGE NUMBERS OF DATA SOURCES
Groth, Paul, "The Knowledge-Remixing Bottleneck," Intelligent Systems, IEEE,
vol. 28, no. 5, pp. 44-48, Sept.-Oct. 2013, doi: 10.1109/MIS.2013.138
• 10 different extractors
• E.g., the mapping-based infobox extractor
• The infobox extractor uses a hand-built ontology based on the 350 most commonly used English-language infoboxes
• Integrates with YAGO
• YAGO relies on Wikipedia + WordNet
• Upper ontology from WordNet, then a mapping to Wikipedia categories based on frequencies
• WordNet is built by psycholinguists
36
Units & Measurement Annotations
• Time
• Dosage
• Probability
• Percent
• Count
• Not handled yet
Find numbers followed by a unit name or abbreviation (perhaps with scale factor like k, m, G, …). Provide value
normalized to SI units. Also provide type of measurement (time, temperature, length, mass, dosage, etc.) based on
unit. Handling tolerances, ranges, probabilities, and counts adds complexity. Conjunctions not yet handled but very
important.
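A minimal sketch of the basic step described above (number followed by a unit, normalized value, measurement type) could look like this in Python. The unit table is a tiny hand-picked assumption, dosage in mg/kg is normalized to kg/kg as one possible convention, and the names `UNITS`, `Measurement` and `annotate` are invented for the example. As the slide notes, ranges and conjunctions are not handled.

```python
import re
from dataclasses import dataclass

# unit -> (measurement type, factor to the normalized unit, normalized unit)
# A small illustrative table, not a complete unit inventory.
UNITS = {
    "min": ("time", 60.0, "s"),
    "h": ("time", 3600.0, "s"),
    "day": ("time", 86400.0, "s"),
    "days": ("time", 86400.0, "s"),
    "mg/kg": ("dosage", 1e-6, "kg/kg"),
    "%": ("percent", 0.01, "1"),
}

# Longest unit names first so that "days" wins over "day".
_UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
PATTERN = re.compile(
    r"(?<![\w.])(\d+(?:\.\d+)?)\s*(" + _UNIT_ALT + r")(?![A-Za-z])"
)


@dataclass
class Measurement:
    text: str        # matched surface form, e.g. "120 min"
    kind: str        # type of measurement
    si_value: float  # value after normalization
    si_unit: str


def annotate(sentence):
    """Find number + unit mentions and normalize them."""
    found = []
    for m in PATTERN.finditer(sentence):
        kind, factor, si_unit = UNITS[m.group(2)]
        found.append(Measurement(m.group(0), kind, float(m.group(1)) * factor, si_unit))
    return found


if __name__ == "__main__":
    example = ("Additionally at 120 min following glucose administration, "
               "the 100 mg/kg 5g and 5e groups had a greater drop in blood "
               "glucose than the 10 and 50 mg/kg groups.")
    for item in annotate(example):
        print(item)
```

On this sentence the sketch finds "120 min", "100 mg/kg" and "50 mg/kg" but misses the "10" in "10 and 50 mg/kg", which illustrates why conjunctions matter. Compound names such as "5g" are skipped only because "g" is not in the toy unit table.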
Current work – identify the property being measured (e.g. dosages of AA, indomethacin, HtE, leptin, etc.)
Additionally at 120 min following glucose administration, the 100 mg/kg 5g and 5e groups had
significantly (P ⩽ 0.005) a greater drop in blood glucose than the 10 and 50 mg/kg groups.
In the mouse xenograft model of LLC cells in C57BL/6J mice, once daily administration of AA (50 and
100 mg/kg) inhibited tumor growth in a dose-dependent manner (Fig. 6A and C).
Groups of Swiss mice (n = 6) were treated (p.o.) with vehicle, indomethacin (10 mg/kg-Roche®) or HtE
(50, 100 or 200 mg/kg) 1 h before administration of carrageenan at 2.5% (Sigma-Aldrich®) injected
subcutaneously into the plantar region of the left hind paw and phosphate buffer saline (PBS) in
right hind paw.
In the experiments designed to study the antidepressant-like effect of the repeated treatment (for
14 days) of EET, the immobility time in the TST and the locomotor activity in the open-field were
assessed in independent groups of mice 24 h after the last daily administration of EET (10–100
mg/kg, p.o.).
Hoppers containing chow were removed from the cages 1 h before the administration of leptin
[depending on studies, 5 mg/kg or 2.5 mg/kg, ip; mouse recombinant leptin obtained from Dr. A.F.

Simplifying semantics for biomedical applicationsSimplifying semantics for biomedical applications
Simplifying semantics for biomedical applications
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biology
 
Big Data: Learning from MIMIC- Celi
Big Data: Learning from MIMIC- CeliBig Data: Learning from MIMIC- Celi
Big Data: Learning from MIMIC- Celi
 

Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs

  • 1. 1 Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs. Paul Groth (@pgroth), Elsevier Labs. BigNet: WWW 2018. Thanks to Ron Daniel, Brad Allen & the Labs Team. Empowering Knowledge™
  • 2. 2 Outline Goal: to tell you our current thinking and to get your feedback • Why we’re interested • What we’ve tried • What we’re missing • Webby Data • 2 Sources of Semantics • State of the art • What’s missing Warning: The back half is like a probably incomplete literature review, so think of this as pointers
  • 5. 5 EMMeT (Elsevier Merged Medical Taxonomy) EMMeT is a multilingual, concept-based clinical ontology • Multilingual: English, French, Spanish • Concept-based: All terms, synonyms, translations, mappings are related to a unique identifier (“IMUI”) • Ontology: Provides semantic relationships between concepts (symptoms of a disease, treatment procedures of a disease, complications of a disease or a procedure, etc…) EMMeT is a controlled reference terminology • Based on Unified Medical Language System (UMLS), standard clinical terminologies as well as Elsevier proprietary vocabularies and lists of acronyms • Explicitly mapped to international medical standards (SNOMED-CT, ICD-9-CM, ICD-10-CM, LOINC, RXNorm, CVX, etc.) and Elsevier’s vocabularies (Gold Standard, EMTREE, etc.) EMMeT is current • Continuously updated, and released every 12 weeks for automatic indexing • Updated daily and available via an API for manual tagging access • Maintained by a team of medical terminology experts,
  • 6. 6 Automated Tagging Manual Tagging/ Data Structuring Products and platforms using EMMeT Clinical Solutions ClinicalKey Global ClinicalKey ANZ ClinicalKey France ClinicalKey Espanol ClinicalKey Nursing ClinicalKey German ClinicalKey Nursing ANZ ClinicalKey Brazil Amirsys Decision Point RP/STMJ Health Advance The Lancet Cell LexisNexis MedMal Nav LN Insight Legend In production In Pilot In Pipeline Nursing Education Mosby’s Dictionary Clinical Solutions PoC - Clinical Overviews ClinicalKey HL7 API Health Analytics IDS FHIR API/Apps Dorland’s Dictionary Patient Engagement Gold Standard CP ERC Content 2.0 Nursing Education Sherpath EMEALAAP MedEnact RP/STMJ (SCT) Health Advance The Lancet Cell
  • 8. 8 Rankings of EMMeT’s ontological relationships • Relationships are ranked according to a 5-tiered ranking model, for simplicity and accessibility. • 10: best option; • 9: second option, when the rank of 10 is not applicable; • 8: given to two concepts that are too general to be directly related to a specific disease; • 7: is used as an outlier; • 6: default / non-validated.
    Ranking criteria per relationship (ranks 10, 9, 8, 7):
    • has cause: 10 = most common; 9 = common; 8 = sometimes; 7 = rare
    • has clinical finding: 10 = most common; 9 = common; 8 = sometimes; 7 = rare
    • has_complication, severity (disease): 10 = severe/death; 9 = high; 8 = moderate; 7 = low morbidity
    • has_complication, prevalence (disease): 10 = strong occurrence/high prevalence; 9 = likely occurrence/commonly prevalent; 8 = sometimes occurs; 7 = rare occurrence
    • has_complication, severity (procedure): 10 = critical/death; 9 = major; 8 = moderate; 7 = minor
    • has comorbidity: 10 = strongly associated; 9 = commonly associated; 8 = sometimes associated; 7 = rarely associated
    • has screening procedure: 10 = best choice; 9 = is done; 8 = sometimes done; 7 = rarely done
    • has risk factor: 10 = strongly associated; 9 = commonly associated; 8 = sometimes associated; 7 = rarely associated
    • has diagnostic procedure: 10 = best choice; 9 = commonly done; 8 = sometimes done; 7 = rarely done
    • has differential diagnosis: 10 = strong occurrence/high prevalence; 9 = likely occurrence/commonly prevalent; 8 = sometimes occurs/low prevalence; 7 = rare occurrence
    • has drug: 10 = best choice; 9 = 2nd line; 8 = 3rd line; 7 = rarely given
    • has contraindication drug: 10 = strongly avoid/black box; 9 = commonly avoid; 8 = sometimes avoid; 7 = rarely avoid
    • has treatment procedure: 10 = best choice; 9 = commonly done; 8 = sometimes done; 7 = rarely done
    • has prevention: 10 = best option; 9 = common option; 8 = sometimes advised; 7 = rarely advised
    • has physician specialty: 10 = specific specialty; 9 = general/specialty; 8 = broad; 7 = rare
    • has device: 10 = standard device; 9 = acceptable device; 8 = sometimes used; 7 = rarely used
  • 9. 9 From EMMeT to H-Graph • Based on EMMeT • Support more complex relations including patient context (Clinical Overview content + more) • Flexible and extensible model to support links to content, model treatment strategies, numeric values, temporal data, etc. Age, sex, weight, … are very simple context. In people with atrial fibrillation presenting acutely without life-threatening haemodynamic instability, offer rate or rhythm control if the onset of the arrhythmia is less than 48 hours, and start rate control if it is more than 48 hours or is uncertain. NICE Guideline Atrial Fibrillation: Management • Continue to support existing indexing pipelines (e.g. ClinicalKey), and tagging use cases (e.g. Clinical Overviews) From EMMeT… …To H-Graph
  • 11. 11 Universal schemas • … are a specific technique from the Information Extraction and the Automatic Knowledge Base Completion literature • … are an unsupervised method to ‘learn’ by combining text extracts with existing knowledge base assertions • Applications: • Extend a medical knowledge base • scan incoming literature to suggest new additions to EMMeT and show the underlying evidence to the taxonomy editor. • scan literature backlog to find evidence for data already in EMMeT • Literature Surveillance • scan incoming literature to find existing facts even if expressed in very different ways • find new concepts in the literature related to an existing EMMeT concept*. Let taxonomy editor decide whether to add new concept and relation to EMMeT
  • 12. 12 Open Information Extraction • Knowledge bases are populated by scanning text and doing Information Extraction • Most information extraction systems are looking for very specific things, like drug-drug interactions • Best accuracy for that one kind of data, but misses out on all the other concepts and relations in the text • For broad knowledge base, use Open Information Extraction that only uses some knowledge of grammar • One weird trick for open information extraction … • ReVerb*: 1. Find “relation phrases” starting with a verb and ending with a verb or preposition 2. Find noun phrases before and after the relation phrase 3. Discard relation phrases not used with multiple combinations of arguments. In addition, brain scans were performed to exclude other causes of dementia. * Fader et al. Identifying Relations for Open Information Extraction
  • 13. 13 ReVerb output After ReVerb pulls out noun phrases, match them up to EMMeT concepts Discard rare concepts, relations, or relations that are not used with many different concepts # SD Documents Scanned 14,000,000 Extracted ReVerb Triples 473,350,566
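The three ReVerb steps listed on slide 12 can be sketched in a few lines. This is a toy illustration only, not the ReVerb implementation: it assumes the sentence has already been tokenized and given coarse part-of-speech tags by hand, and the tag sets, the function names `extract` and `keep_frequent_relations`, and the `min_pairs` threshold are all hypothetical.

```python
from collections import defaultdict

# Assumed coarse tag sets for this sketch (not ReVerb's actual patterns).
REL_TAGS = {"VERB", "PART", "ADV", "ADP"}   # tokens allowed inside a relation phrase
NP_TAGS = {"NOUN", "ADJ", "DET"}            # tokens allowed inside a noun phrase

def extract(tagged):
    """tagged: list of (token, tag). Returns (arg1, relation, arg2) triples."""
    triples, i, n = [], 0, len(tagged)
    while i < n:
        if tagged[i][1] != "VERB":
            i += 1
            continue
        # Step 1: grow a relation phrase from a verb, then trim it so that
        # it ends on a verb or a preposition.
        j = i
        while j + 1 < n and tagged[j + 1][1] in REL_TAGS:
            j += 1
        while tagged[j][1] not in {"VERB", "ADP"}:
            j -= 1
        # Step 2: take the noun phrases directly before and after the phrase.
        left = i - 1
        while left >= 0 and tagged[left][1] in NP_TAGS:
            left -= 1
        right = j + 1
        while right < n and tagged[right][1] in NP_TAGS:
            right += 1
        arg1 = " ".join(tok for tok, _ in tagged[left + 1:i])
        rel = " ".join(tok for tok, _ in tagged[i:j + 1])
        arg2 = " ".join(tok for tok, _ in tagged[j + 1:right])
        if arg1 and arg2:
            triples.append((arg1, rel, arg2))
        i = j + 1
    return triples

def keep_frequent_relations(triples, min_pairs=2):
    """Step 3: drop relation phrases seen with fewer than min_pairs argument pairs."""
    pairs_by_rel = defaultdict(set)
    for a1, rel, a2 in triples:
        pairs_by_rel[rel].add((a1, a2))
    return [t for t in triples if len(pairs_by_rel[t[1]]) >= min_pairs]

# The example sentence from slide 12, tagged by hand for this sketch.
sentence = [("In", "ADP"), ("addition", "NOUN"), (",", "PUNCT"),
            ("brain", "NOUN"), ("scans", "NOUN"), ("were", "VERB"),
            ("performed", "VERB"), ("to", "PART"), ("exclude", "VERB"),
            ("other", "ADJ"), ("causes", "NOUN"), ("of", "ADP"),
            ("dementia", "NOUN"), (".", "PUNCT")]
triples = extract(sentence)
```

On the hand-tagged sentence this yields a single triple whose relation phrase is "were performed to exclude". The frequency filter only becomes meaningful over a whole corpus, where it removes overly specific relation phrases.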
  • 14. 14 Universal schemas - Initialization • Method to combine ‘facts’ found by machine reading with stronger assertions from ontology. • Build ExR matrix with entity-pairs as rows and relations as columns. • Relation columns can come from EMMeT, or from ReVerb extractions. • Cells contain 1.0 if that pair of entities is connected by that relation.
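As a rough sketch of the initialization described on slide 14, the matrix can be built from two lists of triples. The function name `build_matrix`, the `kb:`/`text:` column prefixes, and the text triples are assumptions made for illustration; only the (breast cancer, has diagnostic procedure, breast biopsy) relation is taken from the editor's notes.

```python
def build_matrix(kb_triples, text_triples):
    """Return (rows, cols, matrix): entity pairs as rows, relations as columns."""
    # Prefix each relation with its source so that ontology relations and
    # surface-form text relations become distinct columns.
    labelled = [(s, "kb:" + r, o) for s, r, o in kb_triples]
    labelled += [(s, "text:" + r, o) for s, r, o in text_triples]
    rows = sorted({(s, o) for s, _, o in labelled})
    cols = sorted({r for _, r, _ in labelled})
    row_index = {pair: i for i, pair in enumerate(rows)}
    col_index = {rel: j for j, rel in enumerate(cols)}
    matrix = [[0.0] * len(cols) for _ in rows]
    for s, r, o in labelled:
        # A cell is 1.0 when that entity pair is connected by that relation.
        matrix[row_index[(s, o)]][col_index[r]] = 1.0
    return rows, cols, matrix

kb_triples = [("breast cancer", "has_diagnostic_procedure", "breast biopsy")]
# Hypothetical surface-form triples standing in for ReVerb output.
text_triples = [("breast cancer", "is diagnosed by", "breast biopsy"),
                ("dementia", "is excluded by", "brain scans")]
rows, cols, matrix = build_matrix(kb_triples, text_triples)
```

A dense list of lists is used here only for readability; at the scale reported in the talk a sparse representation would presumably be needed.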
  • 15. 15 Universal schemas - Prediction • Factorize matrix to ExK and KxR, then recombine. • “Learns” the correlations between text relations and EMMeT relations, in the context of pairs of objects. • Find new triples to go into EMMeT e.g., (glaucoma, has_alternativeProcedure, biofeedback)
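The factorize-then-recombine step on slide 15 can be illustrated on a toy matrix. Note the substitution: this sketch uses a rank-K truncated SVD computed by power iteration as a stand-in for the learned factorization model, which is not necessarily what the authors used. The entity pairs, relation names, and matrix values are invented for illustration.

```python
import math

# Toy data (hypothetical, not from EMMeT): rows are entity pairs,
# columns are text relations and ontology relations.
pairs = [("asthma", "salbutamol"), ("hypertension", "lisinopril"),
         ("glaucoma", "timolol"), ("influenza", "influenza virus"),
         ("malaria", "plasmodium")]
relations = ["text:is treated with", "text:responds to", "kb:has_drug",
             "text:is caused by", "kb:has_cause"]
M = [
    [1.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],  # kb:has_drug is not yet asserted for this pair
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
]

def top_singular_triplet(A, iters=200):
    """Power iteration on A^T A: (sigma, u, v) for the largest singular value."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(A[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v

def factorize(A, k):
    """Return E (rows x k) and R (k x cols) whose product is a rank-k approximation."""
    residual = [row[:] for row in A]
    E, R = [[] for _ in A], []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        for i, row in enumerate(E):
            row.append(sigma * u[i])
        R.append(v)
        # Deflate: remove the component just found before extracting the next.
        for i in range(len(A)):
            for j in range(len(A[0])):
                residual[i][j] -= sigma * u[i] * v[j]
    return E, R

def score(E, R, i, j):
    """Recombined value of cell (i, j): dot product of row and column factors."""
    return sum(E[i][f] * R[f][j] for f in range(len(R)))

E, R = factorize(M, k=2)
row = pairs.index(("glaucoma", "timolol"))
drug_score = score(E, R, row, relations.index("kb:has_drug"))
cause_score = score(E, R, row, relations.index("kb:has_cause"))
```

Because the (glaucoma, timolol) row shares its text relations with rows that do carry `kb:has_drug`, the recombined matrix gives that empty cell a clearly higher score than the unrelated `kb:has_cause` cell. High-scoring empty cells of this kind are the candidate triples that would go to a taxonomy editor.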
  • 16. 16 ONTOLOGY MAINTENANCE [Pipeline diagram. Labels: Content; Universal schema; Surface form relations; Structured relations; Factorization model; Matrix Construction; Open Information Extraction; Entity Resolution; Matrix Factorization; Knowledge graph; Curation; Predicted relations; Matrix Completion; Taxonomy; Triple Extraction; Concept Resolution. Scale annotations: 14M SD articles; 475M triples; 3.3 million relations; 49M relations; ~15k -> 1M entries.] Paul Groth, Sujit Pal, Darin McBeath, Brad Allen, Ron Daniel. “Applying Universal Schemas for Domain Specific Ontology Expansion.” 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016. Michael Lauruhn and Paul Groth. "Sources of Change for Modern Knowledge Organization Systems." Knowledge Organization 43, no. 8 (2016). • Pretty good F-measure, around 0.7 • Good enough with human in the loop • But we want more!
  • 17. 17 Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation methods." Semantic web 8.3 (2017): 489-508. WHERE TO GO?
  • 18. 18 MORE THAN LINK PREDICTION • Data has deep hierarchy –link prediction flattens this • Data has hooks into specific content • Schemas are increasingly richly defined – not just a single type • N-ary relations
  • 19. 19 OUR KG’S SHARE PROPERTIES WITH WEB KGS Ringler, Daniel, and Heiko Paulheim. "One knowledge graph to rule them all? Analyzing the differences between DBpedia, YAGO, Wikidata & co." Joint German/Austrian Conference on Artificial Intelligence (Künstliche Intelligenz). Springer, Cham, 2017.
  • 20. 20 The Web of Data http://webdatacommons.org/structureddata/ 2017-12/stats/stats.html http://lodlaundromat.org
  • 21. 21 Two sources of semantics 1. Dereferenceability 2. Rules
  • 22. 22 Dereferenceability Looking definitions up – Natural Language and Programmatic
  • 24. 24 Pay attention to the underlying data Paul Groth, Michael Lauruhn, Antony Scerri: “Open Information Extraction on Scientific Text: An Evaluation”, 2018; [http://arxiv.org/abs/1802.05574 arXiv:1802.05574]
  • 25. 25 Embed more Gupta, N., Singh, S., & Roth, D. (2017). Entity linking via joint encoding of types, descriptions, and context. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2681-2690).
  • 26. 26 Embed more Both, Fabian, Steffen Thoma, and Achim Rettinger. "Cross-modal Knowledge Transfer: Improving the Word Embedding of Apple by Looking at Oranges." Proceedings of the Knowledge Capture Conference. ACM, 2017.
  • 27. 27 Social Semantics? de Rooij, S., Beek, W., Bloem, P., van Harmelen, F., & Schlobach, S. (2016, October). Are Names Meaningful? Quantifying Social Meaning on the Semantic Web. In International Semantic Web Conference (pp. 184-199). Springer, Cham. • Distributional semantics for identifiers (NTN) • But uses the global network • Could we use the discussion space as well? NTN - Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. In Advances in neural information processing systems (pp. 926-934).
  • 28. 28 Rules https://www.w3.org/TR/rdf11-mt/
      schema:dateModified a rdf:Property ;
          rdfs:label "dateModified" ;
          schema:domainIncludes schema:CreativeWork, schema:DataFeedItem ;
          schema:rangeIncludes schema:Date, schema:DateTime ;
          rdfs:comment "The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed." .
      schema:datePublished a rdf:Property ;
          rdfs:label "datePublished" ;
          schema:domainIncludes schema:CreativeWork ;
          schema:rangeIncludes schema:Date ;
          rdfs:comment "Date of first broadcast/publication." .
      schema:disambiguatingDescription a rdf:Property ;
          rdfs:label "disambiguatingDescription" ;
          schema:domainIncludes schema:Thing ;
          schema:rangeIncludes schema:Text ;
          rdfs:comment "A sub property of description. A short description of the item used to disambiguate from other, similar items. Information from other properties (in particular, name) may be necessary for the description to be useful for disambiguation." ;
          rdfs:subPropertyOf schema:description .
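To make the "Rules" point concrete, here is a minimal sketch of one RDFS entailment pattern from the RDF 1.1 Semantics document, rdfs7: if p is a sub-property of q and (s, p, o) holds, then (s, q, o) holds. Triples are plain Python tuples of prefixed names; the subject `ex:item1`, its literal value, and the function name are hypothetical.

```python
def apply_subproperty_rule(triples):
    """Apply RDFS rule rdfs7 until no new triples appear; returns a set of triples."""
    sub_property_of = {(s, o) for s, p, o in triples if p == "rdfs:subPropertyOf"}
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for child, parent in sub_property_of:
                # rdfs7: (child subPropertyOf parent) and (s child o) => (s parent o)
                if p == child and (s, parent, o) not in inferred:
                    inferred.add((s, parent, o))
                    changed = True
    return inferred

graph = [
    ("schema:disambiguatingDescription", "rdfs:subPropertyOf", "schema:description"),
    # Hypothetical instance data for illustration.
    ("ex:item1", "schema:disambiguatingDescription", "the 2018 workshop talk"),
]
closure = apply_subproperty_rule(graph)
```

Only this single rule is covered. Note also that schema.org's `domainIncludes` and `rangeIncludes` are not `rdfs:domain` and `rdfs:range`, so they trigger no RDFS type entailments.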
  • 29. 29 Injecting Background Knowledge as Constraints Rocktäschel, T., Singh, S., & Riedel, S. (2015). Injecting logical background knowledge into embeddings for relation extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1119-1129)
  • 30. 30 Learning Rules Yang, Fan, Zhilin Yang, and William W. Cohen. "Differentiable learning of logical rules for knowledge base reasoning." Advances in Neural Information Processing Systems. 2017.
  • 31. 31 Combining Both – supporting complex reasoning with subsymbolic representations Rocktäschel, T., & Riedel, S. (2017). End-to-end differentiable proving. In Advances in Neural Information Processing Systems (pp. 3791-3803).
  • 32. 32 Future Welbl, J., Stenetorp, P., & Riedel, S. (2017). Constructing Datasets for Multi-hop Reading Comprehension Across Documents. arXiv preprint arXiv:1710.06481. •Scale •The knowledge base == text? •Multi-hop reasoning •Is everything end-to-end differentiable
  • 33. 33 Conclusion • In practice: data is webby data • Messy • Interconnected • Constraints and rules associated • Semantic Web: semantics can come from multiple different sources • Explicit & implicit • Take advantage of those sources • Knowledge graphs benefit from inference • Your thoughts? • Thanks & We’re hiring! p.groth@elsevier.com | pgroth.com labs.elsevier.com
  • 35. 35 INTEGRATION OF LARGE NUMBERS OF DATA SOURCES Groth, Paul, "The Knowledge-Remixing Bottleneck," Intelligent Systems, IEEE, vol. 28, no. 5, pp. 44-48, Sept.-Oct. 2013, doi: 10.1109/MIS.2013.138 • 10 different extractors • E.g., mapping-based infobox extractor • Infobox uses a hand-built ontology based on the 350 commonly used English language infoboxes • Integrates with Yago • Yago relies on Wikipedia + Wordnet • Upper ontology from Wordnet and then a mapping to Wikipedia categories based on frequencies • Wordnet is built by psycholinguists
  • 36. 36 Units & Measurement Annotations • Time • Dosage • Probability • Percent • Count • Not handled yet Find numbers followed by a unit name or abbreviation (perhaps with scale factor like k, m, G, …). Provide value normalized to SI units. Also provide type of measurement (time, temperature, length, mass, dosage, etc.) based on unit. Handling tolerances, ranges, probabilities, and counts adds complexity. Conjunctions not yet handled but very important. Current work – identify the property being measured (e.g. dosages of AA, indomethacin, HtE, leptin, etc.) Additionally at 120 min following glucose administration, the 100 mg/kg 5g and 5e groups had significantly (P ⩽ 0.005) a greater drop in blood glucose than the 10 and 50 mg/kg groups. In the mouse xenograft model of LLC cells in C57BL/6J mice, once daily administration of AA (50 and 100 mg/kg) inhibited tumor growth in a dose-dependent manner (Fig. 6A and C). Groups of Swiss mice (n = 6) were treated (p.o.) with vehicle, indomethacin (10 mg/kg-Roche®) or HtE (50, 100 or 200 mg/kg) 1 h before administration of carrageenan at 2.5% (Sigma-Aldrich®) injected subcutaneously into the plantar region of the left hind paw and phosphate buffer saline (PBS) in right hind paw. In the experiments designed to study the antidepressant-like effect of the repeated treatment (for 14 days) of EET, the immobility time in the TST and the locomotor activity in the open-field were assessed in independent groups of mice 24 h after the last daily administration of EET (10–100 mg/kg, p.o.). Hoppers containing chow were removed from the cages 1 h before the administration of leptin [depending on studies, 5 mg/kg or 2.5 mg/kg, ip; mouse recombinant leptin obtained from Dr. A.F.
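The core of the units annotation described on slide 36 (find a number followed by a unit, normalize the value, and report the measurement type) can be sketched with a regular expression. This is a minimal illustration under stated assumptions: the `UNITS` table, its conversion factors, and the function name `annotate` are invented here, and ranges, tolerances, and conjunctions such as "50 and 100 mg/kg" are deliberately not handled, as on the slide.

```python
import re

# Hypothetical unit table: unit -> (measurement type, normalized unit, factor).
UNITS = {
    "mg/kg": ("dosage", "kg/kg", 1e-6),
    "min": ("time", "s", 60.0),
    "h": ("time", "s", 3600.0),
    "%": ("percent", "fraction", 0.01),
}

# Longest units first so that "mg/kg" is tried before shorter alternatives.
_unit_alternatives = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
# A number, optional whitespace, a known unit, and no letter right after it
# (so the "h" in "hours" or "hind" is not mistaken for a unit).
PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(" + _unit_alternatives + r")(?![A-Za-z])")

def annotate(text):
    """Return one annotation dict per number-plus-unit match in the text."""
    annotations = []
    for match in PATTERN.finditer(text):
        value, unit = float(match.group(1)), match.group(2)
        kind, normalized_unit, factor = UNITS[unit]
        annotations.append({
            "text": match.group(0),
            "type": kind,
            "value": value * factor,
            "unit": normalized_unit,
        })
    return annotations

# Fragment of one of the example sentences on the slide.
fragment = ("Hoppers containing chow were removed from the cages 1 h before "
            "the administration of leptin [depending on studies, 5 mg/kg or "
            "2.5 mg/kg, ip")
found = annotate(fragment)
```

On this fragment the sketch picks up one time expression and two dosages. Identifying which substance each dosage belongs to (the "current work" mentioned on the slide) is outside its scope.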

Editor's Notes

  1. 100+ years of expert knowledge
  2. On the left side we see one concept, breast cancer, and a number of pieces of information about it such as synonyms, parent and child concepts, etc. On the right we see some ontological relations from breast cancer to other concepts, such as (breast cancer, has diagnostic procedure, breast biopsy). One of the major differences between EMMeT and what is in UMLS is that we not only provide the basic 3-part relationship, such as (breast cancer, has_treatment, radical mastectomy), we also provide information about the ‘strength’ of that relation according to current medical evidence.
  3. Excerpt from National Institute for Health and Care Excellence: In people with atrial fibrillation presenting acutely without life-threatening haemodynamic instability, offer rate or rhythm control if the onset of the arrhythmia is less than 48 hours, and start rate control if it is more than 48 hours or is uncertain.
  4. Using EMMeT, and some code and data we already had, he built a quick prototype and tested it. Performance (in terms of accuracy of predictions) was surprisingly high. Unsupervised is very important because it means the construction of the rough underlying knowledge base is scalable and not limited by the availability of experts. Raw predictions are not good enough for fully automatic operation, but are plenty good enough to help taxonomy editors and other people do their jobs much faster.
  5. Complex axioms; messy; integrates lots of information
  6. Predict entity types
  7. Concept similarity; Conc SVD and PCA are combinations
  8. verify the null hypothesis that names are statistically independent from the two meaning proxies
  9. SRL performance TRL 2 And inductive logic programming
  10. Translate to natural language (sli)
  11. One type of NLP annotation Labs is implementing is to mark up measurements – find the quantity, the unit, any tolerances, etc. We also normalize them to SI standards so measurements can be compared and searched. This is not novel research. However, we have not found prior work that attempts to detect the specific object and property being measured. We are using several domain-specific scenarios (mouse cancer, concrete additives, NLP algorithm accuracy, neuronal properties) to find ways that information is expressed. For mouse cancer, it is relatively easy to detect that a measurement is a dosage of a particular drug. But those patterns are of little use in the other scenarios. This work has application to the h-graph – dosages, ages, weights, etc. are all important properties for the patient context. Cohort size and probability are important for the quality of evidence measures.