CoronaWhy Seminar
Augmented Claim Craft Ecosystem:
HyperKnowledge-OpenSherlock Overview
Jack Park
TopicQuests Foundation
ORCID: https://orcid.org/0000-0002-4356-4928
Marc-Antoine Parent
Conversence
ORCID: https://orcid.org/0000-0003-4159-7678
Does ApoE4 cause Alzheimer’s?
A focus question about claims people want answered
Wikipedia
A motivating story
● Alzheimer’s Context *
○ Dr. Trumble and the Tsimané ** Project ***
■ Anthropologist studying evolutionary medicine
■ Indigenous people, Bolivia
■ Higher elderly cognitive performance with copy of ApoE4 gene
○ Dr. Liddelow studying immune response in brains
■ Some people die without dementia but with brains clogged with Alzheimer’s pathology
● A Quote (emphasis mine):
“I asked Dr. Liddelow whether he was familiar with the Tsimané research. He admitted that he was not — the field of
evolutionary biology is distant from his own. But he said the hypothesis that the ApoE4 gene evolved to protect our brains
from the effects of parasitic infection made perfect sense. “That’s absolutely in line with what we found. For our ancestors,
an ApoE4 gene could have been beneficial,” Dr. Liddelow said, in part because it would have helped the astrocytes go on
the attack.”
* https://www.nytimes.com/2017/07/14/opinion/sunday/alzheimers-cure-south-america.html
** https://en.wikipedia.org/wiki/Tsiman%C3%A9
*** http://www.unm.edu/~tsimane/
From documents to augmenting knowledge work
[Diagram stages: Documents; Structured Documents; Basic claim discovery; Entity identification; Augmented Claim Craft. Tool labels: CoronaWhy, OpenSherlock 1, Spacy, “?”.]
Claim representation in HyperKnowledge
Aim: To be able to bring claims together: compare and federate claims,
make claims about claims...
The data model should be rich enough to express claims found in the
literature, and claims about those claims.
Basic claim representation
Hydroxychloroquine is used to treat Covid-19
Subject - Predicate - Object (used in RDF)
Each concept (topic) has an identifier (URI) to reduce ambiguity
Covid-19: wiki:Q84263196
Hydroxychloroquine: wiki:Q84263196
Drug used for treatment: wikip:P2176
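A minimal sketch of the subject - predicate - object form, assuming plain strings for identifiers. The `Triple` type, the `LABELS` table and the `render` helper are illustrative names, and `ex:hydroxychloroquine` is a hypothetical placeholder identifier rather than a real Wikidata ID. The subject/object order simply follows the example sentence.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One basic claim: subject, predicate, object, each given as an identifier."""
    subject: str
    predicate: str
    obj: str


# Human-readable labels, kept apart from the identifiers to reduce ambiguity.
LABELS = {
    "ex:hydroxychloroquine": "Hydroxychloroquine",  # hypothetical placeholder ID
    "wikip:P2176": "Drug used for treatment",
    "wiki:Q84263196": "Covid-19",
}


def render(triple: Triple) -> str:
    """Show a triple using labels where known, raw identifiers otherwise."""
    s, p, o = (LABELS.get(part, part) for part in triple)
    return f"{s} -[{p}]-> {o}"


claim = Triple("ex:hydroxychloroquine", "wikip:P2176", "wiki:Q84263196")
print(render(claim))
```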
“Citation needed”: qualifying claims
The paper “Hydroxychloroquine and azithromycin as a treatment of COVID‐19: results of an open‐label
non‐randomized clinical trial” describes a study where hydroxychloroquine was used to treat Covid-19.
Give an identity to the claim itself, so we can make further claims about that claim, such as
provenance, authority, etc.
RDF does this through reification; Wikidata just gives identity to claims (snaks).
[Diagram: a Claim node (cc00feb7-4b9b-121d-898b-7c6652b2b406) is linked by rdf:subject, rdf:predicate and rdf:object to the nodes Hydroxychloroquine (wiki:Q84263196), Covid-19 (wiki:Q84263196) and Drug used for treatment (wikip:P2176), and by Stated in (wikip:P248) to the paper “Hydroxychloroquine and azithromycin …” (DOI:10.1016/J.IJANTIMICAG.2020.105949).]
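To make the idea of a claim with its own identity concrete, here is a small sketch, not the HyperKnowledge data model. The `Claim` class and the `about` helper are hypothetical names, and `ex:hydroxychloroquine` is a placeholder identifier. The claim identifier, `wikip:P2176`, `wikip:P248` and the DOI are the ones shown on the slide.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """A subject-predicate-object statement that carries its own identifier."""
    subject: str
    predicate: str
    obj: str
    claim_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def about(claim: Claim, predicate: str, obj: str) -> Claim:
    """Make a new claim whose subject is the identifier of another claim."""
    return Claim(claim.claim_id, predicate, obj)


base = Claim(
    "ex:hydroxychloroquine",  # hypothetical placeholder ID
    "wikip:P2176",            # drug used for treatment
    "wiki:Q84263196",         # Covid-19
    claim_id="cc00feb7-4b9b-121d-898b-7c6652b2b406",
)

# Provenance is itself a claim, pointing at the first claim by its identifier.
provenance = about(base, "wikip:P248", "DOI:10.1016/J.IJANTIMICAG.2020.105949")
print(provenance.subject, provenance.predicate, provenance.obj)
```

Because `provenance` is also a `Claim`, it gets an identifier of its own and can in turn be qualified, which is what makes claims about claims possible.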
Complex claims
The experimental protocol involved oral absorption of 200 mg of hydroxychloroquine three times a
day for 10 days, for 20 patients whose average age was 51 years (σ=19)
Many claims involve many entities in complex relationships, and should be represented as such.
Related formalisms: topic mapping, frames (Minsky), KIF.
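[Diagram: a Medical Protocol node with roles substance (Hydroxychloroquine, wiki:Q84263196), disease (Covid-19, wiki:Q84263196), amount (200 mg), frequency (3x/day) and duration (10 days). Two groups are shown (Group 1, Group 2; labelled Control group and Study group), with size (20 px) and age (μ=51 σ=19).]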
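A frame-like structure is one way to hold such a many-entity claim. The sketch below is only an illustration: `MedicalProtocol`, `Group` and their field names are assumptions, `ex:hydroxychloroquine` is a placeholder identifier, and attaching the size and age figures to the study group follows the example sentence rather than anything the diagram confirms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    """A participant group; unknown values stay None instead of being guessed."""
    role: str
    size: Optional[int] = None
    mean_age: Optional[float] = None
    age_sd: Optional[float] = None


@dataclass
class MedicalProtocol:
    """One frame: named slots filled by entities and quantities."""
    substance: str
    disease: str
    amount_mg: int
    times_per_day: int
    duration_days: int
    groups: List[Group]

    def total_dose_mg(self) -> int:
        """Total amount per patient over the whole protocol."""
        return self.amount_mg * self.times_per_day * self.duration_days


protocol = MedicalProtocol(
    substance="ex:hydroxychloroquine",  # hypothetical placeholder ID
    disease="wiki:Q84263196",
    amount_mg=200,
    times_per_day=3,
    duration_days=10,
    groups=[Group("study", size=20, mean_age=51, age_sd=19), Group("control")],
)
print(protocol.total_dose_mg())
```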
Introducing Topic Maps
● A Topic Map is like a library without all the books
○ A Topic Map is indexical
■ Like a card catalog
■ Each topic has its own representation
■ Improving on a card catalog, a topic can be identified many different ways
■ Captures metadata and optional content
○ A Topic Map is relational
■ Like a good road map
■ Topics are connected by associations (relations)
■ Topics point to their occurrences in the territory
○ A Topic Map is organized
■ Multiple records on the same topic are co-located (stored as one topic) in the map
Topic Map Structure
Some claims are hypothetical
If social distancing measures are not followed, we risk a second wave.
We need a way to represent hypothetical scenarios.
The hypothetical world is a whole separate universe of discourse, which we represent as a
subgraph. (Sowa’s Conceptual graphs)
[Diagram: a subgraph labelled “Hypothetical situation” in which Social distancing norms have a Compliance level < 80% in the Target population, with consequence “Event: infection rate > 50 % rise”.]
Points of view should be explicit
Covid-19, as depicted by Fox News, is not more serious than a minor cold.
Epidemiologists’ estimates of Covid-19 transmission rates have not been explained by virologists.
The results of laboratory X have been contested.
Claims are made by agents, and adopted by communities. It is sometimes important to distinguish
references to a topic as it is understood by a specific agent or community.
Each claim has to be identified as coming from a specific source, maintained by agents. The
properties and links attributed to a topic can be different for each source. Source federation is
explicit.
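One way to picture source-tagged claims with explicit federation is sketched below. `SourcedStore`, the source names and the `severity` values are all hypothetical. The point is only that a view of a topic is computed from the sources the reader chooses to federate.

```python
from collections import defaultdict


class SourcedStore:
    """Keeps every claim under the source that made it."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def assert_claim(self, source, topic, prop, value):
        """Record that `source` attributes `prop = value` to `topic`."""
        self._by_source[source].append((topic, prop, value))

    def view(self, topic, sources):
        """Properties of `topic`, drawn only from the explicitly listed sources."""
        merged = {}
        for source in sources:
            for claim_topic, prop, value in self._by_source.get(source, []):
                if claim_topic == topic:
                    merged.setdefault(prop, {})[source] = value
        return merged


store = SourcedStore()
store.assert_claim("source_a", "wiki:Q84263196", "severity", "minor")
store.assert_claim("source_b", "wiki:Q84263196", "severity", "serious")

single = store.view("wiki:Q84263196", ["source_a"])
federated = store.view("wiki:Q84263196", ["source_a", "source_b"])
print(single)
print(federated)
```

Disagreement between sources is kept visible in the federated view instead of being overwritten.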
Claims are made and retracted
Lab X claimed to find reinfection after remission, but those cases were due to false negative testing
in an asymptomatic phase.
People can change their minds; claims can be corrections to earlier claims.
We view claims from a source as an event stream. Some events in the stream can explicitly
contradict earlier events.
[Diagram: a stream of events about topic A (A x B, A y 3, A x C, A y 5, A x D, ...), folded into the state below.]
{
  "@id": "A",
  "x": ["B", "D"],
  "y": 5
}
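The folding of an event stream into a current state can be sketched as follows. The event format, the `assert`/`retract` operations and the `MULTI_VALUED` set are assumptions made for illustration. In particular, the retraction of "A x C" is a guess at what the elided part of the stream contains, chosen so that the result matches the state shown on the slide.

```python
import json

# Assumption: "x" may hold several values, while "y" holds one value at a time.
MULTI_VALUED = {"x"}

EVENTS = [
    ("assert", "A", "x", "B"),
    ("assert", "A", "y", 3),
    ("assert", "A", "x", "C"),
    ("assert", "A", "y", 5),    # corrects the earlier value of y
    ("retract", "A", "x", "C"),  # hypothetical: explicitly contradicts an earlier event
    ("assert", "A", "x", "D"),
]


def fold(events, topic_id):
    """Replay the events in order and return the resulting state of one topic."""
    state = {"@id": topic_id}
    for op, subject, prop, value in events:
        if subject != topic_id:
            continue
        if prop in MULTI_VALUED:
            values = state.setdefault(prop, [])
            if op == "assert" and value not in values:
                values.append(value)
            elif op == "retract" and value in values:
                values.remove(value)
        elif op == "assert":
            state[prop] = value
        elif op == "retract" and state.get(prop) == value:
            del state[prop]
    return state


print(json.dumps(fold(EVENTS, "A")))
```

Since the stream is never edited in place, replaying a prefix of it gives the state as it stood at an earlier point.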
Different communities use different names or identifiers
English names for Covid-19 in Wikidata: 2019-nCoV acute respiratory disease ; coronavirus disease 2019 ; COVID19 ; COVID 19 ;
Covid-19 ; 2019 novel coronavirus pneumonia ; Coronavirus disease 2019 ; nCOVD19 ; nCOVD 19 ; nCOVD-19 ; COVID-2019 ; seafood
market pneumonia ; Wuhan pneumonia ; 2019 NCP ; WuRS ; severe acute respiratory syndrome type 2 ; SARS-CoV-2 infection ; 2019 novel
coronavirus respiratory syndrome ; Wuhan respiratory syndrome ; novel coronavirus ; coronavirus
Of course we’d want to also search for 2019冠状病毒病 etc.
RDF identifiers in Wikidata:
<http://www.wikidata.org/wiki/Q84263196>
<https://catalogue.bnf.fr/ark:/12148/cb17874453m>
<https://d-nb.info/gnd/1206347392>
<https://id.loc.gov/authorities/sh2020000570>
<https://meshb.nlm.nih.gov/#/record/ui?ui=C000657245>
<http://id.nlm.nih.gov/mesh/T001007884>
<http://id.nlm.nih.gov/mesh/M000681578>
<https://www.courrierinternational.com/sujet/covid-19>
<http://www.disease-ontology.org/?id=DOID:0080600>
<http://www.diseasesdatabase.com/ddb60833.htm>
<http://emedicine.medscape.com/article/2500114-overview>
<https://www.britannica.com/science/COVID-19>
<https://www.enciclopedia.cat/EC-GEC-23470930.xml>
<https://icd.who.int/browse10/2019/en#/U07.1>
<https://icd.who.int/browse10/2019/en#/U07.2>
<https://icd.who.int/dev11/f/en#/http://id.who.int/icd/entity/1790791774>
<https://www.malacards.org/card/2019_novel_coronavirus>
<https://www.ne.se/uppslagsverk/encyklopedi/lång/covid-19>
<https://www.nhs.uk/conditions/coronavirus-covid-19>
<http://www.omegawiki.org/DefinedMeaning:1733730>
<https://philpapers.org/browse/covid-19>
<https://www.quora.com/topic/COVID>
<http://snomed.info/id/840539006>
<https://sml.snl.no/covid-19>
<https://www.reddit.com/r/Coronavirus/>
<https://www.reddit.com/r/COVID19/>
<http://www.treccani.it/enciclopedia/ricerca/COVID>
<https://tvtropes.org/pmwiki/pmwiki.php/UsefulNotes/CoronavirusDisease2019Pandemic>
<http://www.yso.fi/onto/yso/p38829>
<https://denstoredanske.lex.dk/COVID-19>
Note missing: kg:/m/01cpyy (Google)
Many concepts share the same name. Many names share the same concept.
Names have to be disambiguated. Global concept identifiers can be tentatively
identified, but all identifiers are tagged with their source, and the identifier X as
used by source A may not correspond to the concept referred to by X in source B.
Unifying topics is the domain of topic mappings
Topic Map as a federation platform
● A topic map aggressively works to ensure that, for each individual subject represented in the map,
there will be one and only one location for that subject.
● To accomplish that, when a decision is made that two subject representations in the map are about
the same subject, a new representation (a VirtualProxy) is created which non-redundantly
contains information from both, or from any other topic which later enters the topic map.
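The merge step can be sketched roughly as below. Only the name VirtualProxy comes from the slide. The `Proxy` fields, the `merge` function and the `instance_of` property are hypothetical, and this is not the TopicQuests implementation. Sets are used so that information present in both representations appears once.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """One representation of a subject, as found in one place."""
    identifiers: set
    names: set
    properties: dict = field(default_factory=dict)  # property name -> set of values


@dataclass
class VirtualProxy(Proxy):
    """The single location for a subject once representations are merged."""
    merged_from: list = field(default_factory=list)


def merge(*proxies):
    """Build a new VirtualProxy; the original proxies are left untouched."""
    virtual = VirtualProxy(identifiers=set(), names=set())
    for proxy in proxies:
        virtual.identifiers |= proxy.identifiers
        virtual.names |= proxy.names
        for prop, values in proxy.properties.items():
            virtual.properties.setdefault(prop, set()).update(values)
        virtual.merged_from.append(proxy)
    return virtual


a = Proxy({"http://www.wikidata.org/wiki/Q84263196"}, {"Covid-19"},
          {"instance_of": {"disease"}})
b = Proxy({"http://snomed.info/id/840539006"}, {"Covid-19", "COVID-2019"},
          {"instance_of": {"disease"}})

virtual = merge(a, b)
print(sorted(virtual.names))
```

A topic that enters the map later can be folded in with `merge(virtual, later_topic)`.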
Federating Silos: introduction
● Siloed Research Topics
○ Raynaud’s Syndrome Therapies
○ Fish Oil
● Machine Reading collects graph structures from different sources
○ Form tuple-like structures which are graphs
Federating Silos: Topic Mapping
● TopicMap Process
○ Rule:
■ One Location in the Map for each Subject
■ Federates (merges topics about the same subject) collected from different resources
Topic merging opens questions and creates events
● Does Fish Oil qualify as a Raynaud’s therapy?
○ Turns out: Yes
● Topic Merge events feed back into the HyperKnowledge ecosystem
Distributed federation in HyperKnowledge
Each source maintains its own table of topic merges, and federated queries must
keep track of those equivalences.
This can be expanded (with normalization) to identification of composite topics.
The plan is for the HK ecosystem to maintain a probabilistic map (a Bloom filter) of which
sources maintain information about which topics.
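A Bloom filter answers "might this source know about this topic?" with possible false positives but no false negatives. The sketch below is a generic textbook version under assumed parameters (`size_bits`, `num_hashes`), with hypothetical source names. It says nothing about how HyperKnowledge plans to implement its map.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_sources(filters, topic):
    """Sources worth querying for a topic; some may turn out to know nothing."""
    return sorted(name for name, bloom in filters.items() if topic in bloom)


filters = {"source_a": BloomFilter(), "source_b": BloomFilter()}
filters["source_a"].add("wiki:Q84263196")
filters["source_b"].add("http://snomed.info/id/840539006")

print(candidate_sources(filters, "wiki:Q84263196"))
```

A federated query would still need each source's merge table to know that the two identifiers above may denote the same topic.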
Comparing claims
The research on hydroxychloroquine in study X was contradicted in study Y.
132/203 virologists consulted believe hydroxychloroquine’s side effects to be too severe for Covid-19
treatment.
Once claims have an identity, we can compare claims and make higher-level claims.
[Diagram: Claim 1 (DOI:10.1016/J.ijantimicag.2020.105949) relates Hydroxychloroquine and Covid-19 through “Drug used for treatment”. Claim 2 (DOI:10.1080/15563650500514558) relates Hydroxychloroquine and refractory ventricular arrhythmia through “Side-effect”. A risk/benefit analysis node takes the claims as its risks and benefits, with outcome “Risks outweigh benefits”.]
Claim streams representing individual points of view can be combined into “community” streams,
and into combined values.
So what can be a stream?
Comparing claims allows combining claims in larger aggregates
● Base case: One person’s point of view
● One team (guild), with a procedure for how members’ PoV streams are
combined (can be a rule like majority, consent, etc.)
● A thematic collation, with points of dis/agreement marked without resolution
● A curated thematic overview, with data-driven evidence
● Eventually: global federation
Opposite end of the spectrum: Casual small streams (like git branches)
● A thought experiment or hypothetical situation
● A computed slice (query) of a stream can be treated like a stream
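As an illustration of a team-level merge procedure, the sketch below applies a simple majority rule to members' positions on claims. The function name `team_stream`, the `quorum` parameter, the member names and the claim identifiers are hypothetical. Other rules such as consent would replace the voting logic.

```python
from collections import Counter


def team_stream(member_views, quorum=0.5):
    """Combine member positions (claim_id -> True/False) into a team position.

    A claim is accepted or rejected only when more than `quorum` of the members
    who expressed a position agree; otherwise it stays marked as contested.
    """
    merged = {}
    claim_ids = set().union(*(view.keys() for view in member_views.values()))
    for claim_id in sorted(claim_ids):
        votes = Counter(view[claim_id] for view in member_views.values()
                        if claim_id in view)
        total = sum(votes.values())
        if votes[True] / total > quorum:
            merged[claim_id] = "accepted"
        elif votes[False] / total > quorum:
            merged[claim_id] = "rejected"
        else:
            merged[claim_id] = "contested"
    return merged


views = {
    "member_a": {"claim-1": True, "claim-2": True},
    "member_b": {"claim-1": True, "claim-2": False},
    "member_c": {"claim-1": False},
}
print(team_stream(views))
```

Keeping the "contested" outcome, rather than forcing a resolution, matches the idea of a collation with points of disagreement left marked.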
Inference engine ecosystem
Event sourcing as a backbone for knowledge-based microservices
Services subscribe to claims and produce calculations; the main queue subscribes to calculations.
Reactive calculations, e.g.: rule-based inference, live query maintenance, machine learning,
inference combination, etc.
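The subscribe-and-react pattern can be shown with a tiny in-process event bus. This is a sketch only: `EventBus`, the event kinds `"claim"` and `"calculation"`, and the symmetry rule are all assumptions, and a real deployment would use a message broker rather than direct function calls.

```python
from collections import defaultdict


class EventBus:
    """Append-only log plus synchronous delivery to subscribers."""

    def __init__(self):
        self.log = []
        self._subscribers = defaultdict(list)

    def subscribe(self, kind, handler):
        self._subscribers[kind].append(handler)

    def publish(self, kind, payload):
        self.log.append((kind, payload))
        for handler in list(self._subscribers[kind]):
            handler(payload)


def make_symmetry_service(bus, symmetric_predicates):
    """A rule-based service: for symmetric predicates, derive the reverse claim."""
    def on_claim(claim):
        subject, predicate, obj = claim
        if predicate in symmetric_predicates:
            bus.publish("calculation", (obj, predicate, subject))
    bus.subscribe("claim", on_claim)


bus = EventBus()
make_symmetry_service(bus, {"associated with"})  # hypothetical rule

derived = []
bus.subscribe("calculation", derived.append)  # stands in for the main queue

bus.publish("claim", ("Obesity", "associated with", "saturated fats"))
print(derived)
```

Because every event lands in `bus.log`, a service added later could replay the log to catch up.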
Synthesis as a service
Synthesis can be simple statistics (who believes this), sample size, Bayesian, etc.
Simple awareness of which claims are established or contested (and by whom) is useful.
Augmented collaboration: start with a single-source view of a claim stream
Augmented collaboration: become aware of relevant claims from federation stream
HyperKnowledge
From documents to augmenting knowledge work
[Diagram stages: Documents; Structured Documents; Basic claim discovery; Entity identification; Augmented Claim Craft. Tool labels: CoronaWhy, OpenSherlock 1, Spacy, “!”.]
Augmented Claim Craft
- Higher order claim discovery
- Claim combination
- Rule-based claim micro-services
- ML-based claims
- Human claim identification
Structured documents to claims with OpenSherlock
● Basic Setup
○ Each document is
■ mapped to a JSON structure and transferred to a Document database
■ broken into individual paragraphs
○ Each paragraph becomes a Kafka event
● Machine Reading
○ From paragraph Kafka events, each paragraph is
■ Broken into sentences by SpaCy
○ Each sentence is
■ Parsed by SpaCy
■ Parsed by LinkGrammar parser
■ Parse results are processed by a tuple detector to identify claims
OpenSherlock: example sentence
The pandemic of obesity, type 2 diabetes mellitus (T2DM) and nonalcoholic fatty liver disease (NAFLD) has frequently been associated with dietary intake of saturated fats (1) and specifically with dietary palm oil (PO) (2).
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5272194/
OpenSherlock: expected claims from that sentence
Obesity associated with saturated fats
Obesity associated with palm oil
T2DM associated with saturated fats
Type 2 Diabetes mellitus has acronym T2DM
T2DM associated with palm oil
NAFLD associated with saturated fats
Nonalcoholic Fatty Liver Disease has acronym NAFLD
NAFLD associated with palm oil
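The list above is the cross product of the coordinated subjects and objects, plus one claim per acronym definition. The sketch below only reproduces that combinatorics from hand-supplied inputs. It is not OpenSherlock's tuple detector, and the variable and function names are hypothetical.

```python
from itertools import product

# Inputs written by hand here; in the pipeline they would come from the parsers.
SUBJECTS = ["Obesity", "T2DM", "NAFLD"]
OBJECTS = ["saturated fats", "palm oil"]
ACRONYMS = {
    "Type 2 Diabetes mellitus": "T2DM",
    "Nonalcoholic Fatty Liver Disease": "NAFLD",
}


def expected_claims():
    """Every subject paired with every object, then the acronym definitions."""
    claims = [(s, "associated with", o) for s, o in product(SUBJECTS, OBJECTS)]
    claims += [(name, "has acronym", short) for name, short in ACRONYMS.items()]
    return claims


for subject, predicate, obj in expected_claims():
    print(subject, predicate, obj)
```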
“Obesity associated with saturated fats”: The Predicate
“Obesity associated with saturated fats”: The Object
“Obesity associated with saturated fats”: The Subject
Next steps
Higher-order claims are still beyond current NLP techniques, but deep learning
tools can augment the intelligence of researchers identifying claims, and symbolic AI
can be used to identify logical connections and contradictions.
The HyperKnowledge federation can help researchers craft higher-order claims
by identifying both the logical and social neighbourhood of claims.
We would like this ecosystem to be how the next Drs. Liddelow and Trumble get to
be aware of one another.
References
https://hyperknowledge.org
https://topicquests.org
RDF, W3C
Wikidata data model primer
Patrick Durusau, Steven R. Newcomb, and Robert Barta. Topic maps reference model. ISO standard 13250-5 CD, 11 2007.
John F. Sowa. Handbook of Knowledge Representation, chapter Conceptual Graphs, pages 213–237. Elsevier, 2008. ISBN: 9780444522115
Knowledge Interchange Format, Stanford
https://ipld.io
