Szymon Klarman's slides from his lightning talk at Connected Data London. Szymon is a research fellow at Brunel University London; his talk highlighted the current climate of academic publishing and how to make sense of this information explosion using Knowledge Graphs.
1. Knowledge Assembly at Scale
with Semantic and Probabilistic Techniques
Szymon Klarman
Department of Computer Science
Brunel University London
Connected Data London 2016
2. Scientific publishing deluge
50 million papers published since 1665
2.5 million papers published last year
publication output doubling every 9 years
Effects:
narrowing of science and scholarship: we cite a small pool of mostly recent papers
narrowing of expertise
the "publish or perish" principle affects the quality of results
4. Challenges
• ambiguity and vagueness of natural language
• general quality and reliability of the sources
• the inaccuracy of the information extraction tools
• the typical "Vs" of big data, i.e., volume, variety, volatility, velocity
• inconsistent, inconclusive or non-reproducible results
• gaps, omissions, contextual assumptions
Example: "In vitro curcumin downregulated the expression of Bcl-2, and Bcl-XL and upregulated the expression of p53, Bax, Bak, PUMA, Noxa, and Bim at mRNA and protein levels in prostate cancer cells [14]."
5. Knowledge assembly
[Diagram: evidence → extraction → reconciliation → filtering → aggregation → model formation → knowledge]
Knowledge assembly is the process of reconstructing complex knowledge from contextually asserted atomic statements and data fragments (evidence).
Example: "[…] A can associate with B […]" → knowledge assembly → <A binding B>
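To make the stages concrete, here is a minimal Python sketch of an extraction → reconciliation → filtering → aggregation pipeline over the example sentence. Everything in it is an illustrative assumption rather than the talk's implementation: the regular-expression pattern, the hypothetical `CANONICAL` synonym table, the 0.5 confidence threshold and all function names. Model formation is left out.

```python
import re
from collections import defaultdict

# Hypothetical extraction pattern: "<A> can associate with <B>" -> <A binding B>
PATTERN = re.compile(r"(\w+) can associate with (\w+)")

# Hypothetical synonym table used for reconciliation (entity normalization)
CANONICAL = {"GRB2": "GRB2_MOUSE", "GAB1": "GAB1_MOUSE"}


def extract(sentence, confidence):
    """Extraction: turn one sentence into (A, relation, B, confidence) candidates."""
    return [(a, "binding", b, confidence) for a, b in PATTERN.findall(sentence)]


def reconcile(stmt):
    """Reconciliation: map surface names to canonical identifiers."""
    a, rel, b, conf = stmt
    return (CANONICAL.get(a, a), rel, CANONICAL.get(b, b), conf)


def assemble(evidence, threshold=0.5):
    """Run extraction, reconciliation, filtering and aggregation over (sentence, confidence) pairs."""
    support = defaultdict(list)
    for sentence, confidence in evidence:
        for stmt in extract(sentence, confidence):
            a, rel, b, conf = reconcile(stmt)
            if conf >= threshold:                  # filtering: drop low-confidence extractions
                support[(a, rel, b)].append(conf)  # aggregation: group support per event
    return dict(support)


evidence = [
    ("In addition, GRB2 can associate with GAB1", 0.8),
    ("GRB2 can associate with GAB1 in vitro", 0.4),  # hypothetical low-confidence extraction
]
print(assemble(evidence))
```

With the default threshold only the first sentence survives filtering, so the single event `('GRB2_MOUSE', 'binding', 'GAB1_MOUSE')` ends up with the support list `[0.8]`.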
6. Probabilistic knowledge assembly
[Diagram: evidence → extraction → assembly (probabilistic) → knowledge, with probabilistic inference, learning, model updates and expert input]
In the Probabilistic Knowledge Assembly (PANDA) framework, evidence, together with all its contextual information, is part of the knowledge base, enabling a continuous update-assembly loop.
7. Probabilistic knowledge assembly
[Diagram: evidence → extraction → assembly (probabilistic) → knowledge, with probabilistic inference, learning, model updates and expert input, annotated with an example:]
• evidence: "A can associate with B", published in "Molecular Cancer"; extraction accuracy = 0.7
• <A binding B> is supported to degree 0.7
• <A binding B> is experimentally confirmed
• evidence contradicts the model to degree 0.7
9. Knowledge graph: data model
[Diagram: a Statement represents an Event (e.g., a molecular interaction of a given type) whose participants are biochemical entities or events; the Statement is extracted from an Article published in a Journal, and has evidence (textual evidence, truth-value evidence), a truth value and an uncertainty level (of type X).]
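One way to read this data model is as a handful of record types. The Python dataclasses below are a minimal sketch under assumptions: the class and field names are hypothetical, and attaching the extraction probability to the textual evidence and the provenance probability to the truth-value evidence is our own choice, not something the slide specifies. The instance reuses the GRB2/GAB1 example from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """An event, e.g. a molecular interaction such as <A binding B>."""
    event_type: str                   # e.g. "Binding", a subclass of Event
    participants: Tuple[str, ...]     # biochemical entities (or nested events)


@dataclass(frozen=True)
class Article:
    id: str
    published_in: Optional[str] = None    # journal, if known


@dataclass
class Evidence:
    kind: str                         # "textual" or "truth value"
    content: object                   # sentence text, or True/False
    uncertainty: Dict[str, float] = field(default_factory=dict)  # keyed by uncertainty type X


@dataclass
class Statement:
    id: str
    represents: Event
    extracted_from: Article
    evidence: List[Evidence] = field(default_factory=list)


# Hypothetical encoding of statement_1 from the GRB2/GAB1 example
stmt = Statement(
    id="statement_1",
    represents=Event("Binding", ("GRB2_MOUSE", "GAB1_MOUSE")),
    extracted_from=Article("PMC123456"),
    evidence=[
        Evidence("textual", "In addition, GRB2 can associate with GAB1",
                 {"extraction prob": 0.8}),
        Evidence("truth value", True, {"provenance prob": 0.7}),
    ],
)
print(stmt.represents, stmt.extracted_from.id)
```

Making `Event` frozen (hashable) lets several statements point at the same event and lets events serve as dictionary keys when evidence is later grouped per event.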
12. GRB2 binding GAB1
[Diagram: statement_1 (a Statement) represents the event GRB2 binding GAB1 (type Binding, a subclass of Event), whose participants A and B are the proteins GRB2_MOUSE and GAB1_MOUSE. statement_1 is extracted from the article PMC123456 and has textual evidence "[...] In addition, GRB2 can associate with GAB1 [...]" with extraction probability 0.8, and truth value True with provenance probability 0.7.]
13. GRB2 binding GAB1
[Diagram: two statements represent the same event GRB2 binding GAB1 (type Binding, a subclass of Event; participants GRB2_MOUSE and GAB1_MOUSE). statement_1 is extracted from PMC123456, has textual evidence "In addition, GRB2 can associate with GAB1" (extraction probability 0.8) and truth value True (provenance probability 0.7). statement_..99 is extracted from PMC654321, has textual evidence "GRB2 does not interact directly with GAB1" and truth value False, with extraction and provenance probabilities of 0.6 and 0.7.]
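Because both statements point at the same event node, finding conflicting evidence is a simple graph lookup. The sketch below stores the slide's example as plain (subject, predicate, object) triples in Python and asks which statements represent the event and what truth values they carry. The node id `event_1`, the camel-case predicate names and the helper functions are hypothetical; a real system would more likely use an RDF store and SPARQL.

```python
# Hypothetical triples transcribing the GRB2/GAB1 example; "event_1" is an invented node id.
TRIPLES = {
    ("event_1", "type", "Binding"),
    ("event_1", "hasParticipantA", "GRB2_MOUSE"),
    ("event_1", "hasParticipantB", "GAB1_MOUSE"),
    ("statement_1", "represents", "event_1"),
    ("statement_1", "extractedFrom", "PMC123456"),
    ("statement_1", "hasTruthValue", True),
    ("statement_..99", "represents", "event_1"),
    ("statement_..99", "extractedFrom", "PMC654321"),
    ("statement_..99", "hasTruthValue", False),
}


def subjects(predicate, obj):
    """All subjects s with (s, predicate, obj) in the graph, sorted for stable output."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def value(subject, predicate):
    """The (single) object of (subject, predicate, ?)."""
    return next(o for s, p, o in TRIPLES if s == subject and p == predicate)


def verdicts(event):
    """Map every statement representing `event` to its (source article, truth value)."""
    return {s: (value(s, "extractedFrom"), value(s, "hasTruthValue"))
            for s in subjects("represents", event)}


def is_conflicting(event):
    """True if the statements about `event` disagree on its truth value."""
    return len({tv for _, tv in verdicts(event).values()}) > 1


print(verdicts("event_1"))
print(is_conflicting("event_1"))
```

Keeping every statement, rather than overwriting the event's truth value, is what allows the conflict to be detected and weighed later.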
14. GRB2 binding GAB1
So what can we really say about the truth of events?
15. Support aggregation
event = <A binding B>
[Chart: positive support, negative support and inconsistency for the event, on a scale from 0 to 1 (tick at 0.5), as the evidence grows from {s1} to {s1, s2} to {s1, s2, s3}]
Statement | Extraction accuracy | Provenance uncertainty
S1 = event is true | 0.8 | 0.7
S2 = event is false | 0.8 | 0.7
S3 = event is false | 0.9 | 0.6
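The slides do not state the aggregation rule, so the following is only one plausible sketch, not the PANDA method. It assumes that each statement counts with weight extraction accuracy × provenance value (reading the "provenance uncertainty" column as the probability that the source is reliable; if it means the opposite, use 1 - value), combines independent statements of the same polarity with a noisy-OR, and takes inconsistency as the smaller of the positive and negative support. Names such as `weight`, `noisy_or` and `aggregate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    name: str
    polarity: bool               # True: "event is true", False: "event is false"
    extraction_accuracy: float
    provenance: float            # assumption: read as probability that the source is reliable


def weight(s):
    # Assumption: a statement counts only if it was extracted correctly AND its
    # source is reliable, treating the two as independent.
    return s.extraction_accuracy * s.provenance


def noisy_or(weights):
    # Probability that at least one of several independent pieces of evidence holds.
    p_none = 1.0
    for w in weights:
        p_none *= 1.0 - w
    return 1.0 - p_none


def aggregate(statements):
    pos = noisy_or(weight(s) for s in statements if s.polarity)
    neg = noisy_or(weight(s) for s in statements if not s.polarity)
    inconsistency = min(pos, neg)   # degree to which both sides are supported at once
    return pos, neg, inconsistency


s1 = Statement("S1", True, 0.8, 0.7)
s2 = Statement("S2", False, 0.8, 0.7)
s3 = Statement("S3", False, 0.9, 0.6)

# Re-run aggregation as evidence accumulates, as in the chart: {s1}, {s1, s2}, {s1, s2, s3}
for evidence in ([s1], [s1, s2], [s1, s2, s3]):
    pos, neg, inc = aggregate(evidence)
    print([s.name for s in evidence], round(pos, 4), round(neg, 4), round(inc, 4))
```

Under these assumptions the positive support stays at 0.56 throughout, while the negative support rises from 0 to 0.56 and then to about 0.7976 once S3 is added. The printed numbers are illustrative only and are not taken from the slide's chart.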
17. Expert input
[Diagram: extraction accuracy and provenance uncertainty → total uncertainty; experimental confirmation: T = 0.9, F = 0.1, - = 0.5]
Molecule | Interaction | Gene | Total uncertainty (before experiment) | Experimental confirmation | Total uncertainty (after experiment)
curcumin | negative regulation | BCL2_MOUSE | 0.3941 | TRUE | 0.7489
curcumin | positive regulation | P53_HUMAN | 0.3924 | FALSE | 0.1569
curcumin | negative regulation | Q9H014_HUMAN | 0.3929 | - | 0.3929
... | ... | ... | ... | ... | ...
18. Big Mechanism technology
We need to find generic solutions for extracting Big Mechanisms and making them accessible to computational agents.
The Probabilistic Knowledge Assembly framework (semantics + probabilistic reasoning) offers:
• a powerful framework for scalable and flexible knowledge assembly tasks
• a uniform knowledge representation model and data access interface based on generic tools and technologies (particularly W3C standards)
• declarative formalisms that facilitate provenance tracking
• a continuous update-assembly loop for dynamic environments