Knowledge Assembly at Scale with Semantic Probabilistic Techniques

Knowledge Assembly at Scale
with Semantic and Probabilistic Techniques
Szymon Klarman
Department of Computer Science
Brunel University London
Connected Data London 2016

Scientific publishing deluge
▪ 50 million papers published since 1665
▪ 2.5 million papers published last year
▪ publication output doubling every 9 years
Effects:
▪ narrowing of science and scholarship: we cite a small pool of mostly recent papers
▪ narrowing of expertise
▪ the "publish or perish" principle affects the quality of results

Big Mechanism
Reading, Assembly, Explanation

Challenges
• ambiguity and vagueness of natural language
• general quality and reliability of the sources
• the inaccuracy of the information extraction tools
• the typical "Vs" of big data: volume, variety, volatility, velocity
• inconsistent, inconclusive or non-reproducible results
• gaps, omissions, contextual assumptions
In vitro curcumin downregulated the expression of Bcl-2, and Bcl-XL and upregulated the expression of
p53, Bax, Bak, PUMA, Noxa, and Bim at mRNA and protein levels in prostate cancer cells [14].

Knowledge assembly
Knowledge assembly is the process of reconstructing complex knowledge from contextually
asserted atomic statements and data fragments (evidence).
[Diagram: evidence is turned into knowledge through extraction, reconciliation, filtering, aggregation, and model formation.]
Example: "[…] A can associate with B […]" → knowledge assembly → <A binding B>

Probabilistic knowledge assembly
[Diagram labels: extraction, evidence, assembly, (probabilistic) knowledge, probabilistic inference, learning, model updates, expert input]
In the Probabilistic Knowledge Assembly (PANDA) framework, evidence with all its contextual
information is part of the knowledge base, which enables a continuous update-assembly loop.

"A can associate with B"
extraction accuracy = 0.7
published in: "Molecular Cancer"
<A binding B> is supported to degree 0.7
Evidence contradicts the model to degree 0.7
<A binding B> is experimentally confirmed

Linked data resources
▪ ontologies:
  • biomedical (GO, BioPAX, MI)
  • uncertainty (UNO)
  • information/document/provenance description (IAO, PROV-O, VoID, Dublin Core)
▪ (linked) open data via SPARQL endpoints and APIs:
  • PubMed
  • journal rankings (SCImago)
  • bioinformatics databases (UniProt, ChEBI, HGNC)
▪ unique identifiers:
  • biochemical entities
  • journals / articles

Knowledge graph: data model
[Diagram. Node labels: Event, Biochemical entity / Event, Molecular interaction, Statement, Article, Journal, Uncertainty level, Textual evidence, Truth value evidence. Edge labels: represents, is extracted from, has participant, type, published in, has evidence, has truth value, has uncertainty (of type X).]

Knowledge graph: example
"[...] In addition, GRB2 can associate with GAB1 [...]"

Knowledge graph: example
[Graph: statement_1 (type Statement) has textual evidence "In addition, GRB2 can associate with GAB1", truth value True, extraction prob 0.8, and is extracted from PMC123456 (type Article); provenance prob 0.7.]

[Graph, extended: statement_1 represents the event "GRB2 binding GAB1" (type Binding, a subclass of Event), which has participant A GRB2_MOUSE and has participant B GAB1_MOUSE (both of type Protein). The statement keeps its textual evidence, truth value True, extraction prob 0.8, source PMC123456 (type Article), and provenance prob 0.7.]

[Graph, extended: a second statement, statement_..99 (type Statement), also represents "GRB2 binding GAB1". It has textual evidence "GRB2 does not interact directly with GAB1", truth value False, and is extracted from PMC654321 (type Article); the values 0.6 and 0.7 appear next to it with the labels provenance prob and extraction prob. statement_1 is unchanged: truth value True, extraction prob 0.8, provenance prob 0.7, extracted from PMC123456.]
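The structural part of the example graph can be written down as subject-predicate-object triples. The following is a minimal sketch in plain Python, not the representation used in the talk: the predicate strings simply reuse the edge labels from the slides (no real IRIs are given there), the helper functions are hypothetical, and the probability values are left out because the slides do not make clear which node each one is attached to.

```python
# Triples read off the example graph. Predicate names reuse the slide's
# edge labels; they are illustrative strings, not real ontology IRIs.
EVENT = "GRB2 binding GAB1"

TRIPLES = [
    (EVENT, "type", "Binding"),
    ("Binding", "subclass of", "Event"),
    (EVENT, "has participant A", "GRB2_MOUSE"),
    (EVENT, "has participant B", "GAB1_MOUSE"),
    ("GRB2_MOUSE", "type", "Protein"),
    ("GAB1_MOUSE", "type", "Protein"),
    ("statement_1", "type", "Statement"),
    ("statement_1", "represents", EVENT),
    ("statement_1", "textual evidence", "In addition, GRB2 can associate with GAB1"),
    ("statement_1", "truth value", True),
    ("statement_1", "extracted from", "PMC123456"),
    ("PMC123456", "type", "Article"),
    ("statement_..99", "type", "Statement"),
    ("statement_..99", "represents", EVENT),
    ("statement_..99", "textual evidence", "GRB2 does not interact directly with GAB1"),
    ("statement_..99", "truth value", False),
    ("statement_..99", "extracted from", "PMC654321"),
    ("PMC654321", "type", "Article"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def statements_about(event):
    """Return the statements that represent the given event."""
    return [s for s, p, o in TRIPLES if p == "represents" and o == event]


def truth_values(event):
    """Map each statement about the event to its asserted truth value."""
    return {s: objects(s, "truth value")[0] for s in statements_about(event)}


if __name__ == "__main__":
    for statement, value in truth_values(EVENT).items():
        source = objects(statement, "extracted from")[0]
        print(f"{statement} ({source}): event is {value}")
```

Running the script lists one statement asserting the event and one denying it, which is the conflict the next slide asks about.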

So what can we really say about
the truth of events?

Support aggregation
event = <A binding B>
[Chart: positive support, negative support, and inconsistency on a scale from 0 to 1 for the statement sets {s1}, {s1, s2}, {s1, s2, s3}]
Statement              Extraction accuracy   Provenance uncertainty
S1 = event is true     0.8                   0.7
S2 = event is false    0.8                   0.7
S3 = event is false    0.9                   0.6
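The slides do not state the aggregation formula, so the following is only a sketch of one plausible scheme, under three assumptions that are not taken from the talk: each statement's weight is its extraction accuracy multiplied by its provenance value (treated here as a confidence, higher meaning more trusted), statements are independent and combined with a noisy-OR, and inconsistency is the product of positive and negative support. The numbers it prints are therefore not claimed to match the chart.

```python
from math import prod

# (name, asserts the event is true?, extraction accuracy, provenance value),
# copied from the table on the slide.
STATEMENTS = [
    ("s1", True, 0.8, 0.7),
    ("s2", False, 0.8, 0.7),
    ("s3", False, 0.9, 0.6),
]


def noisy_or(weights):
    """Probability that at least one of several independent sources holds."""
    return 1.0 - prod(1.0 - w for w in weights)


def aggregate(statements):
    """Combine statements into positive support, negative support, inconsistency.

    Assumed scheme (not from the slides): weight = accuracy * provenance,
    noisy-OR within each polarity, inconsistency = positive * negative.
    """
    positive = noisy_or(acc * prov for _, truth, acc, prov in statements if truth)
    negative = noisy_or(acc * prov for _, truth, acc, prov in statements if not truth)
    return {
        "positive": positive,
        "negative": negative,
        "inconsistency": positive * negative,
    }


if __name__ == "__main__":
    # Mirror the chart's x-axis: {s1}, {s1, s2}, {s1, s2, s3}.
    for n in range(1, len(STATEMENTS) + 1):
        subset = STATEMENTS[:n]
        names = ", ".join(name for name, *_ in subset)
        result = aggregate(subset)
        print(f"{{{names}}}: " + ", ".join(f"{k}={v:.4f}" for k, v in result.items()))
```

Under these assumptions, adding a statement can only raise the support for its own polarity, so inconsistency grows as conflicting evidence accumulates.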

Total uncertainty aggregation
[Diagram labels: Doc_1, Doc_2, Doc...; Stat_1, Stat_2, Stat...; Provenance uncertainty; Extraction accuracy; Textual uncertainty; Document part weight; Positive support; Negative support; Event likelihood]
Probabilistic model (~Bayes net) over linked data, expressed via probabilistic logic
programming (ProbLog).
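To illustrate what expressing the model in probabilistic logic programming means, here is a small Python sketch of the possible-worlds semantics that ProbLog is based on: every probabilistic fact is independently true or false, each combination is a world with a weight, and the probability of a query is the total weight of the worlds in which it holds. The fact names and the rule are hypothetical and are not taken from the PANDA model; the probabilities reuse the S2 and S3 rows of the support aggregation table. Real ProbLog engines avoid this exponential enumeration.

```python
from itertools import product

# Hypothetical probabilistic facts, in the spirit of ProbLog's "p::fact."
FACTS = {
    "extraction_ok(s2)": 0.8,
    "provenance_ok(s2)": 0.7,
    "extraction_ok(s3)": 0.9,
    "provenance_ok(s3)": 0.6,
}


def negative_support(world):
    """Hypothetical rule: the event has negative support if some negative
    statement was both extracted correctly and comes from a reliable source."""
    return any(
        world[f"extraction_ok({s})"] and world[f"provenance_ok({s})"]
        for s in ("s2", "s3")
    )


def query_probability(facts, query):
    """Sum the weights of all possible worlds in which the query holds."""
    names = list(facts)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= facts[name] if world[name] else 1.0 - facts[name]
        if query(world):
            total += weight
    return total


if __name__ == "__main__":
    print(f"P(negative support) = {query_probability(FACTS, negative_support):.4f}")
```

Because the two statements rely on disjoint, independent facts, the enumerated result coincides with combining their individual weights by noisy-OR.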

Expert input
[Diagram labels: Extraction Accuracy, Provenance Uncertainty, Total Uncertainty, Experimental Confirmation]
Experimental Confirmation:   T     F     -
                             0.9   0.1   0.5
Molecule   Interaction           Gene            Total Uncertainty    Experimental    Total Uncertainty
                                                 Before Experiment    Confirmation    After Experiment
curcumin   negative regulation   BCL2_MOUSE      0.3941               TRUE            0.7489
curcumin   positive regulation   P53_HUMAN       0.3924               FALSE           0.1569
curcumin   negative regulation   Q9H014_HUMAN    0.3929               -               0.3929
...        ...                   ...             ...                  ...             ...

Big Mechanism technology
We need to find generic solutions for extracting Big Mechanisms and making them accessible to
computational agents.
The Probabilistic Knowledge Assembly framework (semantics + probabilistic reasoning) offers:
• a powerful framework for scalable and flexible knowledge assembly tasks
• a uniform knowledge representation model and data access interface based on generic
tools and technologies (particularly W3C standards)
• declarative formalisms that facilitate provenance tracking
• a continuous update-assembly loop for dynamic environments

szymon.klarman@gmail.com
Thank you!
