Knowledge Assembly at Scale with Semantic and Probabilistic Techniques

Szymon Klarman's slides from his lightning talk at Connected Data London. Szymon is a research fellow at Brunel University London; his talk highlighted the current climate of academic publishing and how to make sense of this information explosion using knowledge graphs.

  1. Knowledge Assembly at Scale with Semantic and Probabilistic Techniques. Szymon Klarman, Department of Computer Science, Brunel University London. Connected Data London 2016.
  2. Scientific publishing deluge
     • 50 million papers published since 1665
     • 2.5 million papers published last year
     • publication output doubling every 9 years
     Effects:
     • narrowing of science and scholarship: we cite a small pool of mostly recent papers
     • narrowing of expertise
     • the "publish or perish" principle affects the quality of results
  3. Big Mechanism: Reading, Assembly, Explanation
  4. Challenges
     • ambiguity and vagueness of natural language
     • general quality and reliability of the sources
     • the inaccuracy of the information extraction tools
     • the typical "Vs" of big data: volume, variety, volatility, velocity
     • inconsistent, inconclusive or non-reproducible results
     • gaps, omissions, contextual assumptions
     Example: "In vitro curcumin downregulated the expression of Bcl-2, and Bcl-XL and upregulated the expression of p53, Bax, Bak, PUMA, Noxa, and Bim at mRNA and protein levels in prostate cancer cells [14]."
  5. Knowledge assembly
     Knowledge assembly is a process of reconstructing complex knowledge from contextually asserted atomic statements and data fragments (evidence).
     Diagram: evidence -> knowledge assembly (extraction, reconciliation, filtering, aggregation, model formation) -> knowledge
     Example: "[…] A can associate with B […]" -> <A binding B>
  6. Probabilistic knowledge assembly
     In the Probabilistic Knowledge Assembly (PANDA) framework, evidence with all its contextual information is part of the knowledge base, which enables a continuous update-assembly loop.
     Diagram labels: extraction, evidence, assembly, (probabilistic) knowledge, probabilistic inference, learning, model updates, expert input
  7. Probabilistic knowledge assembly (same diagram and framework description as slide 6, annotated with an example)
     • evidence: "A can associate with B", extraction accuracy = 0.7, published in: "Molecular Cancer"
     • <A binding B> is supported to degree 0.7
     • Evidence contradicts the model to degree 0.7
     • expert input: <A binding B> is experimentally confirmed
  8. Linked data resources
     Ontologies:
     • biomedical (GO, BioPAX, MI)
     • uncertainty (UNO)
     • information/document/provenance description (IAO, PROV-O, VoID, Dublin Core)
     (Linked) open data via SPARQL endpoints and APIs:
     • PubMed
     • journal rankings (SCImago)
     • bioinformatics databases (UniProt, ChEBI, HGNC)
     Unique identifiers:
     • biochemical entities
     • journals / articles
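
To make "open data via SPARQL endpoints" concrete, here is a minimal sketch of how a protein identifier might be resolved against such an endpoint. The endpoint URL, the `up:` prefix and the `up:mnemonic` predicate are assumptions about UniProt's public SPARQL service; none of them appear on the slide. The helper `build_lookup_url` is hypothetical and only builds the request URL, without sending it.

```python
from urllib.parse import urlencode

# Assumption: UniProt's public SPARQL endpoint (not named on the slide).
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_lookup_url(mnemonic: str, endpoint: str = UNIPROT_ENDPOINT) -> str:
    """Build a SPARQL-protocol GET URL that asks for the IRI of a protein entry."""
    # Assumption: entries expose their mnemonic (e.g. GRB2_MOUSE) via up:mnemonic.
    query = f"""
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein WHERE {{
  ?protein a up:Protein ;
           up:mnemonic "{mnemonic}" .
}}
LIMIT 1
""".strip()
    # The SPARQL protocol passes the query text in the 'query' parameter.
    return endpoint + "?" + urlencode({"query": query})


url = build_lookup_url("GRB2_MOUSE")
print(url)
```

Sending this URL with an `Accept: application/sparql-results+json` header should return the matching entry's IRI, which can then serve as the unique identifier for a participant in the knowledge graph.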
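  9. Knowledge graph: data model
     Nodes: Event, Molecular interaction, Biochemical entity / Event, Statement, Article, Journal, Textual evidence, Truth value, Uncertainty level
     Edges:
     • Statement --represents--> Event
     • Statement --is extracted from--> Article
     • Article --published in--> Journal
     • Event --type--> Molecular interaction
     • Event --has participant--> Biochemical entity / Event
     • Statement --has evidence--> Textual evidence
     • Statement --has truth value--> Truth value
     • Statement --has uncertainty (of type X)--> Uncertainty level
     Regions: evidence (statements and their sources), knowledge (events and their participants)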
  10. Knowledge graph: example
     Source text: "[...] In addition, GRB2 can associate with GAB1 [...]"
  11. Knowledge graph: example
     Source text: "[...] In addition, GRB2 can associate with GAB1 [...]"
     statement_1 (type: Statement)
     • textual evidence: "In addition, GRB2 can associate with GAB1"
     • extraction prob: 0.8
     • provenance prob: 0.7
     • truth value: True
     • extracted from: PMC123456 (type: Article)
  12. Knowledge graph: example (statement_1 as on slide 11, now linked to an event)
     Source text: "[...] In addition, GRB2 can associate with GAB1 [...]"
     • statement_1 --represents--> GRB2 binding GAB1
     • GRB2 binding GAB1: type Binding (Binding is a subclass of Event)
     • GRB2 binding GAB1 --has participant A--> GRB2_MOUSE (type: Protein)
     • GRB2 binding GAB1 --has participant B--> GAB1_MOUSE (type: Protein)
  13. Knowledge graph: example (graph of slide 12, with a second, contradicting statement)
     statement_..99 (type: Statement)
     • represents: GRB2 binding GAB1
     • textual evidence: "GRB2 does not interact directly with GAB1"
     • truth value: False
     • extractedFrom: PMC654321 (type: Article)
     • probabilities 0.6 and 0.7 (labels on the slide: provenance prob, extraction prob)
  14. Knowledge graph: example (same graph as slide 13)
     So what can we really say about the truth of events?
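
The example graph from slides 11 to 14 can be written down as plain subject-predicate-object triples. The sketch below uses Python tuples rather than an RDF library; predicate names follow the slide labels, while `event_1` and `statement_n` are hypothetical identifiers (the latter stands in for the slide's elided `statement_..99`), and the helper functions are illustrative only.

```python
# Subject-predicate-object triples for the GRB2/GAB1 example.
# event_1 and statement_n are hypothetical names, not taken from the slides.
TRIPLES = {
    ("event_1", "type", "Binding"),
    ("Binding", "subclass of", "Event"),
    ("event_1", "has participant A", "GRB2_MOUSE"),
    ("event_1", "has participant B", "GAB1_MOUSE"),
    ("statement_1", "type", "Statement"),
    ("statement_1", "represents", "event_1"),
    ("statement_1", "truth value", True),
    ("statement_1", "extraction prob", 0.8),
    ("statement_1", "provenance prob", 0.7),
    ("statement_1", "extracted from", "PMC123456"),
    ("statement_n", "type", "Statement"),
    ("statement_n", "represents", "event_1"),
    ("statement_n", "truth value", False),
    ("statement_n", "extracted from", "PMC654321"),
}


def objects(subject, predicate):
    """All objects attached to a subject through the given predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def statements_about(event):
    """Statements that represent the given event, in a stable order."""
    return sorted(s for s, p, o in TRIPLES if p == "represents" and o == event)


def verdicts(event):
    """Map each statement about the event to the truth value it asserts."""
    return {s: objects(s, "truth value")[0] for s in statements_about(event)}


print(verdicts("event_1"))
```

Because both statements point at the same event node, the disagreement between the two articles becomes visible with a single lookup, which is what makes aggregating their support possible.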
  15. Support aggregation
     Statement              Extraction accuracy   Provenance uncertainty
     S1 = event is true     0.8                   0.7
     S2 = event is false    0.8                   0.7
     S3 = event is false    0.9                   0.6
     Chart: support for event = <A binding B> on a scale from 0 to 1 (ticks 0, 0.5, 1) as statements accumulate ({s1}, {s1, s2}, {s1, s2, s3}); series: positive support, negative support, inconsistency.
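
The slide does not state how the three curves are computed, so the following is only a sketch under explicit assumptions: each statement's reliability is taken to be extraction accuracy times the provenance score, supports on the same side are combined with a noisy-OR, and inconsistency is taken as the product of positive and negative support. These are illustrative choices, not necessarily the talk's method, and the resulting numbers are not claimed to match the chart.

```python
from math import prod

# (asserts event is true?, extraction accuracy, provenance score) from the table.
STATEMENTS = {
    "s1": (True, 0.8, 0.7),
    "s2": (False, 0.8, 0.7),
    "s3": (False, 0.9, 0.6),
}


def weight(extraction, provenance):
    # Assumption: a statement is reliable only if both extraction and source are.
    return extraction * provenance


def noisy_or(weights):
    # Probability that at least one of several independent sources is reliable.
    return 1.0 - prod(1.0 - w for w in weights)


def support(names):
    """Return (positive support, negative support, inconsistency) for a statement set."""
    rows = [STATEMENTS[n] for n in names]
    pos = noisy_or(weight(e, p) for truth, e, p in rows if truth)
    neg = noisy_or(weight(e, p) for truth, e, p in rows if not truth)
    # Assumption: inconsistency is the chance that both sides are reliably supported.
    return pos, neg, pos * neg


for names in (["s1"], ["s1", "s2"], ["s1", "s2", "s3"]):
    pos, neg, inc = support(names)
    print(names, round(pos, 4), round(neg, 4), round(inc, 4))
```

Under these assumptions, adding s2 and s3 leaves positive support unchanged while negative support and inconsistency both grow.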
  16. Probabilistic model (~Bayes net) over linked data, expressed via probabilistic logic programming (ProbLog).
     Diagram nodes: Doc_1, Doc_2, Doc...; Stat_1, Stat_2, Stat...; provenance uncertainty, extraction accuracy, textual uncertainty, document part weight, total uncertainty, aggregation, positive support, negative support, event likelihood
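
Probabilistic logic programs of this kind assign probabilities to independent facts and derive a query's probability by summing over the possible worlds in which it holds. The sketch below imitates that semantics in plain Python by brute-force enumeration; it does not use ProbLog itself. The fact names, the numbers and the rule are hypothetical, and the two statements are assumed to come from the same document so that they share one provenance fact.

```python
from itertools import product

# Hypothetical independent probabilistic facts (illustrative names and numbers).
FACTS = {
    "trust_doc_1": 0.7,     # provenance: Doc_1 is a reliable source
    "extract_stat_1": 0.8,  # Stat_1 (from Doc_1) was extracted correctly
    "extract_stat_2": 0.9,  # Stat_2 (also from Doc_1) was extracted correctly
}


def positive_support(world):
    # Rule: a statement counts if its document is trusted and its extraction
    # is correct; the event is supported if at least one statement counts.
    stat_1 = world["trust_doc_1"] and world["extract_stat_1"]
    stat_2 = world["trust_doc_1"] and world["extract_stat_2"]
    return stat_1 or stat_2


def probability(query, facts=FACTS):
    """Sum the probabilities of all truth assignments in which the query holds."""
    names = list(facts)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        p = 1.0
        for name, value in world.items():
            p *= facts[name] if value else 1.0 - facts[name]
        if query(world):
            total += p
    return total


p_support = probability(positive_support)
print(round(p_support, 4))
```

Treating the two statements as independent sources would overstate their combined support, because both depend on the same document being trustworthy; enumerating worlds accounts for that shared dependency. Enumeration is exponential in the number of facts, so it is only suitable for toy examples like this one.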
  17. Expert input
     Diagram nodes: Extraction Accuracy, Provenance Uncertainty, Total Uncertainty, Experimental Confirmation
     Experimental Confirmation:   T = 0.9   F = 0.1   - = 0.5
     Molecule   Interaction           Gene            Total uncertainty     Experimental    Total uncertainty
                                                      before experiment     confirmation    after experiment
     curcumin   negative regulation   BCL2_MOUSE      0.3941                TRUE            0.7489
     curcumin   positive regulation   P53_HUMAN       0.3924                FALSE           0.1569
     curcumin   negative regulation   Q9H014_HUMAN    0.3929                -               0.3929
     ...        ...                   ...             ...                   ...             ...
  18. Big Mechanism technology
     We need to find generic solutions for extracting Big Mechanisms and making them available to computational agents.
     The Probabilistic Knowledge Assembly framework (semantics + probabilistic reasoning) offers:
     • a powerful framework for scalable and flexible knowledge assembly tasks
     • a uniform knowledge representation model and data access interface based on generic tools and technologies (particularly W3C standards)
     • declarative formalisms that facilitate provenance tracking
     • a continuous update-assembly loop for dynamic environments
  19. Thank you! szymon.klarman@gmail.com
