Knowledge Assembly at Scale with Semantic and Probabilistic Techniques

Connected Data World
Jul. 18, 2016

  1. Knowledge Assembly at Scale with Semantic and Probabilistic Techniques. Szymon Klarman, Department of Computer Science, Brunel University London. Connected Data London 2016.
  2. Scientific publishing deluge
     • 50 million papers published since 1665
     • 2.5 million papers published last year
     • publication output doubling every 9 years
     Effects:
     • narrowing of science and scholarship: we cite a small pool of mostly recent papers
     • narrowing of expertise
     • the "publish or perish" principle affects the quality of results
  3. Big Mechanism: Reading, Assembly, Explanation
  4. Challenges
     • ambiguity and vagueness of natural language
     • general quality and reliability of the sources
     • inaccuracy of the information extraction tools
     • the typical "Vs" of big data: volume, variety, volatility, velocity
     • inconsistent, inconclusive or non-reproducible results
     • gaps, omissions, contextual assumptions
     Example sentence: "In vitro curcumin downregulated the expression of Bcl-2, and Bcl-XL and upregulated the expression of p53, Bax, Bak, PUMA, Noxa, and Bim at mRNA and protein levels in prostate cancer cells [14]."
  5. Knowledge assembly
     Knowledge assembly is the process of reconstructing complex knowledge from contextually asserted atomic statements and data fragments (evidence).
     Diagram labels: extraction, reconciliation, filtering, aggregation, model formation; evidence, knowledge.
     Example: the evidence "[…] A can associate with B […]" is assembled into the knowledge <A binding B>.
  6. Probabilistic knowledge assembly
     In the Probabilistic Knowledge Assembly (PANDA) framework, evidence with all its contextual information is part of the knowledge base, which enables a continuous update-assembly loop.
     Diagram labels: extraction, evidence, assembly, (probabilistic) knowledge, probabilistic inference, learning, model updates, expert input.
  7. Probabilistic knowledge assembly: the same diagram as on the previous slide, annotated with an example
     • evidence: "A can associate with B", extraction accuracy = 0.7, published in "Molecular Cancer"
     • <A binding B> is supported to degree 0.7
     • Evidence contradicts the model to degree 0.7
     • <A binding B> is experimentally confirmed
  8. Linked data resources
     • ontologies: biomedical (GO, BioPAX, MI); uncertainty (UNO); information/document/provenance description (IAO, PROV-O, VoID, Dublin Core)
     • (linked) open data via SPARQL endpoints and APIs: PubMed; journal rankings (SCImago); bioinformatics databases (UniProt, ChEBI, HGNC)
     • unique identifiers: biochemical entities; journals / articles
  9. Knowledge graph: data model
     Node types: Event, Molecular interaction, Biochemical entity / Event, Statement, Article, Journal, Textual evidence, Truth value, Uncertainty level.
     Edge labels: type, has participant, represents, is extracted from, published in, has evidence, has truth value, has uncertainty (of type X).
     Layers: evidence, knowledge.
  10. Knowledge graph: example. Source text: "[...] In addition, GRB2 can associate with GAB1 [...]"
  11. Knowledge graph: example (evidence). Source text: "[...] In addition, GRB2 can associate with GAB1 [...]"
     • statement_1 (type Statement): textual evidence "In addition, GRB2 can associate with GAB1"; extraction prob 0.8; truth value True; extracted from PMC123456 (type Article); provenance prob 0.7
  12. Knowledge graph: example (event added)
     • statement_1 represents the event "GRB2 binding GAB1" (type Binding; Binding is a subclass of Event)
     • has participant A: GRB2_MOUSE (type Protein); has participant B: GAB1_MOUSE (type Protein)
  13. Knowledge graph: example (conflicting statement added)
     • statement_..99 (type Statement) represents the same event: textual evidence "GRB2 does not interact directly with GAB1"; truth value False; extractedFrom PMC654321 (type Article); probabilities 0.6 and 0.7 (provenance prob, extraction prob)
  14. Knowledge graph: example (same graph as on slide 13). So what can we really say about the truth of events?
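The example graph from slides 11 and 12 can be written down as plain subject-predicate-object triples. The following is a minimal sketch in Python, not the framework's actual storage layer: the predicate strings are copied from the slide labels, while the `TRIPLES` list and the helpers `objects` and `statements_about` are hypothetical names introduced only for illustration. The provenance probability is left out because the transcript does not make clear which node it attaches to.

```python
# Triples (subject, predicate, object) transcribed from the example graph.
# Node and edge labels follow the slides; the container and the helper
# functions below are illustrative assumptions, not part of PANDA.
TRIPLES = [
    ("GRB2 binding GAB1", "type", "Binding"),
    ("Binding", "subclass of", "Event"),
    ("GRB2 binding GAB1", "has participant A", "GRB2_MOUSE"),
    ("GRB2 binding GAB1", "has participant B", "GAB1_MOUSE"),
    ("GRB2_MOUSE", "type", "Protein"),
    ("GAB1_MOUSE", "type", "Protein"),
    ("statement_1", "type", "Statement"),
    ("statement_1", "represents", "GRB2 binding GAB1"),
    ("statement_1", "textual evidence", "In addition, GRB2 can associate with GAB1"),
    ("statement_1", "extraction prob", 0.8),
    ("statement_1", "truth value", True),
    ("statement_1", "extracted from", "PMC123456"),
    ("PMC123456", "type", "Article"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def statements_about(event):
    """Return the statements that represent the given event."""
    return [s for s, p, o in TRIPLES if p == "represents" and o == event]


for statement in statements_about("GRB2 binding GAB1"):
    print(statement,
          objects(statement, "truth value"),
          objects(statement, "extraction prob"),
          objects(statement, "extracted from"))
```

Keeping the evidence (statement, article) and the knowledge (event, participants) in one graph is what lets a later step walk from an event back to every statement that supports or contradicts it.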
  15. Support aggregation
     event = <A binding B>
     Statement             Extraction accuracy   Provenance uncertainty
     S1 = event is true    0.8                   0.7
     S2 = event is false   0.8                   0.7
     S3 = event is false   0.9                   0.6
     Chart: a scale from 0 to 1 (ticks at 0, 0.5, 1) showing positive support, negative support and inconsistency for the statement sets {s1}, {s1, s2} and {s1, s2, s3}.
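The slide does not give the aggregation formula, so the following is only a sketch of one simple way to combine the three statements, under loudly stated assumptions: each statement's reliability is taken to be the product of its extraction accuracy and provenance score, statements are treated as independent, and the event starts from a uniform prior of 0.5. The names `Statement`, `reliability` and `event_likelihood` are hypothetical, and this naive odds update is not claimed to be the model used in the PANDA framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    name: str
    asserts_true: bool          # does the statement say the event is true?
    extraction_accuracy: float
    provenance: float

    @property
    def reliability(self) -> float:
        # Assumption: the two scores are independent and simply multiply.
        return self.extraction_accuracy * self.provenance


def event_likelihood(statements, prior=0.5):
    """Naive odds update; assumes 0 < reliability < 1 for every statement."""
    odds = prior / (1 - prior)
    for s in statements:
        ratio = s.reliability / (1 - s.reliability)
        # A supporting statement multiplies the odds, a contradicting one divides them.
        odds *= ratio if s.asserts_true else 1 / ratio
    return odds / (1 + odds)


# Values taken from the table on the slide.
S1 = Statement("S1", True, 0.8, 0.7)
S2 = Statement("S2", False, 0.8, 0.7)
S3 = Statement("S3", False, 0.9, 0.6)

for subset in ([S1], [S1, S2], [S1, S2, S3]):
    names = ", ".join(s.name for s in subset)
    print(f"{{{names}}}: {event_likelihood(subset):.2f}")
# prints:
# {S1}: 0.56
# {S1, S2}: 0.50
# {S1, S2, S3}: 0.46
```

Under these assumptions the estimate moves from mild support with S1 alone, to an even split once S2 contradicts it, to mild doubt after S3, which is the qualitative behaviour the chart illustrates.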
  16. Total uncertainty aggregation
     Probabilistic model (~Bayes net) over linked data, expressed via probabilistic logic programming (ProbLog).
     Diagram nodes: Event likelihood; Positive support, Negative support; Stat_1, Stat_2, Stat...; Doc_1, Doc_2, Doc...; Extraction accuracy, Provenance uncertainty, Textual uncertainty, Document part weight.
  17. Expert input
     Diagram nodes: Extraction Accuracy, Provenance Uncertainty, Total Uncertainty, Experimental Confirmation.
     Experimental Confirmation values: T = 0.9, F = 0.1, - = 0.5.
     Molecule   Interaction           Gene            Total uncertainty before experiment   Experimental confirmation   Total uncertainty after experiment
     curcumin   negative regulation   BCL2_MOUSE      0.3941                                TRUE                        0.7489
     curcumin   positive regulation   P53_HUMAN       0.3924                                FALSE                       0.1569
     curcumin   negative regulation   Q9H014_HUMAN    0.3929                                -                           0.3929
     ...        ...                   ...             ...                                   ...                         ...
  18. Big Mechanism technology
     We need to find generic solutions for extracting Big Mechanisms and making them available to computational agents.
     The Probabilistic Knowledge Assembly framework (semantics + probabilistic reasoning) offers:
     • a powerful framework for scalable and flexible knowledge assembly tasks
     • a uniform knowledge representation model and data access interface based on generic tools and technologies (particularly W3C standards)
     • declarative formalisms that facilitate provenance tracking
     • a continuous update-assembly loop for dynamic environments
  19. Thank you! szymon.klarman@gmail.com