Knowledge Assembly at Scale
with Semantic and Probabilistic Techniques
Szymon Klarman
Department of Computer Science
Brunel University London
Connected Data London 2016
Scientific publishing deluge
• 50 million papers published since 1665
• 2.5 million papers published last year
• publication output doubling every 9 years
Effects:
• narrowing of science and scholarship: we cite a small pool of mostly recent papers
• narrowing of expertise
• the "publish or perish" principle affects the quality of results
Big Mechanism
Reading, Assembly, Explanation
Challenges
• ambiguity and vagueness of natural language
• general quality and reliability of the sources
• the inaccuracy of the information extraction tools
• the typical "Vs" of big data: volume, variety, volatility, velocity
• inconsistent, inconclusive or non-reproducible results
• gaps, omissions, contextual assumptions
In vitro curcumin downregulated the expression of Bcl-2, and Bcl-XL and upregulated the
expression of p53, Bax, Bak, PUMA, Noxa, and Bim at mRNA and protein levels in prostate
cancer cells [14].
Knowledge assembly
Knowledge assembly is a process of reconstructing complex knowledge from contextually
asserted atomic statements and data fragments (evidence).
Stages: extraction, reconciliation, filtering, aggregation, model formation (from evidence to knowledge).
Probabilistic knowledge assembly
In the Probabilistic Knowledge Assembly (PANDA) framework, evidence with all its contextual
information is part of the knowledge base, which enables a continuous update-assembly loop.
Knowledge assembly: "[…] A can associate with B […]" → knowledge assembly → <A binding B>
Probabilistic knowledge assembly: text → extraction → evidence → assembly → (probabilistic) knowledge
Feeding the loop: probabilistic inference, learning, model updates, expert input
"A can associate with B"
extraction accuracy = 0.7
published in: "Molecular Cancer"
<A binding B> is supported to degree 0.7
Evidence contradicts the model to degree 0.7
<A binding B> is experimentally confirmed
Linked data resources
• ontologies:
  - biomedical (GO, BioPAX, MI)
  - uncertainty (UNO)
  - information/document/provenance description (IAO, PROV-O, VoID, Dublin Core)
• (linked) open data via SPARQL endpoints and APIs:
  - PubMed
  - journal rankings (SCImago)
  - bioinformatics databases (UniProt, ChEBI, HGNC)
• unique identifiers:
  - biochemical entities
  - journals / articles
Knowledge graph: data model
• Statement - represents → Event
• Statement - is extracted from → Article
• Article - published in → Journal
• Event - type → Molecular interaction
• Event - has participant → Biochemical entity / Event
• Statement - has evidence → Textual evidence
• Statement - has truth value → Truth value
• Statement - has uncertainty (of type X) → Uncertainty level
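The data model could be mirrored in code roughly as follows. This is a minimal Python sketch, not the PANDA implementation: the class and field names are hypothetical and simply follow the node and edge labels of the slide, participants are simplified to plain identifiers, and keeping all uncertainty levels in one dictionary on the statement is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Journal:
    name: str


@dataclass(frozen=True)
class Article:
    identifier: str
    published_in: Optional[Journal] = None   # "published in"


@dataclass(frozen=True)
class Event:
    label: str
    event_type: str                          # "type", e.g. a molecular interaction class
    participants: Tuple[str, ...] = ()       # "has participant" (identifiers only here)


@dataclass
class Statement:
    represents: Event                        # "represents"
    extracted_from: Article                  # "is extracted from"
    textual_evidence: str                    # "has evidence"
    truth_value: bool                        # "has truth value"
    # "has uncertainty (of type X)": uncertainty level keyed by its type (assumed layout)
    uncertainty: Dict[str, float] = field(default_factory=dict)


def statements_about(event: Event, statements: List[Statement]) -> List[Statement]:
    """Return every statement that represents the given event."""
    return [s for s in statements if s.represents == event]


# Values taken from the GRB2/GAB1 example in this deck.
binding = Event("GRB2 binding GAB1", "Binding", ("GRB2_MOUSE", "GAB1_MOUSE"))
s1 = Statement(
    represents=binding,
    extracted_from=Article("PMC123456"),
    textual_evidence="In addition, GRB2 can associate with GAB1",
    truth_value=True,
    uncertainty={"extraction": 0.8, "provenance": 0.7},
)
print(len(statements_about(binding, [s1])))
```

Keeping the statement separate from the event it represents is what lets several, possibly conflicting, statements point at the same event.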
Knowledge graph: example
Source text: "[...] In addition, GRB2 can associate with GAB1 [...]"

Step 1: the statement and its provenance
• statement_1 - type → Statement
• statement_1 - textual evidence → "In addition, GRB2 can associate with GAB1"
• statement_1 - extracted from → PMC123456
• PMC123456 - type → Article
• statement_1 - truth value → True
• extraction prob: 0.8
• provenance prob: 0.7

Step 2: the event represented by the statement
• statement_1 - represents → GRB2 binding GAB1
• GRB2 binding GAB1 - type → Binding
• Binding - subclass of → Event
• GRB2 binding GAB1 - has participant A → GRB2_MOUSE
• GRB2 binding GAB1 - has participant B → GAB1_MOUSE
• GRB2_MOUSE - type → Protein
• GAB1_MOUSE - type → Protein

Step 3: a conflicting statement about the same event
• statement_..99 - type → Statement
• statement_..99 - represents → GRB2 binding GAB1
• statement_..99 - textual evidence → "GRB2 does not interact directly with GAB1"
• statement_..99 - extractedFrom → PMC654321
• PMC654321 - type → Article
• statement_..99 - truth value → False
• probabilities shown for statement_..99: 0.6 and 0.7 (provenance prob, extraction prob)
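The example graph can be written down as subject-predicate-object triples and queried for events on which the sources disagree. The sketch below is illustrative only: in the setting of this talk the data would presumably be RDF queried through SPARQL, whereas plain Python tuples are used here to stay dependency-free, and the helper names are hypothetical. The label `statement_..99` is copied as it appears in the diagram.

```python
# Triples (subject, predicate, object) transcribed from the example graph.
TRIPLES = [
    ("statement_1", "type", "Statement"),
    ("statement_1", "represents", "GRB2 binding GAB1"),
    ("statement_1", "textual evidence", "In addition, GRB2 can associate with GAB1"),
    ("statement_1", "extracted from", "PMC123456"),
    ("statement_1", "truth value", True),
    ("statement_..99", "type", "Statement"),
    ("statement_..99", "represents", "GRB2 binding GAB1"),
    ("statement_..99", "textual evidence", "GRB2 does not interact directly with GAB1"),
    ("statement_..99", "extracted from", "PMC654321"),
    ("statement_..99", "truth value", False),
    ("GRB2 binding GAB1", "type", "Binding"),
    ("Binding", "subclass of", "Event"),
    ("GRB2 binding GAB1", "has participant A", "GRB2_MOUSE"),
    ("GRB2 binding GAB1", "has participant B", "GAB1_MOUSE"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def contested_events(triples=TRIPLES):
    """Events represented by statements that carry differing truth values."""
    verdicts = {}
    for s, p, o in triples:
        if p == "represents":
            for truth in objects(s, "truth value", triples):
                verdicts.setdefault(o, set()).add(truth)
    return sorted(event for event, seen in verdicts.items() if len(seen) > 1)


print(subjects("represents", "GRB2 binding GAB1"))
print(contested_events())
```

Both statements resolve to the same event node, so the disagreement between the two articles becomes visible as soon as the truth values are grouped by event.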
Support aggregation
So what can we really say about the truth of events?
event = <A binding B>

Statement              Extraction accuracy   Provenance uncertainty
S1 = event is true     0.8                   0.7
S2 = event is false    0.8                   0.7
S3 = event is false    0.9                   0.6

[Chart: positive support, negative support and inconsistency, on a scale from 0 to 1, for the statement sets {s1}, {s1, s2} and {s1, s2, s3}]
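One way to make support aggregation concrete is sketched below. The slides do not state the aggregation formula, so everything here is an assumption: each statement is weighted by the product of its extraction accuracy and its provenance value (read as the probability that the source is reliable), statements are treated as independent and combined with a noisy-OR, and inconsistency is taken as the probability that positive and negative support hold at once. The function names are hypothetical.

```python
from math import prod

# (label, claimed truth value, extraction accuracy, provenance value), from the table.
STATEMENTS = [
    ("s1", True, 0.8, 0.7),
    ("s2", False, 0.8, 0.7),
    ("s3", False, 0.9, 0.6),
]


def noisy_or(weights):
    """Probability that at least one of several independent sources holds."""
    return 1.0 - prod(1.0 - w for w in weights)


def aggregate(statements):
    """Positive support, negative support and inconsistency for a statement set."""
    pos = noisy_or(e * p for _, truth, e, p in statements if truth)
    neg = noisy_or(e * p for _, truth, e, p in statements if not truth)
    # Assumed definition: both sides are supported at the same time.
    return {"positive": pos, "negative": neg, "inconsistency": pos * neg}


# Grow the evidence set one statement at a time, as on the slide.
for n in range(1, len(STATEMENTS) + 1):
    subset = STATEMENTS[:n]
    print([label for label, *_ in subset], aggregate(subset))
```

Under these assumptions positive support stays at 0.56 while negative support rises from 0 to 0.56 and then to about 0.80 as s2 and s3 are added. These figures follow from the assumed formula and are not read off the chart.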
Total uncertainty aggregation
Probabilistic model (~Bayes net) over linked data, expressed via probabilistic logic
programming (ProbLog).
[Diagram: dependency graph over documents (Doc_1, Doc_2, ...) and statements (Stat_1, Stat_2, ...); node labels: document part weight, provenance uncertainty, extraction accuracy, textual uncertainty, positive support, negative support, event likelihood]
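The idea behind a probabilistic logic program can be illustrated by enumerating possible worlds by hand. The sketch below is not the model from the talk and does not use the ProbLog library: it assumes one probabilistic fact per statement ("this statement is reliable"), with probabilities obtained by multiplying the extraction accuracy and provenance values of the support aggregation table (0.8 × 0.7, 0.8 × 0.7, 0.9 × 0.6), and it answers a query by summing the probabilities of the worlds in which the query holds. All names are hypothetical.

```python
from itertools import product

# Probabilistic facts: reliable(s) holds with the given probability.
# In ProbLog syntax such a fact would be written like `0.56::reliable(s1).`
RELIABLE = {"s1": 0.56, "s2": 0.56, "s3": 0.54}   # assumed weights, see lead-in
CLAIMS = {"s1": True, "s2": False, "s3": False}   # truth value each statement asserts


def query(holds):
    """Sum the probabilities of all possible worlds in which `holds` is true."""
    names = list(RELIABLE)
    total = 0.0
    for world in product([True, False], repeat=len(names)):
        weight = 1.0
        for name, is_reliable in zip(names, world):
            p = RELIABLE[name]
            weight *= p if is_reliable else 1.0 - p
        reliable = {n for n, r in zip(names, world) if r}
        if holds(reliable):
            total += weight
    return total


def positive_support(reliable):
    """Some reliable statement says the event is true."""
    return any(CLAIMS[s] for s in reliable)


def negative_support(reliable):
    """Some reliable statement says the event is false."""
    return any(not CLAIMS[s] for s in reliable)


print(query(positive_support))
print(query(negative_support))
print(query(lambda r: positive_support(r) and negative_support(r)))
```

Exhaustive enumeration grows exponentially with the number of facts, so it only serves to show the semantics; a probabilistic logic programming engine is what makes such queries practical on larger models.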
Expert input
[Diagram nodes: Extraction Accuracy, Provenance Uncertainty, Total Uncertainty, Experimental Confirmation]

Experimental Confirmation:
T     F     -
0.9   0.1   0.5

Molecule   Interaction           Gene           Total Uncertainty    Experimental    Total Uncertainty
                                                Before Experiment    Confirmation    After Experiment
curcumin   negative regulation   BCL2_MOUSE     0.3941               TRUE            0.7489
curcumin   positive regulation   P53_HUMAN      0.3924               FALSE           0.1569
curcumin   negative regulation   Q9H014_HUMAN   0.3929               -               0.3929
...        ...                   ...            ...                  ...             ...
Big Mechanism technology
We need generic solutions for extracting Big Mechanisms and making them available to
computational agents.
The Probabilistic Knowledge Assembly framework (semantics + probabilistic reasoning) offers:
• scalable and flexible support for knowledge assembly tasks
• a uniform knowledge representation model and data access interface based on generic
tools and technologies (particularly W3C standards)
• declarative formalisms that facilitate provenance tracking
• a continuous update-assembly loop for dynamic environments
szymon.klarman@gmail.com
Thank you!

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Knowledge Assembly at Scale with Semantic and Probabilistic Techniques

  • 1. Knowledge Assembly at Scale with Semantic and Probabilistic Techniques. Szymon Klarman, Department of Computer Science, Brunel University London. Connected Data London 2016.
  • 2. Scientific publishing deluge: 50 mln papers published since 1665; 2.5 mln papers published last year; publication output doubling every 9 years. Effects: narrowing of science and scholarship (we cite a small pool of mostly recent papers); narrowing of expertise; the "publish or perish" principle affects the quality of results.
  • 4. Challenges: ambiguity and vagueness of natural language; general quality and reliability of the sources; inaccuracy of the information extraction tools; the typical "Vs" of big data, i.e. volume, variety, volatility, velocity; inconsistent, inconclusive or non-reproducible results; gaps, omissions, contextual assumptions. Example sentence: "In vitro curcumin downregulated the expression of Bcl-2 and Bcl-XL and upregulated the expression of p53, Bax, Bak, PUMA, Noxa, and Bim at mRNA and protein levels in prostate cancer cells [14]."
  • 5. Knowledge assembly. Knowledge assembly is a process of reconstructing complex knowledge from contextually asserted atomic statements and data fragments (evidence). Diagram labels, from evidence to knowledge: extraction, reconciliation, filtering, aggregation, model formation. Example: the evidence "[…] A can associate with B […]" is assembled into the knowledge <A binding B>.
  • 6. Probabilistic knowledge assembly. Diagram labels: extraction, assembly, evidence, (probabilistic) knowledge, probabilistic inference, learning, model updates, expert input. In the Probabilistic Knowledge Assembly (PANDA) framework, evidence with all its contextual information is part of the knowledge base, which enables a continuous update-assembly loop.
  • 7. Probabilistic knowledge assembly (diagram as on slide 6), with an example. Evidence: "A can associate with B", extraction accuracy = 0.7, published in "Molecular Cancer". Annotations on the loop: <A binding B> is supported to degree 0.7; evidence contradicts the model to degree 0.7; <A binding B> is experimentally confirmed.
  • 8. Linked data resources. Ontologies: biomedical (GO, BioPAX, MI); uncertainty (UNO); information/document/provenance description (IAO, PROV-O, VoID, Dublin Core). (Linked) open data via SPARQL endpoints and APIs: PubMed; journal rankings (SCImago); bioinformatics databases (UniProt, ChEBI, HGNC). Unique identifiers: biochemical entities; journals / articles.
  • 9. Knowledge graph: data model. Node types: Event, Biochemical entity / Event, Molecular interaction, Statement, Article, Journal, Textual evidence, Truth value, Uncertainty level. Edge labels: represents, is extracted from, published in, has participant, type, has evidence, has truth value, has uncertainty (of type X). Region labels: evidence, knowledge.
  • 10. Knowledge graph: example. Source text: "[...] In addition, GRB2 can associate with GAB1 [...]"
  • 11. Knowledge graph: example. Source text: "[...] In addition, GRB2 can associate with GAB1 [...]". Graph: statement_1 (type Statement); textual evidence "In addition, GRB2 can associate with GAB1"; extraction prob 0.8; truth value True; extracted from PMC123456 (type Article); provenance prob 0.7.
  • 12. Source text: "[...] In addition, GRB2 can associate with GAB1 [...]". Graph: statement_1 (type Statement) represents the event GRB2 binding GAB1 (type Binding, subclass of Event); has participant A GRB2_MOUSE and has participant B GAB1_MOUSE (both type Protein); textual evidence "In addition, GRB2 can associate with GAB1"; extraction prob 0.8; truth value True; extracted from PMC123456 (type Article); provenance prob 0.7.
  • 13. Graph: the event GRB2 binding GAB1 (type Binding, subclass of Event), with has participant A GRB2_MOUSE and has participant B GAB1_MOUSE (both type Protein), is now represented by two statements. statement_1 (type Statement): textual evidence "In addition, GRB2 can associate with GAB1"; extraction prob 0.8; truth value True; extracted from PMC123456 (type Article); provenance prob 0.7. statement_..99 (type Statement): textual evidence "GRB2 does not interact directly with GAB1"; truth value False; extracted from PMC654321 (type Article); values 0.6 and 0.7 (provenance prob, extraction prob).
  • 14. Same graph as on slide 13: statement_1 (truth value True, from PMC123456) and statement_..99 (truth value False, from PMC654321) both represent the event GRB2 binding GAB1. So what can we really say about the truth of events?
  • 15. Support aggregation. event = <A binding B>. Statements (extraction accuracy, provenance uncertainty): S1 = event is true (0.8, 0.7); S2 = event is false (0.8, 0.7); S3 = event is false (0.9, 0.6). Chart: positive support, negative support and inconsistency on a 0 to 1 scale (ticks at 0, 0.5, 1) for the evidence sets {s1}, {s1, s2}, {s1, s2, s3}.
  • 17. Expert input. Diagram labels: Extraction Accuracy, Provenance Uncertainty, Total Uncertainty, Experimental Confirmation (T, F, - shown with the values 0.9, 0.1, 0.5). Table columns: Molecule, Interaction, Gene, Total Uncertainty Before Experiment, Experimental Confirmation, Total Uncertainty After Experiment. Rows: curcumin, negative regulation, BCL2_MOUSE, 0.3941, TRUE, 0.7489; curcumin, positive regulation, P53_HUMAN, 0.3924, FALSE, 0.1569; curcumin, negative regulation, Q9H014_HUMAN, 0.3929, -, 0.3929; ...
  • 18. Big Mechanism technology. We need to find generic solutions for extracting Big Mechanisms and making them available to computational agents. The Probabilistic Knowledge Assembly framework (semantics + probabilistic reasoning) offers: a powerful framework for scalable and flexible knowledge assembly tasks; a uniform knowledge representation model and data access interface based on generic tools and technologies (particularly W3C standards); declarative formalisms that facilitate provenance tracking; a continuous update-assembly loop for dynamic environments.
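
The support aggregation on slide 15 can be illustrated with a small sketch. The slides give the three statements and their two scores but not the formula that combines them, so everything below is an assumption made for illustration: each statement's weight is taken as the product of its two scores (the "provenance uncertainty" column is read as a reliability), weights on the same side are combined with a noisy-OR, and inconsistency is taken as the smaller of the two supports. The names Statement, noisy_or and support are hypothetical and are not part of the PANDA framework.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Statement:
    name: str
    truth_value: bool   # True: the statement asserts the event; False: it denies it
    extraction: float   # "extraction accuracy" column from slide 15
    provenance: float   # "provenance uncertainty" column from slide 15

    def weight(self) -> float:
        # ASSUMPTION: the two scores are independent reliabilities and multiply.
        return self.extraction * self.provenance


def noisy_or(weights) -> float:
    # ASSUMPTION: independent pieces of evidence combine as a noisy-OR.
    # An empty collection gives 0.0 (no support at all).
    return 1.0 - prod(1.0 - w for w in weights)


def support(statements):
    """Return (positive support, negative support, inconsistency)."""
    positive = noisy_or(s.weight() for s in statements if s.truth_value)
    negative = noisy_or(s.weight() for s in statements if not s.truth_value)
    # ASSUMPTION: inconsistency is the degree to which both sides are supported.
    return positive, negative, min(positive, negative)


# The three statements about event = <A binding B> listed on slide 15.
STATEMENTS = [
    Statement("s1", True, 0.8, 0.7),
    Statement("s2", False, 0.8, 0.7),
    Statement("s3", False, 0.9, 0.6),
]

if __name__ == "__main__":
    # Grow the evidence set as on the slide: {s1}, {s1, s2}, {s1, s2, s3}.
    for n in range(1, len(STATEMENTS) + 1):
        subset = STATEMENTS[:n]
        pos, neg, inc = support(subset)
        names = ", ".join(s.name for s in subset)
        print(f"{{{names}}}: positive={pos:.4f} negative={neg:.4f} inconsistency={inc:.4f}")
```

Under these assumptions, adding s2 and s3 leaves the positive support unchanged while the negative support and the inconsistency grow, which matches the qualitative story of the slide. The figures it prints are not the values plotted in the original chart.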