Machine Reading for Cancer Biology
Sophia Ananiadou
National Centre for Text Mining
School of Computer Science
University of Manchester
End-to-end text mining system
• Machine reading for pathways
– Event extraction
– Uncertainty detection for ranking
• Integration of machine reading tools
– Argo and OpenMinTeD platforms
• Interactive visual analytics
– LitPathExplorer
Motivation
To support pathway construction and design of
experiments
• extract evidence from literature
• events, entities, contextual interpretation
For these, we need to
• understand pathway representations
• bridge the gap between knowledge and
text
• read against models (deep reading)
From concepts to events
1 Concept recognition
2 Interaction recognition
3 Concept and interaction identification
DrugBank:DB06712 DrugBank:DB00682 DrugBank:DB04610
The Big Mechanism: reading, assembly, experiments
Courtesy: Paul Cohen
http://nactem.ac.uk/big_mechanism/
Tools for Event Extraction
EventMine
• EventMine: a machine learning-based pipeline system for event
extraction
– Several parse results, dictionaries
– Coreference resolution, domain adaptation
http://www.nactem.ac.uk/EventMine/
Miwa, M. & Ananiadou, S. (2015) Adaptable, high recall, event extraction system with minimal
configuration, BMC Bioinformatics, 16(10), S7
Miwa, M., Thompson, P. and Ananiadou, S. (2012) Boosting automatic event extraction from
the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13)
Linking interactions (events) to pathways
1. The mitotic arrest-deficient protein Mad1 forms a complex with Mad2, which is required for imposing mitotic arrest on cells in which the spindle assembly is perturbed. PMID: 18981471
2. Mad1, an upstream regulator of Mad2, forms a tight core complex with Mad2 and facilitates Mad2 binding to Cdc20. PMID: 18318601
2013 Beyond linking reactions to documents at coarse level
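Event interpretation
[Figure: annotated example sentence; surviving text fragments: "Results suggest that", "is not", "required for", "binding to", "in RAS"]
SIMPLE EVENT: Binding (event trigger: "binding to")
– Theme 1: MUC1 (Protein)
– Theme 2: PKM2 (Protein)
COMPLEX EVENT: Regulation (event trigger: "required for")
– Cause: BRAF (Chemical)
– Theme: the Binding event
Arguments are either entity arguments or event arguments
*Complex events have at least one argument that is an event on its own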
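The simple/complex event distinction above can be sketched as a small data structure. This is only an illustration, not EventMine's internal representation: the class names `Entity`, `Argument` and `Event` and the method `is_complex` are hypothetical, and the entity types are copied from the slide labels.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Entity:
    text: str   # surface form, e.g. "MUC1"
    type: str   # semantic type as labelled on the slide, e.g. "Protein"


@dataclass
class Argument:
    role: str                          # e.g. "Theme", "Theme2", "Cause"
    filler: Union["Entity", "Event"]   # entity argument or event argument


@dataclass
class Event:
    type: str      # e.g. "Binding", "Regulation"
    trigger: str   # trigger text, e.g. "binding to"
    arguments: List[Argument] = field(default_factory=list)

    def is_complex(self) -> bool:
        # A complex event has at least one argument that is itself an event.
        return any(isinstance(a.filler, Event) for a in self.arguments)


# Simple event: MUC1 binding to PKM2
binding = Event("Binding", "binding to", [
    Argument("Theme", Entity("MUC1", "Protein")),
    Argument("Theme2", Entity("PKM2", "Protein")),
])

# Complex event: BRAF is required for the binding event
regulation = Event("Regulation", "required for", [
    Argument("Cause", Entity("BRAF", "Chemical")),
    Argument("Theme", binding),
])

print(binding.is_complex(), regulation.is_complex())
```

Nesting the Binding event as the Theme of the Regulation event is what makes the second event complex.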
Textual Mentions in Context
• Our results prove that BRAF is required for MUC1 binding to PKM2
– Strong certainty
• Our results suggest that BRAF is required for MUC1 binding to PKM2
– Some hedging/speculation
• Our results indicate that BRAF may be required for MUC1 binding to PKM2
– Strong hedging/speculation
• There is scarce evidence that BRAF is required for MUC1 binding to PKM2
– Hedging
• We are going to test whether BRAF is required for MUC1 binding to PKM2
– Investigation
• Often BRAF is required for MUC1 binding to PKM2
– Frequency/time limitation
• Whether BRAF is required for MUC1 binding to PKM2 is out of the scope of
this work
– Admission of lack of knowledge
Uncertainty cues
BioNLP-ST, GENIA-MK
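As a toy illustration of how uncertainty cues map the example sentences to the categories listed above, a plain cue lookup might look like the following. The cue patterns are assumptions read off the seven example sentences, not the cue lists derived from BioNLP-ST or GENIA-MK, and the function name `label_mention` is hypothetical.

```python
import re

# (pattern, label) pairs checked in priority order; all patterns are
# illustrative assumptions taken from the example sentences only.
CUES = [
    (r"\bout of the scope\b", "admission of lack of knowledge"),
    (r"\btest whether\b", "investigation"),
    (r"\bscarce evidence\b", "hedging"),
    (r"\bmay\b", "strong hedging/speculation"),
    (r"\bsuggest(s|ed)?\b", "some hedging/speculation"),
    (r"\boften\b", "frequency/time limitation"),
    (r"\bprove(s|d)?\b", "strong certainty"),
]


def label_mention(sentence, default="no cue found"):
    """Return the label of the first cue pattern found in the sentence."""
    lowered = sentence.lower()
    for pattern, label in CUES:
        if re.search(pattern, lowered):
            return label
    return default


print(label_mention(
    "Our results indicate that BRAF may be required for MUC1 binding to PKM2"))
```

A lookup like this ignores whether the cue actually governs the event trigger, which is the gap the dependency-based approach is meant to close.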
Hybrid approach
Dependency relations between cues and event triggers
Automated Rule Induction (from corpus)
1. EventMine (to identify event triggers)
2. Deep parsing (to identify dependencies)
3. Cue lists
Machine Learner (Random Forest)
1. Lexical (e.g. cues, POS tags, event-trigger surface form)
2. Syntactic (e.g. shortest path, dependency cue-trigger)
3. Semantic (e.g. event type, argument type/role)
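One of the syntactic features listed above, the shortest dependency path between a cue and an event trigger, can be sketched with a breadth-first search. The dependency edges below are a hand-written, hypothetical parse (not the output of the deep parser used in the talk), and the function and feature names are assumptions made for illustration.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected view of the dependency graph.

    Returns the list of dependency labels on the shortest path, or None.
    """
    graph = {}
    for head, dependent, label in edges:
        graph.setdefault(head, []).append((dependent, label))
        graph.setdefault(dependent, []).append((head, label))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbour, label in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [label]))
    return None


# Hypothetical parse of:
# "Our results suggest that BRAF is required for MUC1 binding to PKM2"
EDGES = [
    ("suggest", "results", "nsubj"),
    ("suggest", "required", "ccomp"),
    ("required", "BRAF", "nsubjpass"),
    ("required", "binding", "prep_for"),
    ("binding", "MUC1", "nn"),
    ("binding", "PKM2", "prep_to"),
]


def cue_trigger_features(cue, trigger, edges):
    """Build a small feature dictionary for one cue-trigger pair."""
    path = shortest_path(edges, cue, trigger)
    return {
        "connected": path is not None,
        "path_length": len(path) if path is not None else -1,
        "path_labels": "/".join(path) if path else "",
    }


print(cue_trigger_features("suggest", "required", EDGES))
print(cue_trigger_features("suggest", "binding", EDGES))
```

Feature dictionaries of this kind could then be vectorised and passed to a classifier such as a Random Forest, alongside the lexical and semantic features.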
Multiple mentions of the same event
• Our results prove that BRAF is
required for MUC1 binding to PKM2
• Our results suggest that BRAF is
required for MUC1 binding to PKM2
• Our results indicate that BRAF may
be required for MUC1 binding to
PKM2
• There is scarce evidence that BRAF is
required for MUC1 binding to PKM2
• We are going to test whether BRAF
is required for MUC1 binding to
PKM2
• Often BRAF is required for MUC1
binding to PKM2
• Whether BRAF is required for MUC1
binding to PKM2 is out of the scope
of this work
The same interaction can be
mentioned:
• In multiple sentences of the same
paper
• In multiple papers
• With different levels of certainty
in each mention
We need to consolidate different
uncertainty values from each
mention to one “confidence” score
Consolidation over several mentions
Adapting the subjective logic framework:
Each mention of an event Ex mapped to a pathway is considered as the subjective opinion of the author about the interaction described by Ex.
ωx = (bx, dx, ux, α)
• bx: belief
• dx: disbelief
• ux: uncertainty
• α: base rate
Inputs: uncertainty identification, negation identification, prior probabilities
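A minimal sketch of consolidating several opinions follows. The slides do not say which subjective logic operator is used, so this assumes the standard cumulative fusion operator for binomial opinions with a shared base rate; the class `Opinion`, the averaging fallback for two fully certain opinions, and the example values are all illustrative assumptions.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

    def projected_probability(self) -> float:
        # Standard subjective logic projection: P = b + a * u
        return self.belief + self.base_rate * self.uncertainty


def cumulative_fusion(a: Opinion, b: Opinion) -> Opinion:
    """Cumulative fusion of two binomial opinions sharing one base rate."""
    k = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if k == 0:
        # Both opinions have zero uncertainty; averaging them is a
        # simplifying assumption made for this sketch.
        return Opinion((a.belief + b.belief) / 2,
                       (a.disbelief + b.disbelief) / 2,
                       0.0, a.base_rate)
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / k,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / k,
        (a.uncertainty * b.uncertainty) / k,
        a.base_rate,
    )


# Three hypothetical mentions of the same event (values invented).
mentions = [
    Opinion(0.8, 0.0, 0.2, 0.5),   # confident statement
    Opinion(0.4, 0.0, 0.6, 0.5),   # hedged statement
    Opinion(0.1, 0.3, 0.6, 0.5),   # partly negated, hedged statement
]
fused = reduce(cumulative_fusion, mentions)
print(fused, fused.projected_probability())
```

With this operator, every additional mention lowers the fused uncertainty, so an event supported by many sentences ends up with a firmer score than any single mention gives it.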
Evaluation - pathway models
• B-cell acute lymphoblastic leukemia model
(Pathway Studio)
– 72 interactions, 260 evidence passages manually
selected, 1-20 sentences per interaction
– 12% flagged uncertain by our system
Zerva, C., Batista-Navarro, R., Day, P. and S. Ananiadou (2017) Using uncertainty to link
and rank evidence from biomedical literature for model reconstruction, Bioinformatics
Results
• Leukemia Pathway (7 annotators) ~ Pathway
Studio
• Average accuracy on sentence level: 0.96
• Average accuracy on interaction level: 0.87
– 1-20 sentences per interaction
Event interpretation
• Uncertainty scoring as an expressive
confidence measure
• Hybrid framework
• Value for each event mentioned in a sentence
– Consolidated uncertainty values from different
papers
• Effort to decrease manual effort and select
more certain events
• Web-based, graphical TM workbench
• Unstructured Information Management Architecture (UIMA)
standard
• Rich library of TM components
• Allows Cloud and high-performance computing
• Straightforward integration of TM analytics
– modular, extensible, reconfigurable,
reusable workflows
Source: LEGO DUPLO
Rak, R., Batista-Navarro, R. T. B., Rowley, A., Carter, J. and Ananiadou, S. (2014) Text mining-assisted biocuration workflows in Argo. Database: The Journal of Biological Databases and Curation
openminted.eu
Workflow Designer
Sample workflow (Cancer Mechanisms)
existing components
custom components
existing components supplied
with custom resources
✓ highly extensible
✓ can be optimised by interchanging components
Sample machine reading workflow
Annotation Viewer/Editor
Ensemble reading through federation
• What do we gain by combining text mining
tools from different groups?
– enriching/updating results
– comparison
– merging annotations e.g., by taking the union,
intersection, majority vote
– taking advantage of best-of-breed tools
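The merging strategies named above (union, intersection, majority vote) can be sketched over sets of annotations. The `(start, end, type)` tuple format, the function name `merge_annotations` and the example offsets are assumptions for illustration, not the annotation model of any particular tool.

```python
from collections import Counter


def merge_annotations(tool_outputs, strategy="majority"):
    """Merge annotation sets produced by several tools for the same text."""
    sets = [set(output) for output in tool_outputs]
    if not sets:
        return set()
    if strategy == "union":
        return set().union(*sets)
    if strategy == "intersection":
        return set.intersection(*sets)
    if strategy == "majority":
        counts = Counter(ann for s in sets for ann in s)
        # Keep annotations proposed by more than half of the tools.
        return {ann for ann, n in counts.items() if n > len(sets) / 2}
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical outputs of three tools: (start offset, end offset, type).
tool_a = {(0, 4, "Protein"), (28, 32, "Protein")}
tool_b = {(0, 4, "Protein"), (36, 40, "Protein")}
tool_c = {(0, 4, "Protein"), (28, 32, "Protein")}
outputs = [tool_a, tool_b, tool_c]

print(sorted(merge_annotations(outputs, "union")))
print(sorted(merge_annotations(outputs, "intersection")))
print(sorted(merge_annotations(outputs, "majority")))
```

Union favours recall, intersection favours precision, and majority vote sits between the two.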
Federated system
IASON
Registry of services
Text mining
tool
developer
Text mining
tool as a web
service
Descriptor file
Text mining workflow
Deep Reading: Integrating uncertainty
• LitPathExplorer
– Visual analytics tool; maps events from literature to
pathway interactions
– Includes uncertainty measure
Soto, A., Zerva, C., Batista-Navarro, R., and S. Ananiadou (2017) LitPathExplorer
Bioinformatics
LitPathExplorer:
A Visual Tool for Exploring Literature-
Enriched Pathway Models
• Due to their size and complexity, pathway
models are typically neither complete nor
error-free
• Revising, updating models
• Curators and consumers of these models need
to contrast and revise large collections of
research articles
– Costly, time consuming
LitPathExplorer: a confidence-based
tool for exploring pathway models
1. Enable flexible search and exploration of
biomolecular pathway networks
– different views of the data
– various interactive functionalities
2. Provide a means for making existing evidence in the
scientific literature available to support
corroboration
3. Facilitate the discovery of new interactions that are
not yet part of a given model
4. Allow the user to become an active participant of
the analytical process
quantify confidence
in the events
Soto, A. et al, 2017, Bioinformatics
1. Search
• A pathway model can be
searched by providing:
– event types,
– entities,
– and/or roles for each entity in
the reaction
• Multiple queries can be
combined in a Boolean
search
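The search options above (event type, entity, entity role, Boolean combination) can be illustrated with composable predicates. This is a sketch under assumed data structures, not LitPathExplorer's query implementation: the event dictionaries, the helper names and the three-event model are hypothetical.

```python
def by_type(event_type):
    """Match events of the given type."""
    return lambda event: event["type"] == event_type


def by_entity(name, role=None):
    """Match events involving the entity, optionally in a given role."""
    def match(event):
        return any(n == name and (role is None or r == role)
                   for n, r in event["participants"])
    return match


def all_of(*predicates):   # Boolean AND
    return lambda event: all(p(event) for p in predicates)


def any_of(*predicates):   # Boolean OR
    return lambda event: any(p(event) for p in predicates)


def negate(predicate):     # Boolean NOT
    return lambda event: not predicate(event)


def search(events, query):
    return [event for event in events if query(event)]


# Hypothetical pathway model with three events.
MODEL = [
    {"id": "E1", "type": "Binding",
     "participants": [("MUC1", "Theme"), ("PKM2", "Theme")]},
    {"id": "E2", "type": "Regulation",
     "participants": [("BRAF", "Cause"), ("E1", "Theme")]},
    {"id": "E3", "type": "Binding",
     "participants": [("Mad1", "Theme"), ("Mad2", "Theme")]},
]

query = all_of(by_type("Binding"),
               any_of(by_entity("MUC1"), by_entity("BRAF", role="Cause")))
print([event["id"] for event in search(MODEL, query)])
```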
2. Network viewer
Reading against the model
Entities
Reactions/
Events
• Colour encodes event type
• Size encodes confidence
3. Inspector, event confidence
computation
Mapping IDs for
entities and events
Overall event
confidence
3. Inspector, quantifying the confidence
Confidence
breakdown
Adjusting event confidence
4. Text Analyzer – Articles & sentences
Sentence-level language
confidence
Article-level language
confidence
Network Viewer: Discovery mode
Extending the model with events found in the
literature
Discovery mode
Difficult to explore when too many candidate events are found
Verifying mentions in text
Summary
• Text mining important to overcome silos,
fragmentation of information
• Complex events in context and visualisation
can corroborate and extend models (deep
reading)
• Text mining infrastructure supporting
customisation
• Towards mechanisms and system
understanding
National Centre for Text Mining
• 1st publicly funded national text
mining centre
• Location: Manchester Institute of
Biotechnology, University of
Manchester
• Since 2004
• Fully sustainable since 2011
• Biology, Medicine, Biodiversity
www.nactem.ac.uk
The National Centre for Text Mining
NaCTeM
Presented at the Global Pharma
R&D Informatics Congress.
To find out more, visit:
www.global-engage.com

More Related Content

What's hot

Project Presentation
Project PresentationProject Presentation
Project Presentation
butest
 
Towards the study of sentiment in the public opinion of science in Spanish
Towards the study of sentiment in the public opinion of science in SpanishTowards the study of sentiment in the public opinion of science in Spanish
Towards the study of sentiment in the public opinion of science in Spanish
Technological Ecosystems for Enhancing Multiculturality
 
Spam email filtering
Spam email filteringSpam email filtering
Spam email filtering
National Institute
 
G0434045
G0434045G0434045
G0434045
IOSR Journals
 
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatramanOdsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
venkatramanJ4
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
ijnlc
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
IJDKP
 
data mining for terror attacks
data mining for terror attacksdata mining for terror attacks
data mining for terror attacks
Nilu Desai
 
Assessing the quality of online news
Assessing the quality of online newsAssessing the quality of online news
Assessing the quality of online news
ijaia
 
Simulator
SimulatorSimulator
Simulator
Nestor
 
Simulator
SimulatorSimulator
Simulator
922010
 
Simulator
SimulatorSimulator
Simulator
supremo1243
 
Simulator
SimulatorSimulator
Simulator
Eduardo
 
Simulator
SimulatorSimulator
Simulator
james
 

What's hot (14)

Project Presentation
Project PresentationProject Presentation
Project Presentation
 
Towards the study of sentiment in the public opinion of science in Spanish
Towards the study of sentiment in the public opinion of science in SpanishTowards the study of sentiment in the public opinion of science in Spanish
Towards the study of sentiment in the public opinion of science in Spanish
 
Spam email filtering
Spam email filteringSpam email filtering
Spam email filtering
 
G0434045
G0434045G0434045
G0434045
 
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatramanOdsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
 
data mining for terror attacks
data mining for terror attacksdata mining for terror attacks
data mining for terror attacks
 
Assessing the quality of online news
Assessing the quality of online newsAssessing the quality of online news
Assessing the quality of online news
 
Simulator
SimulatorSimulator
Simulator
 
Simulator
SimulatorSimulator
Simulator
 
Simulator
SimulatorSimulator
Simulator
 
Simulator
SimulatorSimulator
Simulator
 
Simulator
SimulatorSimulator
Simulator
 

Similar to Machine reading for cancer biology

ReComp for genomics
ReComp for genomicsReComp for genomics
ReComp for genomics
Paolo Missier
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Pete Burnap
 
DATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITODATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITO
MarcoMellia
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
Manuel Martín
 
KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017
Keywan Hassani-Pak
 
Java tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular InteractionsJava tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular Interactions
Rafael C. Jimenez
 
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECAProject
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
The Statistical and Applied Mathematical Sciences Institute
 
KnetMiner Overview Oct 2017
KnetMiner Overview Oct 2017KnetMiner Overview Oct 2017
KnetMiner Overview Oct 2017
Keywan Hassani-Pak
 
Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.
Monica Munoz-Torres
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
ssuser4b1f48
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
butest
 
Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005
Paolo Missier
 
Study on security and quality of service implementations in p2 p overlay netw...
Study on security and quality of service implementations in p2 p overlay netw...Study on security and quality of service implementations in p2 p overlay netw...
Study on security and quality of service implementations in p2 p overlay netw...
eSAT Publishing House
 
LatentCross.pdf
LatentCross.pdfLatentCross.pdf
LatentCross.pdf
NilanjanSarkar25
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
Pinar Alper
 
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...
BaoTramDuong2
 
Automatically Generating Wikipedia Articles: A Structure-Aware Approach
Automatically Generating Wikipedia Articles:  A Structure-Aware ApproachAutomatically Generating Wikipedia Articles:  A Structure-Aware Approach
Automatically Generating Wikipedia Articles: A Structure-Aware Approach
George Ang
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Prof. Wim Van Criekinge
 
NRNB Annual Report 2018
NRNB Annual Report 2018NRNB Annual Report 2018
NRNB Annual Report 2018
Alexander Pico
 

Similar to Machine reading for cancer biology (20)

ReComp for genomics
ReComp for genomicsReComp for genomics
ReComp for genomics
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
 
DATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITODATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITO
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
 
KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017
 
Java tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular InteractionsJava tutorial: Programmatic Access to Molecular Interactions
Java tutorial: Programmatic Access to Molecular Interactions
 
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
KnetMiner Overview Oct 2017
KnetMiner Overview Oct 2017KnetMiner Overview Oct 2017
KnetMiner Overview Oct 2017
 
Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005
 
Study on security and quality of service implementations in p2 p overlay netw...
Study on security and quality of service implementations in p2 p overlay netw...Study on security and quality of service implementations in p2 p overlay netw...
Study on security and quality of service implementations in p2 p overlay netw...
 
LatentCross.pdf
LatentCross.pdfLatentCross.pdf
LatentCross.pdf
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
 
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...
 
Automatically Generating Wikipedia Articles: A Structure-Aware Approach
Automatically Generating Wikipedia Articles:  A Structure-Aware ApproachAutomatically Generating Wikipedia Articles:  A Structure-Aware Approach
Automatically Generating Wikipedia Articles: A Structure-Aware Approach
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
 
NRNB Annual Report 2018
NRNB Annual Report 2018NRNB Annual Report 2018
NRNB Annual Report 2018
 

More from Laura Berry

Domains of unknown function are essential in yeast
Domains of unknown function are essential in yeastDomains of unknown function are essential in yeast
Domains of unknown function are essential in yeast
Laura Berry
 
Data-driven design of cell factories and communities
Data-driven design of cell factories and communitiesData-driven design of cell factories and communities
Data-driven design of cell factories and communities
Laura Berry
 
Synthetic Biology via programmable directed evolution
Synthetic Biology via programmable directed evolutionSynthetic Biology via programmable directed evolution
Synthetic Biology via programmable directed evolution
Laura Berry
 
Illuminating the druggable genome and the quest for new drug targets
Illuminating the druggable genome and the quest for new drug targetsIlluminating the druggable genome and the quest for new drug targets
Illuminating the druggable genome and the quest for new drug targets
Laura Berry
 
3decision®: Bringing structural data analytics to the masses
3decision®: Bringing structural data analytics to the masses3decision®: Bringing structural data analytics to the masses
3decision®: Bringing structural data analytics to the masses
Laura Berry
 
Network-driven drug discovery: A computational network biology approach to dr...
Network-driven drug discovery: A computational network biology approach to dr...Network-driven drug discovery: A computational network biology approach to dr...
Network-driven drug discovery: A computational network biology approach to dr...
Laura Berry
 
Measuring project success and Shannon's maxim: the enemy knows the system
Measuring project success and Shannon's maxim: the enemy knows the systemMeasuring project success and Shannon's maxim: the enemy knows the system
Measuring project success and Shannon's maxim: the enemy knows the system
Laura Berry
 
The Largest General Translational Informatics Public Private Partnership to Date
The Largest General Translational Informatics Public Private Partnership to DateThe Largest General Translational Informatics Public Private Partnership to Date
The Largest General Translational Informatics Public Private Partnership to Date
Laura Berry
 
The challenges of Analytical Data Management in R&D
The challenges of Analytical Data Management in R&DThe challenges of Analytical Data Management in R&D
The challenges of Analytical Data Management in R&D
Laura Berry
 
Will data scientists lead the discovery of cancer therapeutics?
Will data scientists lead the discovery of cancer therapeutics?Will data scientists lead the discovery of cancer therapeutics?
Will data scientists lead the discovery of cancer therapeutics?
Laura Berry
 
Improving exome sequencing, targeted sequencing, and low frequency variant de...
Improving exome sequencing, targeted sequencing, and low frequency variant de...Improving exome sequencing, targeted sequencing, and low frequency variant de...
Improving exome sequencing, targeted sequencing, and low frequency variant de...
Laura Berry
 
Population scale sequencing by cost-efficient targeted NGS
Population scale sequencing by cost-efficient targeted NGSPopulation scale sequencing by cost-efficient targeted NGS
Population scale sequencing by cost-efficient targeted NGS
Laura Berry
 
Disease interpretation of whole genome sequence variants
Disease interpretation of whole genome sequence variantsDisease interpretation of whole genome sequence variants
Disease interpretation of whole genome sequence variants
Laura Berry
 
Targeting giants - the pharmacology of adhesions GPCRs
Targeting giants - the pharmacology of adhesions GPCRsTargeting giants - the pharmacology of adhesions GPCRs
Targeting giants - the pharmacology of adhesions GPCRs
Laura Berry
 
Inventing, developing and commercialising targeted small-molecule drugs for p...
Inventing, developing and commercialising targeted small-molecule drugs for p...Inventing, developing and commercialising targeted small-molecule drugs for p...
Inventing, developing and commercialising targeted small-molecule drugs for p...
Laura Berry
 
Using antitumor agents to probe the sensitivity contexts of cancer cells and ...
Using antitumor agents to probe the sensitivity contexts of cancer cells and ...Using antitumor agents to probe the sensitivity contexts of cancer cells and ...
Using antitumor agents to probe the sensitivity contexts of cancer cells and ...
Laura Berry
 
Accessing genetically tagged heterocycle libraries via a chemoresistant DNA s...
Accessing genetically tagged heterocycle libraries via a chemoresistant DNA s...Accessing genetically tagged heterocycle libraries via a chemoresistant DNA s...
Accessing genetically tagged heterocycle libraries via a chemoresistant DNA s...
Laura Berry
 
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based PlatformNext-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Next-Gen Drug Discovery: An Integrated Micro-Droplet Based Platform
Laura Berry
 
Lighting Rockets at the UChicago Microbiome Launchpad
Lighting Rockets at the UChicago Microbiome LaunchpadLighting Rockets at the UChicago Microbiome Launchpad
Lighting Rockets at the UChicago Microbiome Launchpad
Laura Berry
 
Human Microbiota: Proof of Concept to Production
Human Microbiota: Proof of Concept to ProductionHuman Microbiota: Proof of Concept to Production
Human Microbiota: Proof of Concept to Production
Laura Berry
 

More from Laura Berry (20)

Domains of unknown function are essential in yeast
Domains of unknown function are essential in yeastDomains of unknown function are essential in yeast
Domains of unknown function are essential in yeast
 
Data-driven design of cell factories and communities
Data-driven design of cell factories and communitiesData-driven design of cell factories and communities
Data-driven design of cell factories and communities
 
Synthetic Biology via programmable directed evolution
Synthetic Biology via programmable directed evolutionSynthetic Biology via programmable directed evolution
Synthetic Biology via programmable directed evolution
 
Illuminating the druggable genome and the quest for new drug targets
Illuminating the druggable genome and the quest for new drug targetsIlluminating the druggable genome and the quest for new drug targets
Illuminating the druggable genome and the quest for new drug targets
 
3decision®: Bringing structural data analytics to the masses
3decision®: Bringing structural data analytics to the masses3decision®: Bringing structural data analytics to the masses
3decision®: Bringing structural data analytics to the masses
 
Network-driven drug discovery: A computational network biology approach to dr...
Network-driven drug discovery: A computational network biology approach to dr...Network-driven drug discovery: A computational network biology approach to dr...
Network-driven drug discovery: A computational network biology approach to dr...
 
Measuring project success and Shannon's maxim: the enemy knows the system
Measuring project success and Shannon's maxim: the enemy knows the systemMeasuring project success and Shannon's maxim: the enemy knows the system
Measuring project success and Shannon's maxim: the enemy knows the system
 
The Largest General Translational Informatics Public Private Partnership to Date
The Largest General Translational Informatics Public Private Partnership to DateThe Largest General Translational Informatics Public Private Partnership to Date
The Largest General Translational Informatics Public Private Partnership to Date
 



Machine reading for cancer biology

  • 1. Machine Reading for Cancer Biology Sophia Ananiadou National Centre for Text Mining School of Computer Science University of Manchester
  • 2. End-to-end text mining system • Machine reading for pathways – Event extraction – Uncertainty detection for ranking • Integration of machine reading tools – Argo and OpenMinTeD platforms • Interactive visual analytics – LitPathExplorer
  • 3. Motivation To support pathway construction and design of experiments • extract evidence from literature • events, entities, contextual interpretation For these, we need to • understand pathway representations • bridge the gap between knowledge and text • read against models (deep reading)
  • 4. From concepts to events 1 Concept recognition 2 Interaction recognition 3 Concept and interaction identification DrugBank:DB06712 DrugBank:DB00682 DrugBank:DB04610
  • 5. The Big Mechanism: reading, assembly, experiments Courtesy: Paul Cohen http://nactem.ac.uk/big_mechanism/
  • 6. Tools for Event Extraction
  • 7. EventMine • EventMine: a machine learning pipeline event extraction system – Several parse results, dictionaries – Coreference resolution, domain adaptation http://www.nactem.ac.uk/EventMine/ Miwa, M. & Ananiadou, S. (2015) Adaptable, high recall, event extraction system with minimal configuration. BMC Bioinformatics, 16(10), S7. Miwa, M., Thompson, P. and Ananiadou, S. (2012) Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13).
  • 8. Linking interactions (events) to pathways 1. The mitotic arrest-deficient protein Mad1 forms a complex with Mad2, which is required for imposing mitotic arrest on cells in which the spindle assembly is perturbed. PMID: 18981471 2. Mad1, an upstream regulator of Mad2, forms a tight core complex with Mad2 and facilitates Mad2 binding to Cdc20. PMID: 18318601 2013 Beyond linking reactions to documents at coarse level
  • 9. Event interpretation [Figure: annotated example sentence; visible text fragments: "Results suggest that", "is not", "required for", "binding to", "in RAS". SIMPLE EVENT: Binding (event trigger "binding to"; Theme 1: Protein MUC1; Theme 2: Protein PKM2; entity arguments). COMPLEX EVENT: Regulation (event trigger "required for"; Cause: Chemical BRAF; Theme: the Binding event, an event argument).] *Complex events have at least one argument that is an event in its own right
  • 10. Textual Mentions in Context • Our results prove that BRAF is required for MUC1 binding to PKM2 – Strong certainty • Our results suggest that BRAF is required for MUC1 binding to PKM2 – Some hedging/speculation • Our results indicate that BRAF may be required for MUC1 binding to PKM2 – Strong hedging/speculation • There is scarce evidence that BRAF is required for MUC1 binding to PKM2 – Hedging • We are going to test whether BRAF is required for MUC1 binding to PKM2 – Investigation • Often BRAF is required for MUC1 binding to PKM2 – Frequency/time limitation • Whether BRAF is required for MUC1 binding to PKM2 is out of the scope of this work – Admission of lack of knowledge
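The cue-to-category pairs on this slide can be illustrated with a toy lexicon lookup. This is only a sketch: the cue list and the `classify_mention` function are hypothetical and built from the seven example sentences above, whereas the system described in the talk combines induced rules with a machine learner.

```python
import re

# Hypothetical cue lexicon taken from the slide's example sentences.
# Multi-word cues come first so that they win over shorter ones.
CUE_CATEGORIES = [
    ("out of the scope", "admission of lack of knowledge"),
    ("scarce evidence", "hedging"),
    ("test whether", "investigation"),
    ("prove", "strong certainty"),
    ("suggest", "some hedging/speculation"),
    ("may", "strong hedging/speculation"),
    ("often", "frequency/time limitation"),
]


def classify_mention(sentence):
    """Return the category of the first matching cue, or None if no cue matches."""
    lowered = sentence.lower()
    for cue, category in CUE_CATEGORIES:
        # Word boundaries avoid matching a cue inside a longer word.
        if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
            return category
    return None


if __name__ == "__main__":
    print(classify_mention(
        "Our results indicate that BRAF may be required for MUC1 binding to PKM2"
    ))
```

A plain lookup like this ignores which event a cue actually modifies, which is the gap that dependency relations between cues and event triggers are meant to close.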
  • 12. Hybrid approach Dependency relations between cues and event triggers Automated Rule Induction (from corpus) 1. EventMine (to identify event triggers) 2. Deep parsing (to identify dependencies) 3. Cue lists Machine Learner (Random Forest) 1. Lexical (e.g. cues, POS tags, event-trigger surface form) 2. Syntactic (e.g. shortest path, dependency cue-trigger) 3. Semantic (e.g. event type, argument type/role)
  • 13. Multiple mentions of the same event • Our results prove that BRAF is required for MUC1 binding to PKM2 • Our results suggest that BRAF is required for MUC1 binding to PKM2 • Our results indicate that BRAF may be required for MUC1 binding to PKM2 • There is scarce evidence that BRAF is required for MUC1 binding to PKM2 • We are going to test whether BRAF is required for MUC1 binding to PKM2 • Often BRAF is required for MUC1 binding to PKM2 • Whether BRAF is required for MUC1 binding to PKM2 is out of the scope of this work The same interaction can be mentioned: • In multiple sentences of the same paper • In multiple papers • With different levels of certainty in each mention We need to consolidate different uncertainty values from each mention to one “confidence” score
  • 14. Consolidation over several mentions Adapting the subjective logic framework: each mention of an event $E_x$ mapped to a pathway is considered as the subjective opinion of the author on the interaction described by $E_x$. Opinion: $\omega_x = (b_x, d_x, u_x, \alpha)$, with components belief, disbelief, uncertainty and base rate. Inputs: uncertainty identification, negation identification, prior probabilities.
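To make the opinion tuple concrete, here is a minimal sketch of how several per-mention opinions could be consolidated into one confidence value. It uses two standard pieces of subjective logic: the projected probability b + a·u and the cumulative fusion operator. The slides do not say which fusion operator the authors use, so the choice of cumulative fusion, the class name `Opinion` and the function `consolidate` are assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion (belief, disbelief, uncertainty, base rate)."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # assumed default prior

    def __post_init__(self):
        total = self.belief + self.disbelief + self.uncertainty
        if abs(total - 1.0) > 1e-6:
            raise ValueError("belief + disbelief + uncertainty must equal 1")

    def projected_probability(self):
        # Standard subjective-logic projection: P = b + a * u
        return self.belief + self.base_rate * self.uncertainty


def cumulative_fusion(a, b):
    """Cumulative fusion of two opinions that share the same base rate."""
    denom = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if denom == 0:
        # Both opinions are dogmatic (u = 0): fall back to a plain average.
        return Opinion((a.belief + b.belief) / 2,
                       (a.disbelief + b.disbelief) / 2,
                       0.0, a.base_rate)
    belief = (a.belief * b.uncertainty + b.belief * a.uncertainty) / denom
    disbelief = (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / denom
    uncertainty = (a.uncertainty * b.uncertainty) / denom
    return Opinion(belief, disbelief, uncertainty, a.base_rate)


def consolidate(mentions):
    """Fuse the opinions from all mentions of one event into a single opinion."""
    return reduce(cumulative_fusion, mentions)


if __name__ == "__main__":
    # Two hypothetical mentions: one fairly certain, one hedged.
    fused = consolidate([Opinion(0.8, 0.0, 0.2), Opinion(0.5, 0.0, 0.5)])
    print(fused, fused.projected_probability())
```

With cumulative fusion, each additional mention reduces the remaining uncertainty, which fits the idea that an interaction reported in many sentences and papers deserves more confidence than one reported once.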
  • 15. Evaluation - pathway models • B-cell acute lymphoblastic leukemia model (Pathway studio) – 72 interactions, 260 evidence passages manually selected, 1-20 sentences per interaction – 12% flagged uncertain by our system Zerva, C., Batista-Navarro, R., Day, P. and S. Ananiadou (2017) Using uncertainty to link and rank evidence from biomedical literature for model reconstruction, Bioinformatics
  • 16. Results • Leukemia Pathway (7 annotators) ~ Pathway Studio • Average accuracy on sentence level: 0.96 • Average accuracy on interaction level: 0.87 – 1-20 sentences per interaction
  • 17. Event interpretation • Uncertainty scoring as an expressive confidence measure • Hybrid framework • Value for each event mentioned in a sentence – Consolidated uncertainty values from different papers • Aims to reduce manual effort and select more certain events
  • 18. • Web-based, graphical TM workbench • Unstructured Information Management Architecture (UIMA) standard • Rich library of TM components • Allows Cloud and high-performance computing • Straightforward integration of TM analytics – modular, extensible, reconfigurable, reusable workflows Source: LEGO DUPLO Rak, R., Batista-Navarro, R. T. B., Rowley, A., Carter, J. and Ananiadou, S. (2014) Text Mining-assisted Biocuration Workflows in Argo. Database: The Journal of Biological Databases and Curation. openminted.eu
  • 20. Sample workflow (Cancer Mechanisms) existing components custom components existing components supplied with custom resources
  • 21. ✓ highly extensible ✓ can be optimised by interchanging components Sample machine reading workflow
  • 22. Annotation Viewer/Editor
  • 23. Ensemble reading through federation • What do we gain by combining text mining tools from different groups? – enriching/updating results – comparison – merging annotations e.g., by taking the union, intersection, majority vote – taking advantage of best-of-breed tools
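The three merging strategies named on this slide can be sketched over sets of annotations. The representation below (an annotation as a `(start, end, label)` tuple, exact-match comparison) and the function names are assumptions for illustration; they are not the data model of Argo or of any particular tool.

```python
from collections import Counter

# Hypothetical annotation format: (start offset, end offset, label)
Annotation = tuple[int, int, str]


def merge_union(tool_outputs):
    """Keep every annotation produced by at least one tool."""
    return set().union(*tool_outputs)


def merge_intersection(tool_outputs):
    """Keep only annotations produced by all tools."""
    if not tool_outputs:
        return set()
    return set.intersection(*(set(output) for output in tool_outputs))


def merge_majority(tool_outputs):
    """Keep annotations produced by more than half of the tools."""
    counts = Counter(ann for output in tool_outputs for ann in set(output))
    return {ann for ann, n in counts.items() if n > len(tool_outputs) / 2}


if __name__ == "__main__":
    tool_a = {(0, 4, "Protein"), (20, 27, "Binding")}
    tool_b = {(0, 4, "Protein"), (31, 35, "Protein")}
    tool_c = {(0, 4, "Protein"), (20, 27, "Binding")}
    outputs = [tool_a, tool_b, tool_c]
    print(sorted(merge_union(outputs)))
    print(sorted(merge_intersection(outputs)))
    print(sorted(merge_majority(outputs)))
```

Union favours recall, intersection favours precision, and majority vote sits between the two. Exact-span matching is the simplest comparison; real tools often disagree on boundaries, so a production merge would likely need a looser overlap criterion.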
  • 24. Federated system IASON Registry of services Text mining tool developer Text mining tool as a web service Descriptor file Text mining workflow
  • 25. Deep Reading: Integrating uncertainty • LitPathExplorer – Visual analytics tool; maps events from literature to pathway interactions – Includes uncertainty measure Soto, A., Zerva, C., Batista-Navarro, R., and S. Ananiadou (2017) LitPathExplorer Bioinformatics
  • 26. LitPathExplorer: A Visual Tool for Exploring Literature-Enriched Pathway Models • Due to their size and complexity, pathway models are typically neither complete nor error-free • Revising, updating models • Curators and consumers of these models need to contrast and revise large collections of research articles – Costly, time consuming
  • 27. LitPathExplorer: a confidence-based tool for exploring pathway models 1. Enable flexible search and exploration of biomolecular pathway networks – different views of the data – various interactive functionalities 2. Provide a means for making existing evidence in the scientific literature available to support corroboration 3. Facilitate the discovery of new interactions that are not yet part of a given model 4. Allow the user to become an active participant in the analytical process: quantify confidence in the events Soto, A. et al, 2017, Bioinformatics
  • 28. 1. Search • A pathway model can be searched by providing: – event types, – entities, – and/or roles for each entity in the reaction • Multiple queries can be combined in a Boolean search
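The search criteria listed on this slide (event type, entity, role, Boolean combination) can be sketched as composable predicates. Everything here is hypothetical: the `Event` record, the role names and the helper functions are illustrative and do not reflect LitPathExplorer's actual query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a type plus (role, entity) participants."""
    event_type: str
    participants: tuple  # e.g. (("Theme", "MUC1"), ("Theme2", "PKM2"))


def has_type(event_type):
    """Match events of the given type."""
    return lambda event: event.event_type == event_type


def has_entity(name, role=None):
    """Match events involving an entity, optionally in a specific role."""
    return lambda event: any(
        entity == name and (role is None or r == role)
        for r, entity in event.participants
    )


def all_of(*queries):
    """Boolean AND over queries."""
    return lambda event: all(q(event) for q in queries)


def any_of(*queries):
    """Boolean OR over queries."""
    return lambda event: any(q(event) for q in queries)


def negate(query):
    """Boolean NOT of a query."""
    return lambda event: not query(event)


def search(events, query):
    """Return the events that satisfy the query, in their original order."""
    return [event for event in events if query(event)]


if __name__ == "__main__":
    model = [
        Event("Binding", (("Theme", "MUC1"), ("Theme2", "PKM2"))),
        Event("Binding", (("Theme", "Mad1"), ("Theme2", "Mad2"))),
        Event("Regulation", (("Cause", "BRAF"), ("Theme", "MUC1"))),
    ]
    query = all_of(has_type("Binding"), has_entity("MUC1", role="Theme"))
    print(search(model, query))
```

Representing each criterion as a predicate keeps the Boolean combination trivial: AND, OR and NOT are just function composition over the same event list.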
  • 29. 2. Network viewer Reading against the model Entities Reactions/Events • Colour encodes event type • Size encodes confidence
  • 30. 3. Inspector, event confidence computation Mapping IDs for entities and events Overall event confidence
  • 31. 3. Inspector, quantifying the confidence Confidence breakdown
  • 33. 4. Text Analyzer – Articles & sentences Sentence-level language confidence Article-level language confidence
  • 34. Network Viewer: Discovery mode Extending the model with events found in the literature
  • 35. Discovery mode Difficult to explore when too many candidate events are found
  • 37. Summary • Text mining important to overcome silos, fragmentation of information • Complex events in context and visualisation can corroborate and extend models (deep reading) • Text mining infrastructure supporting customisation • Towards mechanisms and system understanding
  • 38. National Centre for Text Mining (NaCTeM) • 1st publicly funded national text mining centre • Location: Manchester Institute of Biotechnology, University of Manchester • Since 2004 • Fully sustainable since 2011 • Biology, Medicine, Biodiversity www.nactem.ac.uk
  • 39. Presented at the Global Pharma R&D Informatics Congress. To find out more, visit: www.global-engage.com