
OSFair2017 Workshop | Big Mechanism: deep reading for cancer biology


Sophia Ananiadou talks about big mechanisms (from text to experiments using their text mining)

Training title: TDM unlocking a goldmine of information

Training overview:
Text and Data Mining (TDM) is a natural ‘next step’ in open science. It can lead to new and unexpected discoveries and increase the impact of publications and repositories. This workshop showcases examples of successful TDM and infrastructural solutions for researchers. We will also discuss what is needed to make the most of infrastructures and how publishers and repositories can open up their content.


Published in: Science

  1. 1. Big Mechanism: deep reading for cancer biology. Sophia Ananiadou, National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, The University of Manchester
  2. 2. Overview • From text to knowledge – Events – Big Mechanism • Mining textual (un)certainty • Visualising and ranking evidence (LitPathExplorer)
  3. 3. Pathway construction. mTOR pathway: 964 entities, 777 reactions, 519 papers (Caron, et al. Mol Syst Biol., 6(1)). A MANUAL PROCESS. Inevitable gaps building models
  4. 4. Knowledge • Key to understanding biological systems • Models need verification and maintenance (i.e., annotation/curation) • Scale and speed of literature challenging • Annotation/curation remains largely a manual task of incorporating knowledge from scientific publications Pathways
  5. 5. Motivation: To support pathway construction and design of experiments • Extracting evidence from literature • Events, entities, contextual interpretation
  6. 6. The Big Mechanism: reading, assembly, experiments http://
  7. 7. From concepts to events 1 Concept recognition 2 Interaction recognition 3 Concept and interaction identification DrugBank:DB06712 DrugBank:DB00682 DrugBank:DB04610
  8. 8. EventMine • Machine learning pipeline event extraction system – Rich linguistic features • Several parse results: deep parser (Enju), dependency parser • Dictionaries • Coreference resolution, domain adaptation, filtering http:// Miwa, M. & Ananiadou, S. (2015) Adaptable, high recall, event extraction system with minimal configuration. BMC Bioinformatics, 16(10), S7. Miwa, M., Thompson, P. and Ananiadou, S. (2012) Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13). Miwa, M., Pyysalo, S., Ohta, T. and Ananiadou, S. (2013) Wide coverage biomedical event extraction using multiple partially overlapping corpora. BMC Bioinformatics, 14(175).
  9. 9. Event interpretation [Figure: example sentence annotated with a SIMPLE EVENT and a COMPLEX EVENT. Labels: event trigger, event argument, entity argument, Theme 1, Theme 2, Theme, Cause. Event types: Binding, Regulation. Entity types: Protein, Chemical. Names shown: MUC1, PKM2, RAS, BRAF.] *Complex events have at least one argument that is an event on its own
  10. 10. Event interpretation • Supports users of search systems – Discovery of new knowledge, research hypotheses – Detection of uncertainty as confidence measure • Multiple dimensions (meta-knowledge) – Knowledge Type (observation, investigation, analysis, method, fact) – Knowledge Source (current, other) – Polarity (positive, negated) Thompson, P., Nawaz, R., McNaught, J. and Ananiadou, S. (2011) Enriching a biomedical event corpus with meta-knowledge annotation. BMC Bioinformatics, 12, 393
  11. 11. (Un)certainty: a measure of confidence • Is this a fact, a hypothesis, a speculated outcome, a case under investigation, a certain or uncertain interaction? • How can this information help pathway construction? Zerva, C., Batista-Navarro, R., Day, P. and Ananiadou, S. (2017) Using uncertainty to link and rank evidence from biomedical literature for model reconstruction. Bioinformatics
  12. 12. Uncertainty: examples from Big Mechanism data • These results indicate that FLCN can interact directly with RagA via its GTPase domain. • Altogether, these results show that cobalt could affect both p53 and HIPK2 activity. • To test if endogenous hPGAM5 interacts with hPINK1, we first generated an anti-hPGAM5 antibody. • We hypothesize that unphosphorylated cdr2 interacts with c-myc to prevent c-myc degradation. • Therefore, AFP may interact with STAT3 in the signal pathway for chemotherapeutic efficiency of agents on AFPGC. • These data suggest that PI3K and βARK1 form a macromolecular complex within the cell. • Therefore, LiCl might inhibit GSK3β in different ways. • We then examined whether netrin-2 enhances the interaction between Cdo and Stim1.
  13. 13. Uncertainty cues: BioNLP-ST, GENIA-MK
  14. 14. Hybrid model: Machine learning + Rules. Machine Learner (Random Forest): 1. Lexical (e.g. cues, POS tags, event-trigger surface form) 2. Syntactic (e.g. shortest path, dependency cue-trigger) 3. Semantic (e.g. event type, argument type/role). Automated Rule Induction (from corpus): 1. EventMine (to identify event triggers) 2. Enju (to identify dependencies) 3. Cue lists. Corpora: BioNLP-ST, GENIA-MK
  15. 15. Dependency relations • Dependency relations between cues and event triggers • Rule induction: generic rule patterns capturing dependency relations between cues and trigger words
  16. 16. Rule induction
  17. 17. Dealing with multiple event mentions
  18. 18. Results • Event-annotated corpora: all results obtained using 10-fold cross-validation
  19. 19. Evaluation: pathway models • B-cell acute lymphoblastic leukemia model (Pathway Studio) – 72 interactions, 260 evidence passages manually selected – 12% flagged uncertain by our system
  20. 20. Results: Leukemia Pathway (7 annotators) ~ Pathway Studio • Average accuracy on sentence level: 0.96 • Average accuracy on interaction level: 0.87 – 1-20 sentences per interaction
  21. 21. Event interpretation • Uncertainty scoring as an expressive confidence measure • Hybrid framework • Value for each event mentioned in a sentence – Consolidated uncertainty values from different papers • Effort to decrease manual effort and select more certain events
  22. 22. Deep Reading: Integrating uncertainty • LitPathExplorer – Visual analytics tool; maps events from literature to pathway interactions – Includes uncertainty measure • Robot Scientist – Selection of Gene expression and Regulation events for wet-lab experiments – Selection of interactions (using LitPathExplorer) to assemble network and predict drug effect on cell lines
  23. 23. LitPathExplorer: a confidence-based tool for exploring pathway models 1. Enable flexible search and exploration of biomolecular pathway networks – different views of the data – various interactive functionalities 2. Provide a means for making existing evidence in the scientific literature available to support corroboration 3. Facilitate the discovery of new interactions that are not yet part of a given model 4. Allow the user to become an active participant in the analytical process – quantify confidence in the events. Video: http://
  24. 24. 1. Search • A pathway model can be searched by providing: • event types, • entities, • and/or roles for each entity in the reaction • Multiple queries can be combined in a Boolean search
  25. 25. 2. Network viewer: Reading against the model. Entities, Reactions/Events • Colour encodes event type • Size encodes confidence
  26. 26. 3. Inspector, event confidence computation. Mapping IDs for entities and events. Overall event confidence
  27. 27. 3. Inspector, quantifying the confidence. Confidence breakdown
  28. 28. Adjusting event confidence
  29. 29. 4. Text Analyzer – Articles & sentences. Sentence-level language confidence. Article-level language confidence
  30. 30. 4. Word tree visualisation: Contrast event mentions across the corpus. Sentences can be inspected further upon interaction. Vertical arrangement and gray scale denote event confidence
  31. 31. Use case • A pathway model • contains reactions involving the Ras protein • output of querying PathwayCommons for one- and two-hop reactions centred on Ras • A corpus of 12,660 full papers • Retrieved from the PubMed Central Open Access repository • using as queries “breast cancer” and its synonyms as keywords, combined with names of breast cancer cell lines, e.g., “T-47D”, “MCF-7” (and their variants). • Methods for event extraction, model mapping and confidence computation were applied on the events extracted from the corpus
  32. 32. Network Viewer: Discovery mode. Extending the model with events found in the literature
  33. 33. Discovery mode: Difficult to explore when too many candidate events are found
  34. 34. Verifying mentions in text http://
  35. 35. National Centre for Text Mining • 1st publicly funded national text mining centre • Location: Manchester Institute of Biotechnology • Since 2004 • Fully sustainable since 2011 • Biology, Medicine, Biodiversity, Humanities, Social Sciences. Funders: BBSRC, AHRC, EPSRC, MRC, JISC, NIH, DARPA, H2020. Partners: AZ, Unilever, Pfizer, Elsevier, Nature, BBC, KISTI, AIST
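
Illustrative sketch (not part of the slides): slides 9, 12 and 21 describe events whose arguments can be entities or other events, uncertainty cues in sentences, and the consolidation of uncertainty values across mentions. The Python below is a minimal sketch of those three ideas under loud assumptions. The class names, the argument role keys, the cue list (picked by hand from the slide 12 example sentences) and the averaging rule are all hypothetical. The system described in the talk uses EventMine, a Random Forest and induced dependency rules, none of which is reproduced here; this sketch swaps in plain keyword matching.

```python
import re
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Union

# Hypothetical cue list, hand-picked from the slide 12 example sentences.
UNCERTAINTY_CUES = {
    "may", "might", "could", "suggest", "suggests",
    "indicate", "indicates", "hypothesize", "whether",
}


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "Protein" or "Chemical"


@dataclass
class Event:
    event_type: str  # e.g. "Binding" or "Regulation"
    trigger: str     # word in the sentence that signals the event
    arguments: Dict[str, Union[Entity, "Event"]] = field(default_factory=dict)

    def is_complex(self) -> bool:
        # Slide 9: a complex event has at least one argument that is an event.
        return any(isinstance(arg, Event) for arg in self.arguments.values())


def is_uncertain(sentence: str) -> bool:
    """Crude keyword check: True if any cue word occurs in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token in UNCERTAINTY_CUES for token in tokens)


def consolidate(mentions: List[str]) -> float:
    """Assumed consolidation rule: share of mentions without a cue word.

    1.0 means every mention is stated without a cue; 0.0 means none is.
    Raises statistics.StatisticsError if `mentions` is empty.
    """
    return mean(0.0 if is_uncertain(s) else 1.0 for s in mentions)


# Illustrative events only; not the exact annotation shown on slide 9.
binding = Event(
    "Binding", "binding",
    {"Theme1": Entity("PKM2", "Protein"), "Theme2": Entity("MUC1", "Protein")},
)
regulation = Event(
    "Regulation", "required",
    {"Cause": Entity("BRAF", "Protein"), "Theme": binding},
)

mentions = [
    "Therefore, LiCl might inhibit GSK3β in different ways",  # from slide 12
    "LiCl inhibits GSK3β.",                                   # made-up certain mention
]
confidence = consolidate(mentions)
```

Here `regulation.is_complex()` is true because its Theme is itself an event, and `confidence` is the share of the two mentions that carry no cue word. A real consolidation step would weight mentions by classifier scores rather than by a binary keyword flag.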