C-SHALS 2010: representing scientific discourse, or: why triples are not enough


On semantic annotation for science publications

  1. representing scientific discourse, or: why triples are not enough. Anita de Waard: Disruptive Technologies Director, Elsevier Labs; Casimir Researcher, Utrecht Institute of Linguistics
  2. what is your problem?
  3. why triples are not enough (1): commercial tool
     - extracted triple: insulin / maintaining / glucose homeostasis
       source sentence: When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.
     - extracted triple: insulin / may be involved / glucose homeostasis
       source sentence: Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.
     the triples are often wrong. you cannot check if they are true.
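One way to read this slide: a bare triple drops both the sentence it came from and any hedging in that sentence. The sketch below is only an illustration of that point, not the commercial tool's method. The `Triple` and `Claim` classes, the `find_hedges` helper and the `HEDGE_CUES` list are all hypothetical names introduced here.

```python
import re
from dataclasses import dataclass

# Illustrative, non-exhaustive list of hedging cues (an assumption, not from the slides).
HEDGE_CUES = ("may", "might", "could", "suggest", "suggests", "possibly")


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Claim:
    """A triple kept together with its provenance and any hedging cues."""
    triple: Triple
    source_sentence: str
    hedges: tuple


def find_hedges(sentence: str) -> tuple:
    """Return the hedging cues that occur as whole words in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return tuple(cue for cue in HEDGE_CUES if cue in words)


def make_claim(triple: Triple, sentence: str) -> Claim:
    return Claim(triple, sentence, find_hedges(sentence))


sentence = (
    "Because PANDER is expressed by pancreatic beta-cells and in response to "
    "glucose in a similar way to those of insulin, PANDER may be involved in "
    "glucose homeostasis."
)
# The triple as shown on the slide: the subject is wrong and the hedge is lost.
bare = Triple("insulin", "involved in", "glucose homeostasis")
claim = make_claim(bare, sentence)
print(claim.hedges)
```

Keeping the source sentence next to the triple does not make the triple correct, but it lets a reader see that the sentence is about PANDER and that the statement is hedged with "may".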
  4. why triples are not enough (2): biocreative challenge
     compare:
     - In Xenopus oocyte maturation, cytoplasmic polyadenylation mediated by cytoplasmic polyadenylation element binding protein (CPEB) induces the translation of maternal mRNA [5].
     - In mouse testis, another novel member of the CPEB protein family (CPEB2) and a homolog of xGLD-2 (mGLD-2) have been identified [7] and [8]
     to:
     - TPAP was present in GSG1 immunoprecipitates (Fig. 2B). The in vivo data suggest that TPAP–GSG1 interactions occur in mammalian cells.
     how do you know this is true? what is new?
  5. why triples are not enough (3): medie
     how do you know this is true? what is new?
     - Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric
     - Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.
  6. why is this so difficult?
  7. issue #1: science is rhetoric
     parts of the classical oration (Aristotle, Quintilian) set against the parts of a scientific paper:
     - Aristotle: prooimion | Quintilian: Introduction / exordium | Scientific Paper: Introduction: positioning
       The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience.
     - Aristotle: prothesis | Quintilian: Statement of Facts / narratio | Scientific Paper: Introduction: research question
       The speaker here provides a narrative account of what has happened and generally explains the nature of the case.
     - Aristotle: - | Quintilian: Summary / propositio | Scientific Paper: Summary of contents
       The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
     - Aristotle: pistis | Quintilian: Proof / confirmatio | Scientific Paper: Results
       The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
     - Aristotle: - | Quintilian: Refutation / refutatio | Scientific Paper: Related Work
       As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
     - Aristotle: epilogos | Quintilian: peroratio | Scientific Paper: Discussion: summary, implications.
       Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up.
     - goal of the paper is to be published; it uses us as a host system
     - format has co-evolved as predator-prey system with reviewers
  8. issue #2: science is a story
     Story Grammar, illustrated with The Story of Goldilocks and the Three Bears, set against Paper Grammar, illustrated with "The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins":
     Setting
     - Time: Once upon a time | Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
     - Character: a little girl named Goldilocks | Objects of study: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
     - Location: She went for a walk in the forest. Pretty soon, she came upon a house. | Experimental setup: studied and compared in vivo effects and interactions to those of the human protein
     Theme
     - Goal: She knocked and, when no one answered, | Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
     - Attempt: she walked right in. | Hypothesis: Atx-1 may play a role in the regulation of gene expression
     Episode
     - Name: At the table in the kitchen, there were three bowls of porridge. | Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
     - Subgoal: Goldilocks was hungry. | Subgoal: test the function of the AXH domain
     - Attempt: She tasted the porridge from the first bowl. | Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
     - Outcome: This porridge is too hot! she exclaimed. | Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1 [82Q].
     - Attempt: So, she tasted the porridge from the second bowl. | Data: Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells (data not shown),
     - Outcome: This porridge is too cold, she said | Results: both genotypes show many large holes and loss of cell integrity at 28 days
     - Attempt: So, she tasted the last bowl of porridge. | Data: (Figures 1B-1D).
     - Outcome: Ahhh, this porridge is just right, | Results: Overexpression of dAtx-1 using the GMR-GAL4 driver also induces
  9. issue #3: science happens in language (and language happens in our heads)
     Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.
     a. Figure 4a shows that [Intratextual]
     b. following RASV12 stimulation [Method]
     c. p53 was stabilized and activated [Result]
     d. and the target gene, p21cip1, was induced in all cases, [Result]
     e. indicating an intact p53 pathway in these cells. [Implication]
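The segmentation on this slide can be written down as a small data structure. This is a minimal sketch, assuming only the four segment types that appear on the slide. The `SegmentType` and `Segment` names and the `of_kind` helper are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SegmentType(Enum):
    # Only the types that appear on the slide; a full scheme may have more.
    INTRATEXTUAL = "Intratextual"
    METHOD = "Method"
    RESULT = "Result"
    IMPLICATION = "Implication"


@dataclass(frozen=True)
class Segment:
    label: str
    text: str
    kind: SegmentType


# Segment texts and labels copied from the slide.
segments = [
    Segment("a", "Figure 4a shows that", SegmentType.INTRATEXTUAL),
    Segment("b", "following RASV12 stimulation", SegmentType.METHOD),
    Segment("c", "p53 was stabilized and activated", SegmentType.RESULT),
    Segment("d", "and the target gene, p21cip1, was induced in all cases,",
            SegmentType.RESULT),
    Segment("e", "indicating an intact p53 pathway in these cells.",
            SegmentType.IMPLICATION),
]


def of_kind(items, kind):
    """Return the labels of all segments of the given type, in order."""
    return [s.label for s in items if s.kind is kind]


print(of_kind(segments, SegmentType.RESULT))
```

With the sentence stored this way, a reader or a tool can ask for the results separately from the implication the authors draw from them, which a single triple cannot express.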
  10. language happens in our head: tense use in biology
      - Conceptual Realm: State Present
      - Argumentational Realm: Instantaneous Present
      - Discourse Progression Axis: Instantaneous Present
      - Research Progression Axis: Present Perfect
      - Experimental Realm: Event Past
  11. tense use in science and mythology
      - Facts in the eternal present
        science: Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.
        mythology: I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, --the glorious one whom all the blessed throughout high Olympus reverence and honor.
      - Events in the simple past
        science: Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.
        mythology: Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.
      - Events with embedded facts
        science: We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).
        mythology: And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men--of warriors, with whom she is wroth, she, the daughter of the mighty sire.
      - Attribution in the present perfect
        science: miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993, ...
        mythology: In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.
      - Implications are hedged, and in the present tense
        science: These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact
        mythology: Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
  12. issue #4: ‘A fact is a claim, agreed by a committee’
      - Yabuta, JBioChem 2007: miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)
      - Voorhoeve et al., 2006: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we... Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,
      - Raver-Shapira et al., JMolCell 2007: two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
      [diagram labels: KnownFact, KnownFact, Concepts, Hypothesis, Implication, Fact; Goal, Method, Result, Data for Experiment 1 and for Experiment 2]
  13. so should we just keep reading papers?
  14. possible representation: hypotheses, evidence and relationships
      [diagram: "PHC undergo Growth arrest"; Paper A and Paper B are each drawn as goal, fact, method, results, data and implication nodes (data 1 to data 6), joined by an "underpinning link"]
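The diagram on this slide suggests a graph in which a fact in one paper is underpinned by an implication in another, which in turn rests on results and data. The sketch below is one possible reading, not the HypER format itself: the edges, the node names and the `add_support` and `trace` helpers are all assumptions made for illustration, since the slide text does not say which nodes are connected.

```python
from collections import defaultdict

# claim -> nodes that directly support it (hypothetical edges, not read off the slide)
SUPPORTED_BY = defaultdict(list)


def add_support(evidence: str, claim: str) -> None:
    """Record that `evidence` directly supports `claim`."""
    SUPPORTED_BY[claim].append(evidence)


# Paper A: data and method lead to results, results lead to an implication.
add_support("A:data 1", "A:results")
add_support("A:method", "A:results")
add_support("A:results", "A:implication")
# Underpinning link: Paper B treats Paper A's implication as a fact.
add_support("A:implication", "B:fact")


def trace(claim: str) -> list:
    """Return every node that directly or indirectly supports `claim`."""
    seen = []
    stack = list(SUPPORTED_BY.get(claim, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(SUPPORTED_BY.get(node, []))
    return seen


print(trace("B:fact"))
```

Tracing `B:fact` walks back through Paper A's implication to its results, method and data, which is the kind of "how do you know this is true?" question the earlier slides say a bare triple cannot answer.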
  15. HYPotheses, Evidence and Relationships
      - Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships
      - Partners:
        - Harvard/MGH: SWAN, ARF
        - Open University: Cohere
        - Oxford University: CiTO, eLearning/Rhetoric
        - DERI: SALT, aTags
        - University of Trento: LiquidPub
        - Xerox Research: XIP hypothesis identifier
        - U Tilburg: ML for Science
        - Elsevier, UUtrecht: Discourse analysis of biology
      Hypothesis 22: Intramembranous Aβ dimer may be toxic. Derived from: POSTAT_CONTRIBUTION(This essay explores the possibility that a fraction of these Abeta peptides never leave the membrane lipid bilayer after they are generated, but instead exert their toxic effects by competing with and compromising the functions of intramembranous segments of membrane-bound proteins that serve many critical functions.)
  16. W3C HCLS Sig Rhetorical Document Task
      - Part of subgroup on discourse structure
      - Goal: come up with a format for authors to explicitly create rhetorical/argumentational structure
      - Make life of annotators easier!
      - Please correct our ‘Pharma Use case’!
      - http://esw.w3.org/topic/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/
  17. some things elsevier is doing
  18. three dimensions of annotation
      [figure: three axes, with values (collection, data, document, claim, triple, entity), (author/editor, typesetter/EW/SD, reader/curator/data mining) and (manual, semi-automated, automated); Automated Copy Editing and Reflect are plotted as examples]
  19. XMP RDF in all our PDFs: Dublin Core + PRISM
  20. Linked Data for Elsevier XML
      - content: <ce:section id=#123> ‘the moon is made of cheese’ (immutable, $$, proprietary)
      - annotations: "this section argues that"; "but we all know she was wrong that day", said @anita on Feb 25 2010 (dynamic, personal, task-driven, open?)
  21. some things to mark up: EMTREE
  22. ‘Community effort to establish an open, independent registry of Researcher Identifiers’
  23. questions? a.dewaard@elsevier.com, http://elsatglabs.com/labs/anita, @anitawaard on Twitter