Supporting Scientific Sensemaking
Anita de Waard
VP Research Data Collaborations, Elsevier
firstname.lastname@example.org
Visit Microsoft Research, January 23, 2013
Outline
• A model of scientific sensemaking:
– Stories, that persuade with data
– Discourse segments and verb tense
• Towards extracting claim-evidence networks:
– Hedging in science
– Creating claim-evidence networks
• Data:
– Why life is so complicated
– Connecting biological experiments into collaboratories
A paper is a story…
Story grammar: The Story of Goldilocks and the Three Bears
• Setting
– Time: Once upon a time
– Character: a little girl named Goldilocks
– Location: She went for a walk in the forest. Pretty soon, she came upon a house.
• Theme
– Goal: She knocked and, when no one answered,
– Attempt: she walked right in.
• Episode
– Name: At the table in the kitchen, there were three bowls of porridge.
– Subgoal: Goldilocks was hungry.
– Attempt: She tasted the porridge from the first bowl.
– Outcome: This porridge is too hot! she exclaimed.
– Attempt: So, she tasted the porridge from the second bowl.
– Outcome: This porridge is too cold, she said
– Attempt: So, she tasted the last bowl of porridge.
– Outcome: Ahhh, this porridge is just right, she
Paper grammar: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
• Setting
– Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
– Objects of study: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
– Experimental setup: studied and compared in vivo effects and interactions to those of the human protein
• Theme
– Research goal: Gain insight into how Atx-1’s function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
– Hypothesis: Atx-1 may play a role in the regulation of gene expression
• Episode
– Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
– Subgoal: test the function of the AXH domain
– Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
– Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O’Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
– Data: (data not shown),
– Results: both genotypes show many large holes and loss of cell integrity at 28 days (Figures 1B-1D).
…that persuades…
Parts of a speech (Aristotle / Quintilian) and the matching part of a scientific paper:
• prooimion / exordium (Introduction) → Introduction: positioning. The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience.
• prothesis / narratio (Statement of Facts) → Introduction: research question. The speaker here provides a narrative account of what has happened and generally explains the nature of the case.
• propositio (Summary) → Summary of contents. The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
• pistis / confirmatio (Proof) → Results. The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
• refutatio (Refutation) → Related Work. As the name connotes, this section of a speech was devoted to answering the counterarguments of one’s opponent.
• epilogos / peroratio → Discussion: summary, implications. Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up.
Goal of the paper is to be published; it uses author/journal as a host.
Format has co-evolved: predator-prey relationship with reviewers.
In defense of the clause as the unit of thought:
1. Importantly, our results so far indicate that the expression of miR-372&3 did not reduce the activity of RASV12, as these cells were still growing faster than normal cells and were tumorigenic, for which RAS activity is indispensable (Hahn et al, 1999 and Kolfschoten et al, 2005).
2. To shed more light on this aspect, we examined the effect of miR-372&3 expression on p53 activation in response to oncogenic stimulation.
3. We used for this experiment BJ/ET cells containing p14ARFkd because, following RASV12 treatment, in those cells p53 is still activated but more clearly stabilized than in parental BJ/ET cells (Voorhoeve and Agami, 2003), resulting in a sensitized system for slight alterations in p53 in response to RASV12.
4. Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.
• More than one ‘thought unit’ per sentence.
• Verb tense changes within sentence (several times).
• Attribution, actions/states, and prepositions all contained within a sentence.
In defense of the clause as the unit of thought:
Each of the four example sentences above divides into three parts:
• Head: premise, motivation, attribution (matrix clause)
• Middle: main biological statement
• End: interpretation, elaboration, attribution (reference)
In defense of the clause as the unit of thought:
The clauses in the same four example sentences are labeled by type: Regulatory clause, Fact, Goal, Method, Result, Implication.
Clause, realm and tense:
Realms: Conceptual knowledge; Experimental evidence
– Fact: Both seminomas and the EC component of nonseminomas share features with ES cells.
– Goal: To exclude that
– Hypothesis: the detection of miR-371-3 merely reflects its expression pattern in ES cells,
– Method: we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
– Result: In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
– Reg-Implication: suggesting that
– Implication: miR-371-3 expression is a selective event during tumorigenesis.
Clause, realm and tense:
• Concepts, models, ‘facts’: present tense
– Fact: (1) Both seminomas and the EC component of nonseminomas share features with ES cells.
– Problem: (2) b. the detection of miR-371-3 merely reflects its expression pattern in ES cells,
– Implication: (3) c. miR-371-3 expression is a selective event during tumorigenesis.
• Transitions: present tense
– Goal: (2) a. To exclude that
– Regulatory-Implication: (3) b. suggesting that
• Experiment: past tense
– Method: (2) c. we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
– Result: (3) a. In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
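The clause model on this slide can be written down as a small lookup table. The sketch below is only an illustration: the names `CLAUSE_MODEL`, `SEGMENTS`, `expected_tense` and `realm_sequence` are hypothetical, and the realm and tense assigned to each clause type are transcribed from the slide rather than detected from the text.

```python
# Clause type -> (realm, expected tense), as read from the slide.
# The short realm names ("concept", "transition", "experiment") are assumptions.
CLAUSE_MODEL = {
    "Fact": ("concept", "present"),
    "Problem": ("concept", "present"),
    "Implication": ("concept", "present"),
    "Goal": ("transition", "present"),
    "Regulatory-Implication": ("transition", "present"),
    "Method": ("experiment", "past"),
    "Result": ("experiment", "past"),
}

# The example passage, segmented by hand as on the slide: (label, type, text).
SEGMENTS = [
    ("1", "Fact", "Both seminomas and the EC component of nonseminomas "
                  "share features with ES cells."),
    ("2a", "Goal", "To exclude that"),
    ("2b", "Problem", "the detection of miR-371-3 merely reflects its "
                      "expression pattern in ES cells,"),
    ("2c", "Method", "we tested by RPA miR-302a-d, another ES cells-specific "
                     "miRNA cluster (Suh et al, 2004)."),
    ("3a", "Result", "In many of the miR-371-3 expressing seminomas and "
                     "nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),"),
    ("3b", "Regulatory-Implication", "suggesting that"),
    ("3c", "Implication", "miR-371-3 expression is a selective event during "
                          "tumorigenesis."),
]


def expected_tense(clause_type: str) -> str:
    """Return the tense the model associates with a clause type."""
    return CLAUSE_MODEL[clause_type][1]


def realm_sequence(segments) -> list:
    """Return the realm of each segment, in reading order."""
    return [CLAUSE_MODEL[clause_type][0] for _, clause_type, _ in segments]


if __name__ == "__main__":
    for label, clause_type, text in SEGMENTS:
        realm, tense = CLAUSE_MODEL[clause_type]
        print(f"({label}) {clause_type:<22} {realm:<10} {tense:<7} {text}")
```

Reading the realm sequence in order shows the passage moving from concept to experiment and back to concept, with a transition clause at each crossing.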
Tense use in science and mythology:
• Facts in the eternal present
– Science: Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.
– Mythology: I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, the glorious one whom all the blessed throughout high Olympus reverence and honor.
• Events in the simple past
– Science: Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.
– Mythology: Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.
• Events with embedded facts
– Science: We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).
– Mythology: And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men, of warriors, with whom she is wroth, she, the daughter of the mighty sire.
• Attribution in the present perfect
– Science: miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993)
– Mythology: In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.
• Implications are hedged, and in the present tense
– Science: These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact
– Mythology: Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
From fiction to fact: Hedging
“[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour
• Voorhoeve et al., 2006: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
• Kloosterman and Plasterk, 2006: In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).
• Yabuta et al., 2007: [On the other hand,] two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
• Okada et al., 2011: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).
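The loss of hedging along this citation chain can be made visible with a very simple cue lookup. This is a minimal sketch, not a method from the talk: the cue list `HEDGE_CUES` and the function `find_hedges` are assumptions chosen for illustration, and plain word matching is far cruder than the hedging detectors cited in the literature.

```python
import re

# Hypothetical, deliberately small list of hedging cues (an assumption).
HEDGE_CUES = {"possibly", "probably", "potential", "may", "might", "suggests"}

# The four statements quoted on the slide, in order of publication.
CITATION_CHAIN = [
    ("Voorhoeve et al., 2006",
     "These miRNAs neutralize p53-mediated CDK inhibition, possibly through "
     "direct inhibition of the expression of the tumor suppressor LATS2."),
    ("Kloosterman and Plasterk, 2006",
     "In a genetic screen, miR-372 and miR-373 were found to allow "
     "proliferation of primary human cells that express oncogenic RAS and "
     "active p53, possibly by inhibiting the tumor suppressor LATS2"),
    ("Yabuta et al., 2007",
     "two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in "
     "testicular germ cell tumors by inhibition of LATS2 expression, which "
     "suggests that Lats2 is an important tumor suppressor"),
    ("Okada et al., 2011",
     "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the "
     "expression of Lats2, thereby allowing tumorigenic growth in the "
     "presence of p53"),
]


def find_hedges(sentence: str) -> list:
    """Return the hedging cues found in a sentence, in order of appearance."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [word for word in words if word in HEDGE_CUES]


if __name__ == "__main__":
    for source, sentence in CITATION_CHAIN:
        print(f"{source}: {find_hedges(sentence) or 'no hedge found'}")
```

With this cue list, the original claim and its first citation carry "possibly", the 2007 citation carries "potential" and "suggests", and the 2011 citation carries no cue at all.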
Hedging in science:
• Why do authors hedge?
– Make a claim ‘pending […] acceptance in the community’
– ‘Create A Research Space’: hedging allows authors to insert themselves into the discourse in a community
– ‘the strongest claim a careful researcher can make’
• Hedging cues, speculative language, modality/negation:
– Light et al.: finding speculative language
– Wilbur et al.: focus, polarity, certainty, evidence, and directionality
– Thompson et al.: level of speculation, type/source of the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy, a.m.o.):
– Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
– Wide applications in product reviews; but not (yet) in science!
A model for epistemic evaluations:
For a Proposition P, an epistemically marked clause E is an evaluation of P, written E_{V,B,S}(P), with:
– V = Value: 3 = assumed true, 2 = probable, 1 = possible, 0 = unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
– B = Basis: Reasoning, Data
– S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit
Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
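One way to make this model concrete is a small typed record. The sketch below is an assumption about how the three dimensions could be encoded: the class and field names are hypothetical, and the example annotation is illustrative rather than taken from the annotated corpus.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Value(IntEnum):
    """Epistemic value V, on the slide's scale from -3 to 3."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    """Basis B of the evaluation."""
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    """Source S of the evaluation."""
    A = "speaker is author, explicit"
    IA = "speaker is author, implicit"
    N = "other author, explicit"
    NN = "other author, implicit"


@dataclass(frozen=True)
class EpistemicEvaluation:
    """An evaluation E of a proposition P with value, basis and source."""
    proposition: str
    value: Value
    basis: Basis
    source: Source

    def notation(self) -> str:
        """Render the evaluation in a compact E[V, B, S](P) form."""
        return (f"E[V={int(self.value)}, B={self.basis.value}, "
                f"S={self.source.name}](P)")


if __name__ == "__main__":
    # Illustrative annotation (an assumption, not from the source corpus).
    example = EpistemicEvaluation(
        proposition="miR-371-3 expression is a selective event during tumorigenesis",
        value=Value.PROBABLE,
        basis=Basis.DATA,
        source=Source.IA,
    )
    print(example.notation())
```

Making `Value` an integer enumeration keeps the ordering of the scale, so two evaluations of the same proposition can be compared directly.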
Reporting verbs vs. epistemic value:
• Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
• Value = 1: be important, consider, expect, hypothesize (5x), give (hypothetical) insight, raise possibility that, suspect, think
• Value = 2: appear, believe, implicate (2x), imply, indicate (12x), play a (probable) role, represent, suggest (18x), validate (2x)
• Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
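The table above can be used as a first-pass lookup from a reporting verb to an epistemic value. The following sketch assumes a hand-picked subset of single-word verbs from the slide; the names `VERBS_BY_VALUE` and `epistemic_value` are hypothetical. Verbs listed under more than one value, such as "report", are left out on purpose, because they need context to resolve.

```python
from typing import Optional

# Subset of the slide's table, limited to verbs that appear under one value.
VERBS_BY_VALUE = {
    1: ("consider", "expect", "hypothesize", "suspect", "think"),
    2: ("appear", "believe", "implicate", "imply", "indicate",
        "represent", "suggest", "validate"),
    3: ("confirm", "demonstrate", "detect", "discover", "find",
        "identify", "observe", "reveal", "show"),
}

# Invert the table into a verb -> value dictionary for direct lookup.
VERB_TO_VALUE = {
    verb: value for value, verbs in VERBS_BY_VALUE.items() for verb in verbs
}


def epistemic_value(verb_lemma: str) -> Optional[int]:
    """Return the value for a verb lemma, or None if it is not in the subset."""
    return VERB_TO_VALUE.get(verb_lemma.lower())


if __name__ == "__main__":
    for verb in ("suggest", "show", "hypothesize", "report"):
        print(verb, epistemic_value(verb))
```

The lookup expects lemmas ("suggest", not "suggests"); a real pipeline would need a lemmatizer and a way to handle the multi-word and ambiguous entries in the table.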
Most prevalent clause type: These results suggest that...
• Adverb/Connective: thus, therefore, together, recently, in summary
• Determiner/Pronoun: it, this, these, we/our
• Adjective: previous, future, better
• Noun phrase: data, report, study, result(s); method or reference
• Modal: form of ‘to be’, may, remain
• Adverb: often, recently, generally
• Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
• Preposition: that, to
Ontology for Reasoning, Certainty and Attribution
vocab.deri.ie/orca
Adding metadiscourse to triples:
• Biological statement with BEL/epistemic markup: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
– BEL representation: r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
– BEL representation (increased abundance of miR-372 decreases abundance of LATS2): r(MIR:miR-372) -| r(HUGO:LATS2)
– Epistemic evaluation: Value = Possible; Source = Unknown; Basis = Unknown
• Biological statement with MedScan/epistemic markup: Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).
– MedScan analysis: IL-6 → NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1
– Epistemic evaluation: Value = Probable; Source = Author; Basis = Data
Claim-Evidence example: Data2Semantics
Goal: improve speed of integration of research into practice
• Step 1: Patient data + diagnosis link to Guideline recommendation
– A. Philips’ Electronic Patient Records
– B. Elsevier-published Clinical Guideline
• Step 2: Guideline recommendation links to evidence in report or data
– C. Elsevier (or other publisher’s) Research Report or Data
Claim-Evidence Chains in Drug-drug Interactions
• Step 1: Manually identify DDIs and drug names in a wide collection of content sources
• Step 2: Develop a model of Drug-Drug Interaction and define candidates
• Step 3: Automate this process and store as Linked Data
Claimed Knowledge Updates
Definition:
1) A CKU expresses a proposition about biological entities
2) A CKU is a new proposition
3) The authors present the CKU as factual: ⇒ Strength = Certainty
4) A CKU is derived from experimental work described in the article: ⇒ Basis = Data
5) The ownership is attributed to the author(s) of the article: ⇒ Source = Author, Explicit
Sandor/de Waard
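The five criteria above amount to a conjunction of checks, which can be sketched as a predicate. Everything in this block is an assumption made for illustration: the `Proposition` record, its field names, and in particular the reading of "Strength = Certainty" as the top of the -3 to 3 value scale. The two example annotations are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A candidate claim with hand-assigned annotations (hypothetical schema)."""
    text: str
    about_biological_entities: bool  # criterion 1
    is_new: bool                     # criterion 2
    value: int                       # criterion 3: epistemic value, -3..3
    basis: str                       # criterion 4: "Data" or "Reasoning"
    source: str                      # criterion 5: "A", "IA", "N" or "NN"


def is_cku(p: Proposition) -> bool:
    """Return True when all five CKU criteria hold for the proposition."""
    return (
        p.about_biological_entities
        and p.is_new
        and p.value == 3        # assumed encoding of "Strength = Certainty"
        and p.basis == "Data"
        and p.source == "A"     # author, explicit
    )


if __name__ == "__main__":
    result = Proposition(
        text="we identified miR-372 and miR-373, each permitting proliferation "
             "and tumorigenesis of primary human cells",
        about_biological_entities=True, is_new=True,
        value=3, basis="Data", source="A",
    )
    hedged = Proposition(
        text="possibly through direct inhibition of the expression of the "
             "tumor suppressor LATS2",
        about_biological_entities=True, is_new=True,
        value=1, basis="Reasoning", source="IA",
    )
    print(is_cku(result), is_cku(hedged))
```

Under these annotations the experimental result counts as a CKU, while the hedged interpretation fails the certainty, basis and source criteria.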
A corpus for citation analysis:
• Method
– Voorhoeve text: We subsequently created a human miRNA expression library (miR-Lib) by cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3)
– Citing text: Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library and corresponding bar code array
– Citing text: Using a novel retroviral miRNA expression library, Agami and co-workers performed a cell-based screen
• Result
– Voorhoeve text: we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wild-type p53.
– Citing text: miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis of these primary cells carrying both oncogenic RAS and wild-type p53,
– Citing text: Voorhoeve et al. (2006) identified miR-372 and miR-373
– Citing text: miR-372 has been recently described as potential oncogene that collaborate with oncogenic RAS in cellular transformation
• Interpretation
– Voorhoeve text: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
– Citing text: probably through direct inhibition of the expression of the tumor-suppressor LATS2 and subsequent neutralization of the p53 pathway.
– Citing text: Compromised Lats2 functionality might reduce the selective pressure for p53 inactivation during tumor progression.
Work done with Lucy Vanderwende
Data sharing in biology
• Interspecies variability: A specimen is not a species!
• Gene expression variability: Knowing genes is not knowing how they are expressed!
• Microbiome: An animal is an ecosystem!
• Systems biology: Whole is more than the sum of its parts!
• Models vs. experiment: Are we talking about the same things? In a way we can all use?
• Dynamics: Life is not in equilibrium!
= Life is complicated! Reductionism doesn’t work for living systems.
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
Statistics to the rescue!
With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.” The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size – 4,298 North Americans of European descent and 2,217 African Americans – has enabled the researchers to mine down into the human genome.” Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
Enable ‘incidental collaboratories’:
• Collect: store data at the level of the experiment:
– Accessible through a single interface
– Add enough metadata to know what was done/seen
• Connect: allow analyses over:
– Similar experiment types
– Experiments done with/on similar biological ‘things’ (species, strains, systems, cells etc.)
– In a way that can be used by modelers!
• Keep:
– Long-term preservation of data and software
– Fulfill Data Management Plan requirements
– Allow ‘gated’ access when and to whom researcher wants
Let’s look at a typical lab:
• How to get the right antibody IDs
• And messy bits
• From the lab notebook
• Into the PI’s command center?
Objections and rebuttals re. data sharing
• Objection: “But our lab notebooks are all on paper”
– Rebuttal: Develop smart phone/tablet apps for data input
• Objection: “I need to see a direct benefit from something I spend my time on”
– Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
• Objection: “I want things to be peer reviewed before I expose them”
– Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
– Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
• Objection: “I am afraid other people might scoop my discoveries”
– Rebuttal: Reward system moves from a competition to a ‘shared mission’
Problem: biological research is quite insular
• Biology is small: size 10^-5 to 10^2 m, scientist can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: many people with similar skill sets, vying for the same grants
• In summary: the structure of biological research does not inherently promote collaboration (vs., for instance, big physics or astronomy).
[Figure: research cycle labeled Prepare, Ponder, Observe, Communicate, Analyze]
So we can do joint experiments:
Across labs, experiments: track reagents and how they are used
[Figure: research cycles labeled Prepare, Observations, Analyze, Communicate]
So we can do joint experiments:
Compare outcome of interactions with these entities
[Figure: research cycles labeled Prepare, Observations, Analyze, Communicate]
So we can do joint experiments:
Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
[Figure: research cycles labeled Prepare, Observations, Analyze, Communicate]
Elsevier Research Data Services:
1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
2. Help increase the value of the data shared by increasing annotation, normalization, provenance, enabling enhanced interoperability
3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
Summary – Possible Collaborations?
• A model of scientific sensemaking (Thesis: joint research?):
– Stories, that persuade with data
– Discourse segments and verb tense
• Towards claim-evidence networks (Labs: research collaborations?):
– Hedging in science
– Creating claim-evidence networks
• Data (RDS: joint development?):
– Why life is so complicated
– Connecting experiments into collaboratories
References:
• J Am Med Inform Assoc. 2010 September; 17(5): 514–518 http://dx.doi.org/10.1136/jamia.2010.003947
• Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679
• Useful list of resources in bioinformatics http://www.bioinformatics.ca/
• Biological Expression Language – http://www.openbel.org
• Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
• Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
• Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
• Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
• Kim, S-M. Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
• de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted)
• Data2Semantics project: http://www.data2semantics.org/
• Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html