  1. Reinventing the Research Article: Seven Challenges in Science Publishing
     Anita de Waard
     Researcher, Disruptive Technologies, Elsevier Labs
     NWO-Casimir Grantee, Utrecht University
  2. Seven ‘known knowns’ in online science publishing:
     1. The internet has caused an information overload.
     2. Science papers contain facts.
     3. The narrative research article is outdated and needs to be replaced.
     4. Since words contain meaning,
     5. and words (and logic) contain scientific fact,
     6. we just need to model them with XML + RDF;
     7. and the publishers should stop making all these papers.
  3. (1) The internet has caused an information overload
     My own experience (as a researcher):
     - Easy: finding what I know exists
     - OK: finding things I expect (hope) exist
     - Hard: making sure I haven’t missed anything
     - However, none of these make me feel overwhelmed.
     What is infuriating:
     - trying to respond to people who ask me something
     - managing three email accounts on four computers
     - following up on plans and projects
     However, we can improve the delivery of science content online.
  4. (1) The internet has caused an information overload
     - Pick (carve out) a first set of user needs, e.g.:
       - Locate
       - Understand
       - Believe (be convinced)
       - Explore
     - But this does not address WHAT you want to locate, understand, ...
     - Semantic network in pharmacology: ‘Grey out what I already know’
     Known unknown 1: How can we model a user’s interest?
  5. (2) Science papers contain facts
     - With the FEBS Letters Editorial Office in Heidelberg and the MINT database in Rome
     - Structured Digital Abstract [Gerstein et al.]: ‘machine-readable XML summary of pertinent facts’
     - For FEBS Letters: provide proteins, methods and protein-protein interactions, as given in MINT:
       - 2008: authors provide, editors check
       - 2009: a Word plug-in tool suggests, authors (and editors) check
     Known unknown 2: Can we create an ontology of doubt?
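A Structured Digital Abstract pairs the prose with a machine-readable summary of the entities and interactions a paper reports. The sketch below builds one such record with Python's standard `xml.etree.ElementTree`. It is illustrative only: the element and attribute names (`StructuredDigitalAbstract`, `Interaction`, `Participant`, `method`) and the protein names are hypothetical placeholders, not the actual FEBS Letters/MINT schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One reported protein-protein interaction (hypothetical fields)."""
    protein_a: str
    protein_b: str
    method: str  # experimental method as free text, e.g. "two hybrid"


def build_abstract(interactions: list[Interaction]) -> ET.Element:
    """Build a hypothetical machine-readable summary element.

    Each interaction becomes an <Interaction> element with a generated id,
    its method as an attribute, and one <Participant> child per protein.
    """
    root = ET.Element("StructuredDigitalAbstract")
    for i, inter in enumerate(interactions, start=1):
        el = ET.SubElement(root, "Interaction", id=f"i{i}", method=inter.method)
        for name in (inter.protein_a, inter.protein_b):
            ET.SubElement(el, "Participant").text = name
    return root


if __name__ == "__main__":
    # Placeholder proteins; in the workflow on the slide, authors would
    # supply these values and editors would check them.
    sda = build_abstract([Interaction("ProteinA", "ProteinB", "two hybrid")])
    print(ET.tostring(sda, encoding="unicode"))
```

Keeping the record separate from the article text is what lets a tool suggest entries and an author or editor confirm them afterwards.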
  6. (2) Science papers contain facts
     Known unknown 2: Can we create an ontology of doubt?
  7. (3) The narrative research article should be replaced
     Parts of the classical oration and the corresponding article sections (Aristotle | Quintilian | Cell | APA Style Guide):
     - Introduction: prooimion | exordium | Introduction | Introduction. The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience.
     - Statement of Facts: prothesis | narratio | Introduction | Introduction. The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge.
     - Summary: n/a | propositio | Abstract | Abstract. Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
     - Division/Outline: n/a | partitio | Article Outline | Table of Contents. Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what has been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory.
     - Proof: pistis | confirmatio | Results | Methods, Results. Following the division/outline, or partitio, comes the main body of the speech, where one offers logical arguments as proof. The appeal to logos is emphasized here.
     - Refutation: n/a | refutatio | Discussion | Discussion. Following the confirmatio, or section on proof, in a classical oration comes the refutation. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
     - Peroration: epilogos | peroratio | Discussion | Discussion. Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up.
  8. (3) The narrative research article should be replaced
     - Narrative is how stories are told; ‘the truth can only be told in stories’...
     - Narrative is essential for persuasion.
     Known unknown 3: How can we represent narrative online?
     Columns: The Story of Goldilocks and the Three Bears | Story grammar | Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
     - Once upon a time | Setting: Time | Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
     - a little girl named Goldilocks | Characters | Objects of study: the Drosophila Atx-1 homolog (dAtx-1), which lacks a polyQ tract
     - She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup: studied and compared in vivo effects and interactions to those of the human protein
     - She knocked and, when no one answered, | Theme: Goal | Research goal: gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
     - she walked right in. | Attempt | Hypothesis: Atx-1 may play a role in the regulation of gene expression
     - At the table in the kitchen, there were three bowls of porridge. | Episode 1: Name | Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
     - Goldilocks was hungry. | Subgoal | Subgoal: test the function of the AXH domain
     - She tasted the porridge from the first bowl. | Attempt | Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1
     - "This porridge is too hot!" she exclaimed. | Outcome | Results: Overexpression of dAtx-1 by Rhodopsin1 (Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1 [82Q].
     - So, she tasted the porridge from the second bowl. | n/a | Data: Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells (data not shown),
     - "This porridge is too cold," she said. | Outcome | Results: both genotypes show many large holes and loss of cell integrity at 28 days
     - So, she tasted the last bowl of porridge. | n/a | Data: (Figures 1B-1D).
  9. (4) Words contain meaning
     Sicilian?
     - ‘A word is worth a thousand pictures’ (Don Loritz)
     - The meaning of words arises in context and depends on knowledge and experience.
     - This is even more so in science: is PSA Prostate-Specific Antigen or the Pot Smokers Association of America?
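One standard way to let context choose a word's meaning is the simplified Lesk algorithm: pick the sense whose gloss shares the most words with the surrounding text. The sketch below applies it to PSA. The two glosses, the stopword list and the example sentences are made up for illustration; a real system would draw glosses from a curated resource and would need far richer context than word overlap.

```python
import re

# Hypothetical sense inventory: each sense of "PSA" with a short, invented gloss.
SENSES = {
    "Prostate-Specific Antigen":
        "protein antigen produced by the prostate gland measured in blood serum to screen for cancer",
    "Pot Smokers Association of America":
        "association of people who smoke cannabis and campaign for legalization",
}

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "by", "who", "was", "is"}


def tokens(text: str) -> set[str]:
    """Lower-cased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context: str, senses: dict[str, str] = SENSES) -> str:
    """Simplified Lesk: return the sense whose gloss overlaps most with the context.

    Ties (including zero overlap) go to the first sense in dictionary order.
    """
    ctx = tokens(context)
    return max(senses, key=lambda sense: len(ctx & tokens(senses[sense])))


if __name__ == "__main__":
    print(disambiguate("Serum PSA levels were measured in patients with prostate cancer."))
    # → Prostate-Specific Antigen
    print(disambiguate("The PSA held a rally to campaign for cannabis legalization."))
    # → Pot Smokers Association of America
```

The same three letters resolve differently only because the surrounding words differ, which is the slide's point in miniature.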
  10. (4) Words contain meaning
     - Cognitive linguistics: language and cognition cannot be separated
       - language acts are cognitive acts
       - Lakoff, metaphor: ‘anger is heat’
     - Meaning is created in the mind: a word is not (only) a ‘particle’ but (also) a ‘wave’:
       - hearing/reading is not unpacking a package, but resonating at a specific frequency
       - context is its medium
       - context-free language does not exist!
     Known unknown 4: How do we model cognitive context?
  11. (5) Words (and logic) contain scientific fact
     - “[Y]ou can transform a fact into fiction or a fiction into fact just by adding or subtracting references [and data]” (Bruno Latour, Science in Action, 1987)
     - Citing sentence: “We generated an MCF-7 derivative that expresses the HPV16 E6 protein, which mediates degradation of p53 ([24]).”
     - Reference [24]: M. Scheffner, B.A. Werness, J.M. Huibregtse, A.J. Levine and P.M. Howley, The E6 oncoprotein encoded by human papillomavirus types 16 and 18 promotes the degradation of p53. Cell 63 (1990), pp. 1129–1136.
     - Result: “In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (Figure 1A).”
     - Figure 1. Initiation and Maintenance of G1 Arrest Induced by IR. (A) Stable MCF-7 clones containing either pCDNA3.1 (Neo) or pCDNA3.1-E6 were irradiated (20 Gy), and cellular protein extracts were made 2 hr later, separated on 10% SDS PAGE, and immunoblotted to detect p53 and cyclin D1 proteins.
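Latour's point is that a claim's status rests on the references attached to it, so a first mechanical step is to link each citation marker to its reference. The sketch below resolves numeric markers such as ([24]) in the citing sentence quoted above to the corresponding reference-list entry. It assumes bracketed integers only; ranges like [3-5], lists like [3, 7] and author-year styles would need a different pattern.

```python
import re
from typing import Optional

# Reference list keyed by citation number (only entry [24] from the slide).
REFERENCES = {
    24: ("M. Scheffner, B.A. Werness, J.M. Huibregtse, A.J. Levine and P.M. Howley, "
         "The E6 oncoprotein encoded by human papillomavirus types 16 and 18 promotes "
         "the degradation of p53. Cell 63 (1990), pp. 1129–1136."),
}

SENTENCE = ("We generated an MCF-7 derivative that expresses the HPV16 E6 protein, "
            "which mediates degradation of p53 ([24]).")

# Matches a single bracketed integer such as [24]; captures the number.
CITATION = re.compile(r"\[(\d+)\]")


def link_citations(sentence: str, references: dict[int, str]) -> list[tuple[int, Optional[str]]]:
    """Return (number, reference text) per marker; None marks an unresolved citation."""
    return [(n, references.get(n)) for n in map(int, CITATION.findall(sentence))]


if __name__ == "__main__":
    for number, ref in link_citations(SENTENCE, REFERENCES):
        print(number, "->", ref if ref is not None else "unresolved")
```

An unresolved marker, or a factual sentence with no marker at all, is exactly where a reader would want a tool to flag weaker support.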
  12. (5) Words (and logic) contain scientific fact
     - The main goal of an article is to persuade.
     - The author is a medium that enables the article to get itself published (à la the selfish gene/meme).
     - Essential persuasive elements are non-textual.
     Known unknown 5: How do we represent non-textual elements?
  13. Discourse Segments
     - “A text is made up of Discourse Segments and the relations between them”
     - Grosz and Sidner; Mann and Thompson; Marcu; Swales
     - Discourse Segment Purpose: an element that has a consistent rhetorical/pragmatic goal
     - Define these for the biological research article
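The definition above (a text is a set of discourse segments plus the relations between them) maps onto a small data structure. The sketch below is one possible encoding, assuming the seven segment types counted in the Segments vs. Sections table (Fact, Problem, Goal, Method, Result, Implication, Hypothesis). The relation label "addresses" and all class names are hypothetical, not a relation inventory taken from Grosz and Sidner or Mann and Thompson.

```python
from dataclasses import dataclass, field
from enum import Enum


class SegmentType(Enum):
    """Segment types used in the segment tables of this talk."""
    FACT = "Fact"
    PROBLEM = "Problem"
    GOAL = "Goal"
    METHOD = "Method"
    RESULT = "Result"
    IMPLICATION = "Implication"
    HYPOTHESIS = "Hypothesis"


@dataclass(frozen=True)
class Segment:
    seg_id: str
    kind: SegmentType
    text: str


@dataclass(frozen=True)
class Relation:
    source: str  # seg_id of the first segment
    target: str  # seg_id of the second segment
    label: str   # hypothetical relation name


@dataclass
class DiscourseText:
    segments: list[Segment] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def add_relation(self, source: str, target: str, label: str) -> None:
        """Add a relation, refusing links to segments that do not exist."""
        ids = {s.seg_id for s in self.segments}
        if source not in ids or target not in ids:
            raise KeyError(f"unknown segment in relation {source!r} -> {target!r}")
        self.relations.append(Relation(source, target, label))


# Usage with two segments from the MCF-7 example in this talk.
doc = DiscourseText(segments=[
    Segment("p1", SegmentType.PROBLEM,
            "we asked whether initiation of a G1 arrest following genotoxic stress requires p53."),
    Segment("r1", SegmentType.RESULT,
            "In the presence of E6, p53 stabilization in response to IR was almost "
            "completely prevented in MCF-7 cells"),
])
doc.add_relation("r1", "p1", "addresses")  # hypothetical label
```

Keeping relations as separate objects, rather than nesting segments, lets one segment take part in several relations at once.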
  14. (6) Just model the facts with XML + RDF
     A model of a biology research article:
     <EXPERIMENTS>
       <Experiment>
         <Header header="h1">p53-Independent Initiation of G1 Arrest Induced by IR</Header>
         <Fact fact="fa1" factref="br26">Since the transcriptional response by p53 is a relatively slow process,</Fact>
         <Problem problem="p1">we asked whether initiation of a G1 arrest following genotoxic stress requires p53. </Problem>
         <Method method="m1">We generated an MCF-7 derivative </Method>
         <Fact fact="fa2" factref="br24">that expresses the HPV16 E6 protein, which mediates degradation of p53 (<Bibref bib="br24">[24]</Bibref>).</Fact>
         <Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (<Figref figref="agami1.gif">Figure 1A).</Figref></Result>
         <Result result="r2">Consistent with this, no induction of p21cip1 by IR was seen in the E6-expressing MCF-7 cells <Figref figref="none.gif">(data not shown).</Figref></Result>
         ...
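To show how such markup could be consumed, the sketch below parses a trimmed copy of this fragment with Python's `xml.etree.ElementTree`, lists the discourse-segment sequence, and turns the `factref` and `figref` attributes into simple subject-predicate-object triples in the spirit of RDF. The closing tags are added only so the sample parses, and the predicate names `cites` and `showsFigure` are hypothetical; a real pipeline would more likely use an RDF library and an agreed vocabulary.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's fragment, closed so that it parses.
FRAGMENT = """<EXPERIMENTS><Experiment>
<Header header="h1">p53-Independent Initiation of G1 Arrest Induced by IR</Header>
<Fact fact="fa1" factref="br26">Since the transcriptional response by p53 is a relatively slow process,</Fact>
<Problem problem="p1">we asked whether initiation of a G1 arrest following genotoxic stress requires p53. </Problem>
<Method method="m1">We generated an MCF-7 derivative </Method>
<Fact fact="fa2" factref="br24">that expresses the HPV16 E6 protein, which mediates degradation of p53 (<Bibref bib="br24">[24]</Bibref>).</Fact>
<Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (<Figref figref="agami1.gif">Figure 1A).</Figref></Result>
</Experiment></EXPERIMENTS>"""

SEGMENT_TAGS = {"Fact", "Problem", "Goal", "Method", "Result", "Implication", "Hypothesis"}


def segment_sequence(root: ET.Element) -> list[tuple[str, str]]:
    """(segment type, segment id) in document order.

    In this markup the id sits in the attribute named after the lower-cased
    tag, e.g. <Fact fact="fa1">.
    """
    return [(el.tag, el.get(el.tag.lower(), "")) for el in root.iter() if el.tag in SEGMENT_TAGS]


def triples(root: ET.Element) -> list[tuple[str, str, str]]:
    """Hypothetical RDF-style triples derived from the linking attributes."""
    out = []
    for el in root.iter():
        if el.tag not in SEGMENT_TAGS:
            continue
        seg_id = el.get(el.tag.lower(), "")
        if "factref" in el.attrib:
            out.append((seg_id, "cites", el.get("factref")))
        for fig in el.iter("Figref"):
            out.append((seg_id, "showsFigure", fig.get("figref")))
    return out


if __name__ == "__main__":
    root = ET.fromstring(FRAGMENT)
    print(segment_sequence(root))
    for triple in triples(root):
        print(triple)
```

The segment sequence (Fact, Problem, Method, Fact, Result) is the raw material for the order statistics later in the talk.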
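  16. Segments vs. Sections (number of segments of each type found in each article section)
     Segment     | Introduction | Method | Results | Discussion | Total
     Fact        |           63 |      0 |     104 |         37 |   204
     Problem     |           20 |      0 |      10 |         15 |    45
     Goal        |            2 |      0 |      72 |          6 |    80
     Method      |            2 |    all |     129 |          6 |   137
     Result      |           10 |      0 |     230 |         44 |   284
     Implication |           14 |      0 |     100 |         36 |   150
     Hypothesis  |           10 |      0 |      33 |         26 |    69
     Total       |          121 |      0 |     678 |        170 |   969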
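The table can also be read as each section's share of the annotated segments. The short calculation below uses only the counts on this slide and leaves out the Method section, which the slide marks "all" instead of giving counts. It shows that about 70% of the 969 segments sit in Results, and that Problem is the only segment type found most often in the Introduction.

```python
# Segment counts per section from the "Segments vs. Sections" table.
# The Method section is omitted because the slide records it as "all".
SECTION_COUNTS = {
    "Fact":        {"Introduction": 63, "Results": 104, "Discussion": 37},
    "Problem":     {"Introduction": 20, "Results": 10,  "Discussion": 15},
    "Goal":        {"Introduction": 2,  "Results": 72,  "Discussion": 6},
    "Method":      {"Introduction": 2,  "Results": 129, "Discussion": 6},
    "Result":      {"Introduction": 10, "Results": 230, "Discussion": 44},
    "Implication": {"Introduction": 14, "Results": 100, "Discussion": 36},
    "Hypothesis":  {"Introduction": 10, "Results": 33,  "Discussion": 26},
}


def section_totals(counts: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum each section's column over all segment types."""
    totals: dict[str, int] = {}
    for row in counts.values():
        for section, n in row.items():
            totals[section] = totals.get(section, 0) + n
    return totals


def dominant_section(counts: dict[str, dict[str, int]]) -> dict[str, str]:
    """Section that holds the most segments of each type."""
    return {seg: max(row, key=row.get) for seg, row in counts.items()}


if __name__ == "__main__":
    totals = section_totals(SECTION_COUNTS)
    grand = sum(totals.values())
    for section, n in totals.items():
        print(f"{section}: {n} of {grand} ({n / grand:.1%})")
    print(dominant_section(SECTION_COUNTS))
```

The recomputed column totals (121, 678, 170) match the Total row, which is a quick sanity check on the reconstructed table.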
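  17. Segment Tense (count and percentage of each segment type, per tense)
     Tense            | Fact       | Problem    | Goal       | Method     | Result     | Implication | Hypothesis
     Present active   | 72 (46%)   | 27 (60%)   | 15 (23%)   | 7 (7%)     | 37 (16%)   | 69 (51%)    | 38 (55%)
     Present passive  | 5 (3%)     | 2 (4%)     | 2 (3%)     | 1 (1%)     | 1 (0%)     | 11 (8%)     | 1 (1%)
     Past active      | 18 (11%)   | 5 (11%)    | 11 (17%)   | 48 (47%)   | 122 (54%)  | 16 (12%)    | 8 (12%)
     Past passive     | 25 (16%)   | 2 (4%)     | 1 (2%)     | 17 (17%)   | 21 (9%)    | 1 (1%)      | 5 (7%)
     Future           | 2 (1%)     | 3 (7%)     | 0 (0%)     | 0 (0%)     | 1 (0%)     | 0 (0%)      | 0 (0%)
     Imperfect ("to") | 13 (8%)    | 2 (4%)     | 32 (50%)   | 2 (2%)     | 20 (9%)    | 14 (10%)    | 7 (10%)
     Gerund ("-ing")  | 22 (14%)   | 4 (9%)     | 3 (5%)     | 28 (27%)   | 23 (10%)   | 24 (18%)    | 10 (14%)
     Total            | 157 (100%) | 45 (100%)  | 64 (100%)  | 103 (100%) | 225 (100%) | 135 (100%)  | 69 (100%)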
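The percentages in this table are column shares: each count divided by its segment type's total. The sketch below recomputes them from the counts and picks the most frequent tense for each segment type, using only the numbers on the slide. The tense labels are copied from the table, including the author's "Imperfect ("to")" category.

```python
TENSES = ["Present active", "Present passive", "Past active", "Past passive",
          "Future", "Imperfect (to)", "Gerund (ing)"]

# Counts per segment type, listed in the TENSES order above.
TENSE_COUNTS = {
    "Fact":        [72, 5, 18, 25, 2, 13, 22],
    "Problem":     [27, 2, 5, 2, 3, 2, 4],
    "Goal":        [15, 2, 11, 1, 0, 32, 3],
    "Method":      [7, 1, 48, 17, 0, 2, 28],
    "Result":      [37, 1, 122, 21, 1, 20, 23],
    "Implication": [69, 11, 16, 1, 0, 14, 24],
    "Hypothesis":  [38, 1, 8, 5, 0, 7, 10],
}


def dominant_tense(counts: dict[str, list[int]]) -> dict[str, tuple[str, int]]:
    """Most frequent tense per segment type, with its rounded percentage."""
    result = {}
    for seg, row in counts.items():
        total = sum(row)
        best = max(range(len(row)), key=row.__getitem__)
        result[seg] = (TENSES[best], round(100 * row[best] / total))
    return result


if __name__ == "__main__":
    for seg, (tense, pct) in dominant_tense(TENSE_COUNTS).items():
        print(f"{seg}: {tense} ({pct}%)")
```

Laid out this way the pattern is easy to see: Fact, Problem, Implication and Hypothesis segments are mostly present active, Method and Result segments mostly past active, and Goal segments mostly "to" constructions (50%).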
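  18. Segment order (each row is a segment; each column is the segment that follows it)
     Segment     | Fact | Hypothesis | Problem | Goal | Method | Result | Implication | End | Total
     Start       |   18 |          3 |       1 |    8 |      2 |      2 |           4 |   0 |    38
     Fact        |   83 |         22 |      13 |   17 |      9 |     31 |          12 |   1 |   188
     Hypothesis  |   20 |          5 |       3 |    7 |      6 |      2 |           6 |   3 |    52
     Problem     |    9 |          7 |       7 |    2 |      3 |      5 |           3 |   3 |    39
     Goal        |    7 |          0 |       2 |    4 |     46 |      6 |           0 |   0 |    65
     Method      |   13 |          2 |       3 |   10 |     25 |     54 |           3 |   0 |   110
     Result      |   23 |          9 |       4 |    6 |     16 |     85 |          78 |   6 |   227
     Implication |   13 |          6 |       4 |   12 |     11 |     30 |          12 |  25 |   113
     Total       |  186 |         54 |      37 |   61 |    118 |    215 |         118 |  38 |   827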
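Because the table has a Start row and an End column, it reads naturally as a transition table. A minimal sketch, assuming we treat the counts as a first-order Markov chain (an interpretation, not something the slide claims), normalizes each row into transition probabilities and reports the most likely successor of each segment type. Only the row counts are used; they add up to the row totals on the slide.

```python
COLUMNS = ["Fact", "Hypothesis", "Problem", "Goal", "Method", "Result", "Implication", "End"]

# Row = current segment (or Start); values = counts of the following segment, in COLUMNS order.
TRANSITIONS = {
    "Start":       [18, 3, 1, 8, 2, 2, 4, 0],
    "Fact":        [83, 22, 13, 17, 9, 31, 12, 1],
    "Hypothesis":  [20, 5, 3, 7, 6, 2, 6, 3],
    "Problem":     [9, 7, 7, 2, 3, 5, 3, 3],
    "Goal":        [7, 0, 2, 4, 46, 6, 0, 0],
    "Method":      [13, 2, 3, 10, 25, 54, 3, 0],
    "Result":      [23, 9, 4, 6, 16, 85, 78, 6],
    "Implication": [13, 6, 4, 12, 11, 30, 12, 25],
}


def transition_probabilities(table: dict[str, list[int]]) -> dict[str, list[float]]:
    """Normalize each row so that its entries sum to 1."""
    return {src: [n / sum(row) for n in row] for src, row in table.items()}


def most_likely_next(table: dict[str, list[int]]) -> dict[str, str]:
    """Most frequent successor of each segment type; ties go to the earlier column."""
    return {src: COLUMNS[max(range(len(row)), key=row.__getitem__)]
            for src, row in table.items()}


if __name__ == "__main__":
    probs = transition_probabilities(TRANSITIONS)
    for src, nxt in most_likely_next(TRANSITIONS).items():
        print(f"{src} -> {nxt} (p = {probs[src][COLUMNS.index(nxt)]:.2f})")
```

For instance, a Goal is followed by a Method in 46 of 65 cases (about 0.71), and a Method by a Result in 54 of 110 cases, which matches the familiar goal, method, result progression of an experiment description.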
  19. Discourse: A Fact(ory)
     [Diagram labels: hypothetical realm (might, would); realm of activity (to test, to see); realm of experience (past); realm of models (present); segments: fact, problem (incongruity/ignorance), hypothesis, goal, method, result, implication; sections: introduction, method, results, discussion; shared view vs. own view.]
  20. (6) Just model the facts with XML + RDF
     - In practice: ScienceDirect does not use our XML... (shhh...)
     - At Elsevier, Project Harpoon: ‘stab’ the document with metadata; asynchronous, linked in (XPath/XQuery), distributed
     - In XML, how do we access a phrase inside an article?
       - Access a PDF by coordinates? Breaks when the format or content changes.
       - Add IDs to every single element? Does that survive format, content and version changes?
     - How do we represent relations, even if we know where they link?
     Known unknown 6: How can we better model discourse elements (and relations)?
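One way to read the ‘stab the document’ idea is standoff annotation: metadata lives outside the article and points into it with a path plus character offsets. The sketch below is not Project Harpoon's implementation. It uses the limited XPath subset supported by Python's `xml.etree.ElementTree`, and the `Anchor` type and its fields are hypothetical. It also reproduces the fragility raised above: once the element's text changes, the stored offsets no longer pick out the same phrase.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

ARTICLE = """<Experiment>
<Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells.</Result>
</Experiment>"""


@dataclass(frozen=True)
class Anchor:
    """Hypothetical standoff pointer: an ElementTree path plus character offsets."""
    path: str
    start: int
    end: int
    expected: str  # the phrase as it read when the anchor was created


def make_anchor(root: ET.Element, path: str, phrase: str) -> Anchor:
    """Locate the phrase inside the element at `path` (ValueError if absent)."""
    text = "".join(root.find(path).itertext())
    start = text.index(phrase)
    return Anchor(path, start, start + len(phrase), phrase)


def resolve(root: ET.Element, anchor: Anchor) -> Optional[str]:
    """Return the anchored phrase, or None if the path or the text no longer match."""
    el = root.find(anchor.path)
    if el is None:
        return None
    phrase = "".join(el.itertext())[anchor.start:anchor.end]
    return phrase if phrase == anchor.expected else None


if __name__ == "__main__":
    root = ET.fromstring(ARTICLE)
    anchor = make_anchor(root, ".//Result[@result='r1']", "p53 stabilization")
    print(resolve(root, anchor))  # p53 stabilization

    # Simulate a later version that rewords the start of the sentence.
    root.find(".//Result[@result='r1']").text = (
        "With E6 present, p53 stabilization in response to IR was almost "
        "completely prevented in MCF-7 cells.")
    print(resolve(root, anchor))  # None: the offsets now point at different text
```

Storing the expected phrase lets the anchor detect that it has drifted, but it cannot repair itself; that still needs stable IDs or a re-matching step.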
  21. (7) And publishers should stop making all those papers.
     - Six uses of a research article:
       - job application
       - report card
       - thesis
       - conference tickets
       - research assessment
       - and yes, by the way, reporting on scientific work
     - Scientists are evaluated largely on their publications: this lets non-specialists evaluate their output.
     - This places undue stress on quantity, conformity (for fear of rejection), and publishing for its own sake.
     Known unknown 7: How can we disentangle communication and evaluation?
  22. Seven ‘Known Unknowns’ in Online Science Publishing
     1. How can we model a user’s interest?
     2. Can we create an ontology of doubt?
     3. How can we represent narrative online?
     4. How do we model cognitive context?
     5. How do we represent non-textual elements?
     6. How can we better model discourse elements and relations?
     7. How can we disentangle communication and evaluation?
  23. http://www.elseviergrandchallenge.com/
     The Elsevier Grand Challenge: Knowledge Enhancement in the Life Sciences is a contest created to improve the way scientific information is communicated and used. The contest invites members of the scientific community to describe and prototype a tool to improve the interpretation and identification of meaning in (online) journals and text databases relating to the life sciences. Specifically, we are looking for new ways to:
     1. improve the process/methods/results of creating, reviewing and editing scientific content;
     2. interpret, visualize or connect the knowledge more effectively; and/or
     3. provide tools/ideas for measuring the impact of these improvements.
     Abstracts are now invited.
     - Submissions will close on July 15th, 2008.
     - Finalists will be invited to present their vision papers at a public symposium, at which the Panel of Judges will announce the winners.
     - The first-place winner will be awarded a cash prize of US$35,000.
     - The second-place winner will be awarded a cash prize of US$15,000.
     - All finalists will receive free trial access to ScienceDirect and Scopus for a year.
  24. Unknown unknowns?
     Would you care to correct/contradict/join me?
     Anita de Waard, http://people.cs.uu.nl/anita, anita@cs.uu.nl