



Published in: Education, Technology


  1. 1. Next Generation Scientific Publishing: Challenges and Directions. European Bioinformatics Institute, 21 June 2013. Tim Clark, Massachusetts General Hospital, MassGeneral Institute of Neurodegenerative Disease, Harvard Medical School. © 2013 Massachusetts General Hospital
  2. 2. Contents • Historical background • What is a scientific article? • Some problems in scientific communication • Next generation scientific publishing (NGSP) • Taking NGSP forward • Conclusion
  3. 3. Historical background
  4. 4. Linear document format: 1665 and 2012
  5. 5. Origins of linear format • Linear format originated pre-1665 with personal correspondence amongst experimentalists & mathematicians. • The 1665 scientific paper format was transported to the Web and to PDFs. • Lives in a complex ecosystem • Incomplete Web exploitation & transition • Tension between linear & object formats
  6. 6. “Invisible Colleges” • circle @ Oxford 1640-59 • circle @ Gresham College, London 1645-60 • Royal Society 1660-present
  7. 7. Scientific journals • Royal Society 1660-present • Académie des Sciences 1666-present • Jan 1665 • Mar 1665
  8. 8. Then and now • Print culture: printing c. 1450, General Post Office 1660, scientific journal 1665 • Web culture: IBM S/360 1964, Internet 1980s, Web 1991
  9. 9. Information technology • the Internet • the Web
  10. 10. Incomplete transition to Web • Scientific article information model is limited, because it is mostly narrative. • Critical information should ideally be computationally extractable and re-mixable. • Yet as humans we require narratives. • We need narratives + computable objects.
  11. 11. What is a scientific article?
  12. 12. Definition: A scientific article is a defeasible argument for assertions, based on a detailed narrative of observations, which are reproducible in principle, supported by exhibited data and supporting methods, and contextualized with other relevant findings in the domain. It exists in a complex ecosystem of technologies, people and activities.
  13. 13. Defeasible argument • May be challenged and proven wrong. • May be “true” today but not tomorrow. • Inference to best explanation (IBE), abductive reasoning (Peirce), etc. • Defeasible reasoning is a big topic in AI.
  14. 14. Exhibited data... (at least, enough to be convincing!) • Philos Trans R Soc Lond 1(4):56 • Brain. 2010 Nov;133(Pt 11):3336-3348.
  15. 15. ...and reproducible methods • Boyle’s air pump, from New Experiments (1660) • Illumina NGS system
  16. 16. Scientific communications ecosystem
  17. 17. Interlocking systems of activity
  18. 18. Some problems in scientific communication c. 2013
  19. 19. Some problems in the ecosystem • Intractable publication volumes [1] • Invalid, distorted and copied citations [3,4,5] • Growing volume of retractions [5,6] • 2/3 of retractions due to misconduct [7] • Research non-reproducibility [8] • Lack of transparency in publication process [9] • Methods non-re-usability [10] • Flawed assessment metrics [11-12]
  20. 20. Non-reproducibility • 11%: share of landmark preclinical cancer studies whose findings could be confirmed • Begley CG and Ellis LM, Nature 2012, 483(7391):531-533
  21. 21. Citation distortion • Adapted from supporting data, Greenberg SA, British Medical Journal 2009, 339:b2680
  22. 22. The copied citation • Citation analysis of one sample of publications (in ethnobotany) found that “the majority of citing texts do not consider the theoretical contributions made by the articles cited”. • I.e., the author of Work A makes a statement, cites Work B, and then copies several references, unread, from Work B as well, assuming they are relevant too. • Ramos et al. Scientometrics 2012, 92(3):711-719
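
The copying pattern described on this slide can be screened for mechanically, at least as a first pass. The following is a minimal sketch, not the method used by Ramos et al.: it assumes a hypothetical mapping from each work to its reference list and flags references that a work shares with something it cites. Shared references are only candidates for inspection, since two works can legitimately cite the same source.

```python
def possibly_copied(reference_lists, work):
    """Flag references of `work` that also appear in a work it cites.

    `reference_lists` is a hypothetical mapping: work id -> list of cited ids.
    Returns {cited_work: [shared references]} for every cited work that
    shares at least one reference with `work`.
    """
    refs = set(reference_lists.get(work, ()))
    flagged = {}
    for cited in sorted(refs):
        # References that both `work` and the cited work list.
        shared = refs & set(reference_lists.get(cited, ()))
        if shared:
            flagged[cited] = sorted(shared)
    return flagged


# Illustrative data only: Work A cites B, X, Y, Z; Work B cites X, Y, W.
example = {"A": ["B", "X", "Y", "Z"], "B": ["X", "Y", "W"]}
candidates = possibly_copied(example, "A")
```

Here `candidates` maps `"B"` to the references A shares with B, which a human reviewer would then check against the text of Work A.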
  23. 23. Not to mention... • Closed access publishing model: walled garden systems, text mining & remixing prohibitions, and insane rising costs imposed on libraries. • Open access publishing model: researcher cost burden unaccounted for by funding agencies.
  24. 24. Some efforts at coping • Mandatory open access (US, UK, Universities) • Data access: archiving and citation, institutional data policies, “data papers”, etc. (various) • Methods: cataloging & annotation (NIF, publishers) • Open annotation (W3C Community) & tools • Velocity: Alzforum, StemBook, Open Wetware, blogs, webinars, Wikipedia coordination, etc. • Velocity: preprint servers (ArXiv, DASH, PMC, etc.) • Advocacy groups: FORCE11, DELSA, DORA, Amsterdam Manifesto, etc.
  25. 25. Next Generation Scientific Publishing
  26. 26. What does NextGen Scientific Publishing look like? • There is transparency of all data & methods. • Big data + small data (the very long tail). • Articles are deconstructable * text-minable * remixable * computable. • Information moves quickly and is verifiable. • Open annotation for narrative + objects. • There are no walled gardens: a service-oriented open-access economy.
  27. 27. Data re-usability • The main reason to exhibit data is not necessarily to reuse it; it is (minimally) to prove that (1) you have it and are willing to show it, and (2) it is reasonable to think that you derived it as you say you did, and you openly share these methods. • Data that is re-usable is special: re-usable data is itself a research method with its own special requirements. • See: Data Papers.
  28. 28. Data papers • Data should be surfaced in a re-usable way. • Incentivize the extra effort required. • Concept being developed by a few publishers with differing implementation ideas. • Questions: what is reusability? at what level?
  29. 29. Our Data Papers requirements • Only inherently reusable data is published as a Data Paper • Normalize identifiers • Reverse normal “ratio” of text:data • Amsterdam data citation principles • All data is searchable w/ or w/o the paper • Global metadata catalog in stable archive
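
As one illustration of the "normalize identifiers" requirement, the sketch below reduces several common spellings of a DOI to a single bare, lowercase form so that records can be matched. The slides do not say which identifier schemes or rules are meant, so the choice of DOIs, the prefix list, and the function name `normalize_doi` are all assumptions made for this example.

```python
import re

# Assumed set of prefixes under which the same DOI is commonly written.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "info:doi/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Reduce common DOI spellings to a bare, lowercase DOI string."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    # DOIs are case-insensitive, so lowercase is used as the canonical form.
    value = value.lower()
    if not re.fullmatch(r"10\.\d{4,9}/\S+", value):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return value
```

For example, `normalize_doi("info:doi/10.1038/nature08221")` and `normalize_doi("https://doi.org/10.1038/NATURE08221")` both yield `10.1038/nature08221`, so the two spellings can be indexed under one key.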
  30. 30. Methods re-usability • Open methods are the basis of science. • “Standing on the shoulders of giants” = reusing maths, software, instruments, reagents, models, protocols, etc. • But method citations can be very obscure; you cannot reuse a secret. • See: alchemy, necromancy, divination.
  31. 31. Computational semantics • Entity-extraction: NIF, Utopia, etc. • Topic-based: Threads • Statement-based: SWAN, nanopublications • Argument-based: micropublications
  32. 32. Open annotation • Open model • Annotate any web document • Transferable, selectively sharable • Highlights, comments, semantics, video • Entities, topics, statements, arguments • W3C Open Annotation Community
  33. 33. Open annotation model
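
The core idea of an open annotation, a body (such as a comment) attached to a target (such as a span of text in a web document), can be sketched as a plain data structure. The key names below are modeled loosely on the Open Annotation vocabulary (annotation, body, target, selector) and should be read as assumptions, not as the exact serialization defined by the W3C community group; the URL is a placeholder.

```python
import json


def make_annotation(target_url: str, exact_text: str, comment: str) -> dict:
    """Build a minimal annotation: a comment (body) on a quoted text span (target)."""
    return {
        "@type": "oa:Annotation",
        "hasBody": {"@type": "cnt:ContentAsText", "chars": comment},
        "hasTarget": {
            "@type": "oa:SpecificResource",
            "hasSource": target_url,
            # The selector identifies the annotated span by its quoted text.
            "hasSelector": {"@type": "oa:TextQuoteSelector", "exact": exact_text},
        },
    }


def serialize(annotation: dict) -> str:
    """Serialize to JSON so the annotation can be stored or shared separately."""
    return json.dumps(annotation, indent=2, sort_keys=True)


# Hypothetical usage: comment on a phrase in a placeholder document.
note = make_annotation(
    "https://example.org/article",
    "an inhibitor of the mTOR pathway",
    "Check the cited source for this statement.",
)
```

Because the annotation lives outside the annotated document and points at it by URL and quoted text, it stays transferable and selectively sharable, as the preceding slide requires.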
  34. 34. Complex annotation
  35. 35. Discussion as annotation
  36. 36. Annotation tools
  37. 37. Creating digital abstracts in Domeo
  38. 38. Digital article summary
  39. 39. Digital article summary
      {
        :MP3 rdf:type mp:Micropublication ;
          mp:name "MP(a3)" ;
          mp:description "Digital summary of Spillman et al. 2010" ;
          pav:authoredBy [ a foaf:Person ; foaf:name "Tim Clark" ] ;
          pav:createdBy [ a foaf:Person ; foaf:name "Tim Clark" ] ;
          pav:createdOn "2013-03-06T09:49:12-05:00"^^xsd:dateTime ;
          mp:argues :C3 ;
          mp:supportedBy <info:doi/10.1371/journal.pone.0009979> .
      } .
      :MP3 = {
        :S1 rdf:type mp:Statement ;
          mp:hasContent "Rapamycin [is] an inhibitor of the mTOR pathway." ;
          mp:supportedBy <info:doi/10.1038/nature08221> .
        :S2 rdf:type mp:Statement ;
          mp:hasContent "PDAPP mice accumulate soluble and deposited Aβ and develop AD-like synaptic deficits as well as cognitive impairment and hippocampal atrophy." ;
          mp:supportedBy <info:doi/10.1073/pnas.96.6.3228> .
        :S3 rdf:type mp:Statement ;
          mp:hasContent "Rapamycin-fed transgenic PDAPP mice showed improved learning (Figure 1a) and memory (Figure 1b). We observed significant deficits in learning and memory in control-fed transgenic PDAPP animals." ;
          mp:supportedBy <> .
        :M1 rdf:type mp:Procedure ;
          mp:hasName "Rapamycin-supplemented mouse diet protocol" ;
          mp:hasContent "We fed a rapamycin-supplemented diet... or control chow to groups of PDAPP mice and littermate non-transgenic controls for 13 weeks. At the end of treatment (7 mo), learning and memory were tested using the Morris water maze." .
        :M2 rdf:type mp:Material ;
          mp:hasName "PDAPP J20" ;
          mp:hasDescription "Lennart Mucke's PDAPP J20 transgenic mice, as obtained from JAX, stock#006293" ;
          mp:describedBy <> .
        :D1 rdf:type mp:Data ;
          pav:retrievedFrom <> ;
          mp:supportedBy :M1, :M2 .
        :C3 rdf:type mp:Claim ;
          mp:hasContent "Inhibition of mTOR by rapamycin can slow or block AD progression in a transgenic mouse model of the disease." ;
          mp:supportedBy :S1, :S2, :S3, :D1 .
      } .
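
Because the summary on this slide records support as explicit links, the evidence behind a claim can be walked by a program. The sketch below copies the support links for claim C3 from the slide into a plain dictionary and collects everything that directly or indirectly supports it. The dictionary encoding and the function name `support_closure` are assumptions for illustration; a real tool would presumably parse the RDF instead. Links whose targets are empty in the transcript are left out.

```python
from collections import deque

# Support links transcribed from the slide (element -> what supports it).
SUPPORTED_BY = {
    "C3": ["S1", "S2", "S3", "D1"],
    "D1": ["M1", "M2"],
    "S1": ["info:doi/10.1038/nature08221"],
    "S2": ["info:doi/10.1073/pnas.96.6.3228"],
}


def support_closure(node, edges):
    """Return every element reachable from `node` via support links, breadth-first."""
    seen = []
    queue = deque(edges.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.append(current)
        queue.extend(edges.get(current, []))
    return seen


evidence_for_claim = support_closure("C3", SUPPORTED_BY)
```

The result lists the statements and data first, then the cited articles and the methods behind the data, which is the kind of navigable argument structure a purely narrative article does not expose.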
  40. 40. Mixing nano, micro, entities, topics
  41. 41. Navigable citation networks • Figure from Greenberg SA, British Medical Journal 2009, 339:b2680
  42. 42. Taking NGSP forward
  43. 43. The Future of Research Communications and eScholarship • Open community of scholars, librarians, archivists, publishers and research funders. • Goal is to facilitate more rapid change & improvement in scholarly communications through effective use of information technologies. • Founded 2011 at a workshop held at Leibniz Zentrum für Informatik, Schloss Dagstuhl, DE. • Check it out & join online at
  44. 44. Summary • Incomplete transition of scientific publishing to the Web • Big problems with the current system • NextGen Scientific Publishing will be: open, transparent, remixable, fast • and we will annotate it on the Web.
  45. 45. Acknowledgements • Lab: Paolo Ciccarese, Stephane Corlosquet, Sudeshna Das, Patti Davis, Emily Merrill, Marco Ocana • Collaborators: Brad Allen, Neil Andrews, Anita Bandrowski, Phil Bourne, Suzanne Brewerton, Monika Byrne, Merce Crosas, Anita De Waard, Lisa Girard, Carole Goble, Tudor Grosza, Paul Groth, Keith Gutfreund, Hamed Hassanzadeh, Ivan Herman, Brad Hyman, Adrian Ivinson, Derek Marren, Maryann Martone, Pat McCaffery, Steve Pettifer, Brock Reeve, Rob Sanderson, Holly Schmidt, Herbert Van de Sompel and Thomas Wilkin; and our colleagues at the Mass. Alzheimer Disease Research Center • Funding: Eli Lilly, Elsevier, Harvard Neuro Discovery Center, Harvard Stem Cell Institute, EMD Serono, NIH (NIA, NIDA), and two anonymous foundations. • Very special thanks to: Carole Goble & Brad Hyman
  46. 46. References
      1. Hunter L, Cohen KB: Biomedical language processing: what's beyond PubMed? Molecular Cell 2006, 21(5):589-594.
      2. Greenberg SA: How citation distortions create unfounded authority: analysis of a citation network. British Medical Journal 2009, 339:b2680.
      3. Greenberg SA: Understanding belief using citation networks. Journal of Evaluation in Clinical Practice 2011, 17(2):389-393.
      4. Ramos M, Melo J, Albuquerque U: Citation behavior in popular scientific papers: what is behind obscure citations? The case of ethnobotany. Scientometrics 2012, 92(3):711-719.
      5. Lawless J: The bad science scandal: how fact-fabrication is damaging UK's global name for research. In: The Independent. 2013.
      6. Van Noorden R: Science publishing: The trouble with retractions. Nature 2011, 478:26-28.
      7. Fang FC, et al: Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences 2012, 109(42):17028-17033.
      8. Begley CG, Ellis LM: Drug development: Raise standards for preclinical cancer research. Nature 2012, 483(7391):531-533.
      9. Marcus A, Oransky I: Bring On the Transparency Index. In: The Scientist. Midland, Ontario, CA: LabX Media Group; 2012.
      10. Bandrowski AE, et al: A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework. Database 2012: bas005.
      11. Schekman R, Patterson M: Reforming research assessment. eLife 2013, 2.
      12. Alberts B: Impact Factor Distortions. Science 2013, 340(6134):787.