
Is a Biological Database Really Different than a Biological Journal?


Presentation on the changing face of scholarly communication and the interplay between data and the knowledge derived from that data.


  1. In the Future Will a Biological Database Really be Different than a Biological Journal? Philip E. Bourne PhD, pbourne@ucsd.edu, iDASH, October 18, 2013
  2. I am speaking to you today as someone who: • Maintains a major biological database, the PDB, used by over 300,000 scientists per month • Is the Founding Editor-in-Chief of PLOS Computational Biology
  3. A Question First Posed in August 2005 (PLOS Comp Biol 2005 1(3): e34)
  4. Here is one reason why the question is important…
  5. The Paper As Experiment (PLoS Comp. Biol. 2005 1(3) e34): 0. The full text of PLoS papers is stored in a database. 1. A link brings up figures from the paper. 2. Clicking the paper figure retrieves data from the PDB, which are then analyzed. 3. A composite view of journal and database content results. 4. The composite view has links to pertinent blocks of literature text and back to the PDB. Within the composite view: 1. The user clicks on a thumbnail. 2. Metadata and a web services call provide a renderable image that can be annotated. 3. Selecting a feature provides a database/literature mashup. 4. That leads to new papers.
  6. The answer 8 years ago, as now, is… In principle there is no difference, but the way in which each is perceived is still very different… Yet progress has been made, and we will focus on what we can do to accelerate change further
  7. Why Bother? Better integration of data and the knowledge derived from it can accelerate discovery and improve the comprehension and dissemination of science
  8. Let's take a step back... What got me thinking this way?
  9. Data Are Becoming More Complex: Witness The World Wide Protein Data Bank (http://www.wwpdb.org) • The single worldwide repository for data on the structure of biological macromolecules • Vital for drug discovery and the life sciences • 43 years old • Free to all
  10. The World Wide Protein Data Bank Places High Value on Data • Papers are not published unless data are deposited, giving strong data-to-literature correspondence • Highly structured data conforming to an extensive ontology • DOIs assigned to every structure (http://www.wwpdb.org)
  11. The PLoS Corpus • Established in 2000 • Recognized for high-quality publications • Currently 8 journals, with healthy growth • Open Access: free to all • PLOS ONE a huge success
  12. Similar Processes Lead to Similar Resources • Journal: Author Submission via the Web; Syntax Checking; Review by Scientists & Editors; Corrections by Author; Publish, Web Accessible • Database: Depositor Submission via the Web; Syntax Checking; Review by Annotators; Corrections by Depositor; Release, Web Accessible
  13. The scientific processes for handling data and publications are not that different, but the end products are perceived very differently
  14. Unfortunately, the Metrics of Success Remain… [Carole Goble]
  15. This makes no sense when you ask yourself the question: What is more valuable, a dataset used and cited by 100 scientists or a paper you wrote that only you cite? Case in point…
  16. What can you do today to change the situation?
  17. Think Globally, Act Locally • Support emergent community commons/portals • Be involved in the support and development of metadata standards • Contribute to workflow development etc. to drive an open research lifecycle • Educate your mentors on the importance of open science and scholarly communication • Write software with an app model in mind
  18. Pressure Your Institutions to Play a Greater Role • We need institutional data/knowledge sharing plans • We need digital universities • We need data/information scientists to be better recognized by institutions: it's not all about papers, and this implies new metrics
  19. Committee on Academic Promotions • What Counts – Money – Grants – Papers – Teaching – Service • What Does Not – Sharing data – Sharing software – Open access – Collaboration – Patents – Startups (Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia, 2011, PLOS Comp Biol 7(1): e1002001)
  20. We Need to Bend the Traditional System: The Wikipedia Experiment, Topic Pages • Identify areas of Wikipedia relating to the journal that are missing or are stubs • Develop a Wikipedia page in the sandbox • Have a Topic Page Editor review the page • Publish the copy of record with associated rewards • Release the living version into Wikipedia
  21. We Need Innovative Contributions to the Research Lifecycle • Research lifecycle: IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION • Authoring Tools, Data Capture, Lab Notebooks, Software Repositories, Analysis Tools, Scholarly Communication, Visualization • Commercial & Public Tools, Discipline-Based Metadata Standards, Community Portals, Git-like Resources by Discipline, Data Journals, New Reward Systems, Training, Institutional Repositories, Commercial Repositories
  23. Example Interoperability: The Database View (www.rcsb.org/pdb/explore/literature.do?structureId=1TIM; BMC Bioinformatics 2010 11:220)
  24. This is asking a lot of us, but our job is being made easier by what is going on around us
  25. Open Access to Data and the Literature is no Longer a Curiosity, but Mainstream
  26. Conservative Bodies Are Recognizing Change • Anyone, anything, anytime • Publication access, data, models, source codes, resources, transparent methods, standards, formats, identifiers, APIs, licenses, education, policies • “accessible, intelligible, assessable, reusable” [Carole Goble] (http://royalsociety.org/policy/projects/science-public-enterprise/report/)
  27. Governments Are Recognizing Change: G8 Open Data Charter (http://opensource.com/government/13/7/open-data-charter-g8)
  28. Funding Agencies are Changing
  29. Publishing is Changing • Today: • Approx. 10,000 publishers • Publishing approx. 25,000 journals • Which publish approx. 1.5 million articles per year (almost 1 million of which appear in PubMed)
  30. Witness the ‘Open Access Mega Journal’: 1. Very, very large – Publishing thousands of articles per year – and benefiting from economies of scale 2. Open Access – Because no one will pay a subscription fee for a journal that large (and growing that fast) – and using an OA business model where each article pays for its own costs 3. (Preferably) without any ‘artificial’ constraints on its ability to grow – For example, a desire to publish only ‘high impact’ papers [Pete Binfield]
  31. [Chart: Publications by PLoS ONE per quarter since launch, Q1 2007 to Q2 2011; y-axis: publications per quarter, 0 to 3500] [Pete Binfield]
  32. “Open Access Mega Journals” – One Name, Two Flavours • ‘Clones’ of PLoS ONE (not selective) – SAGE Open – BMJ Open – Scientific Reports (Nature) – AIP Advances (Am Inst Physics) – G3 (Genetics Soc of America) – Biology Open (Company of Biologists) • ‘Pseudo-Clones’ of PLoS ONE (probably selective) – Physical Review X (Am Physical Society) – Open Biology (Royal Society) – Cell Reports (Elsevier, Cell Press) [Pete Binfield]
  33. Attitudes are Changing • datasets, data collections, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware [Carole Goble] • “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995 • Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078) 159-160 • Ince et al., The case for open computer programs, Nature 482, 2012
  34. Flaws Are Becoming More Obvious • Out of 18 microarray papers, results from 10 could not be reproduced • More retractions: >15X increase in the last decade • At the current rate of increase, by 2045 as many papers will be retracted as published [Carole Goble] 1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
  35. Science is Being Deinstitutionalized (photo: Daniel Hulshizer/Associated Press)
  37. In Summary • Question (2005): In the Future Will a Biological Database Really be Different than a Biological Journal? • Answer: – Less different than they were in 2005 – We still have a long way to go to improve science – Change is accelerating – What one does on a daily basis as a scholar is very different from when I was in graduate school, and it will be very different again
  38. Questions? pbourne@ucsd.edu
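Slide 5 (The Paper As Experiment) describes linking the full text of papers to PDB entries and back. The sketch below shows one way the text side of such a link could work. It assumes plain-text papers and a set of valid IDs. The function name and the `known_ids` argument are hypothetical, and this is not the 2005 system's actual code. The regular expression only finds candidate IDs; checking candidates against known IDs removes false positives such as years.

```python
import re
from collections import defaultdict

# Candidate classic PDB ID: a digit 1-9 followed by three letters or digits (e.g. 1TIM).
# On its own this also matches years ("2005") and units ("4kDa").
PDB_ID_CANDIDATE = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def link_text_to_structures(full_text, known_ids):
    """Map each known PDB ID mentioned in full_text to the sentences that mention it.

    known_ids (hypothetical) stands in for the list of released entries; checking
    candidates against it filters out false positives such as years.
    """
    known = {pdb_id.upper() for pdb_id in known_ids}
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", full_text)
    links = defaultdict(list)
    for sentence in sentences:
        for candidate in PDB_ID_CANDIDATE.findall(sentence):
            pdb_id = candidate.upper()
            if pdb_id in known and sentence not in links[pdb_id]:
                links[pdb_id].append(sentence)
    return dict(links)


text = ("Triosephosphate isomerase (PDB 1TIM) is a TIM barrel. "
        "In 2005 we compared 1tim with 4HHB.")
for pdb_id, passages in link_text_to_structures(text, {"1TIM", "4HHB"}).items():
    print(pdb_id, passages)
```

Each ID maps to the "pertinent blocks of literature text" that mention it. A real system would also need figure-level links and a much more robust sentence splitter.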
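Slide 10 notes that a DOI is assigned to every PDB structure. The helper below assumes that wwPDB DOIs for classic four-character IDs follow the pattern `10.2210/pdb<id>/pdb` with a lowercase ID; check this against wwPDB documentation before relying on it. DOIs are resolved through the https://doi.org/ proxy. The function names are illustrative.

```python
import re

# Classic four-character PDB ID: a digit 1-9 followed by three letters or digits.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Return the DOI for a PDB entry, assuming the 10.2210/pdb<id>/pdb pattern."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def doi_url(doi: str) -> str:
    """Turn a DOI into a resolvable link via the doi.org proxy."""
    return f"https://doi.org/{doi}"


print(doi_url(pdb_doi("1TIM")))  # https://doi.org/10.2210/pdb1tim/pdb
```

Because each entry has a DOI, a structure can be cited in a reference list exactly as a paper is. This is part of what narrows the gap between a database and a journal.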
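The question on slide 15, together with the call for new metrics on slide 18, suggests a metric that credits any research object, not only papers, by the citations it receives from others. The toy version below assumes a hypothetical model in which each citing work is represented only by its set of author names; real citation data are far messier. It counts citations that share no author with the cited object.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResearchObject:
    """A citable output such as a paper, dataset, or piece of software (toy model)."""
    title: str
    kind: str
    authors: set[str]
    # One entry per citing work: the set of that work's authors.
    citations: list[set[str]] = field(default_factory=list)


def independent_citations(obj: ResearchObject) -> int:
    """Count citing works that share no author with obj, i.e. exclude self-citations."""
    return sum(1 for citing_authors in obj.citations if not citing_authors & obj.authors)


# The comparison posed on the slide: a dataset cited by 100 other scientists
# versus a paper cited only by its own author.
dataset = ResearchObject("Shared dataset", "dataset", {"me"},
                         [{f"scientist{i}"} for i in range(100)])
paper = ResearchObject("My paper", "paper", {"me"}, [{"me"}, {"me", "coauthor"}])

print(dataset.kind, independent_citations(dataset))  # dataset 100
print(paper.kind, independent_citations(paper))      # paper 0
```

The metric treats datasets and papers identically, which is the point of the slide. Only the `kind` label distinguishes them.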
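The averages behind the figures on slide 29 are easy to work out. This is back-of-the-envelope arithmetic on the slide's approximate numbers only. The mega journals on slide 30, which publish thousands of articles per year, show how far the real distribution departs from the average.

```python
# Approximate 2013 figures quoted on slide 29.
publishers = 10_000
journals = 25_000
articles_per_year = 1_500_000
in_pubmed = 1_000_000  # "almost 1 million", so treat the share below as an upper bound

journals_per_publisher = journals / publishers
articles_per_journal = articles_per_year / journals
pubmed_share = in_pubmed / articles_per_year

print(f"{journals_per_publisher:.1f} journals per publisher")       # 2.5 journals per publisher
print(f"{articles_per_journal:.0f} articles per journal per year")  # 60 articles per journal per year
print(f"at most {pubmed_share:.0%} of articles appear in PubMed")   # at most 67% of articles appear in PubMed
```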
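The figures on slide 34 can be restated as rates. The sketch assumes that ">15X increase in last decade" means a 15-fold rise over exactly ten years, and computes the compound annual growth rate this implies. Because the slide says "more than" 15X, the result is a lower bound. The sketch does not reproduce the 2045 projection, which depends on baseline numbers not given on the slide.

```python
# Figures quoted on slide 34.
not_reproduced, papers_checked = 10, 18
retraction_growth, years = 15, 10  # ">15X increase in last decade", so a lower bound

irreproducible_share = not_reproduced / papers_checked
# Compound annual growth rate r such that (1 + r) ** years == retraction_growth.
annual_growth = retraction_growth ** (1 / years) - 1

print(f"{irreproducible_share:.0%} of the microarray analyses could not be reproduced")  # 56%
print(f"retractions grew by at least {annual_growth:.0%} per year")                      # 31%
```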
