A Few Overarching Thoughts on
Digital Publishing and How You
Can Participate
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
Opinionated Moi?
• The Internet demanded new business
models to support scholarly
communication
• Open access was one such sustainable
model:
– Began with the community
– Was driven by new organizations (PLOS,
BMC, F1000, eLife, Dryad, Mendeley etc.)
– Was NOT driven by academic institutions
– Was driven by policies and funders
Got Us Thinking About…
• A paper as only one form of knowledge
discovery
• The use of interaction and rich media from
which to learn and actually do science
• Reproducibility
• Reward structures
• Better management of the research lifecycle
P.E. Bourne 2005 In the Future will a Biological Database Really be Different
from a Biological Journal? PLOS Comp. Biol. 1(3) e34
The Research Lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
• Authoring Tools
• Lab Notebooks
• Data Capture
• Software Repositories
• Analysis Tools
• Visualization
• Scholarly Communication
• Commercial & Public Tools
• Git-like Resources By Discipline
• Data Journals
• Discipline-Based Metadata Standards
• Community Portals
• Institutional Repositories
• New Reward Systems
• Commercial Repositories
• Training
Most Laboratories
• We are the long tail
• Goodbye to the student is
goodbye to the data
• Very few of us have
complied (or will comply)
with the data
management plans we
write into grants
• Too much software is
unusable
S. Veretnik, J.L. Fink, and P.E. Bourne 2008 Computational Biology Resources Lack
Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136
Today’s Research Lifecycle is
Digitally Fragmented at Best
• Proof:
– I can't immediately reproduce the research in
my own laboratory
• It took an estimated 280 hours for an average user
to approximately reproduce the paper
– Workflows are maturing and becoming helpful
– Data and software versions and accessibility
prevent exact reproducibility
Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology:
The Case of the Tuberculosis Drugome PLOS ONE under review.
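The versioning problem named on this slide can be made concrete by recording, next to each result, exactly which data and software produced it. The sketch below is only an illustration and is not the method used in the cited study; the function names `file_checksum` and `build_manifest` and the manifest layout are hypothetical.

```python
import hashlib
import platform
import sys
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files):
    """Describe the interpreter, platform, and input data behind a result.

    The dictionary layout is an assumption made for this sketch.
    """
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "data": {Path(p).name: file_checksum(p) for p in data_files},
    }
```

Storing such a manifest with every figure or table would let a later reader see at a glance whether the inputs or the interpreter have changed since the result was produced.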
And the Disruption Continues
• In the US alone…
– March 2012 OSTP
commits $200M to Big
Data
– OSTP demands
sharing plans by
August 2013
– GBMF/Sloan provide
institutional awards for
data science
– NCBI considers data
catalog and
MyBibliography
Where Will It End?
First We Should Ask What It Is
We Wish to Accomplish
Here is What I Want – The Paper
As Experiment
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
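The steps on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system described in the cited paper: the `Figure` record, the `composite_view` function, the placeholder DOI, and the entry-page URL pattern are all introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    """A paper figure annotated with the database entry it depicts."""
    paper_doi: str  # placeholder value in the usage below, not a real article
    label: str
    pdb_id: str     # four-character PDB identifier


def pdb_entry_url(pdb_id):
    """Build a link to a PDB entry page (the URL pattern is an assumption)."""
    return f"https://www.rcsb.org/structure/{pdb_id.upper()}"


def composite_view(figure, passages):
    """Combine a figure, its PDB entry, and the passages that mention the entry."""
    needle = figure.pdb_id.lower()
    related = [p for p in passages if needle in p.lower()]
    return {
        "figure": figure.label,
        "paper": figure.paper_doi,
        "structure": pdb_entry_url(figure.pdb_id),
        "passages": related,
    }


fig = Figure(paper_doi="10.0000/example", label="Figure 1", pdb_id="4hhb")
view = composite_view(fig, ["The 4HHB structure is shown.", "Unrelated text."])
```

The point of the sketch is that the link between journal and database content lives in the figure's metadata; once the identifier is recorded there, the composite view is a simple lookup.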
Here is What I Want –
Knowledge Push
• Each evening the lab's “Evernote”
notebooks are scanned for commonalities
from the day's activities. These are seeds
in a deep search of the web's research
lifecycles that have become available since
the last search. Results are ranked and
presented for consideration over coffee
the next morning
http://www.discoveryinformaticsinitiative.org/diw2012
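A toy version of this nightly scan might look like the following. It is a sketch under simplifying assumptions: notebook entries and newly available items are plain strings, "commonalities" are just terms shared by at least two entries, and ranking is by number of shared terms. The names `common_terms` and `rank_items` are hypothetical, and no real notebook service is called.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "with", "on", "is", "was"}


def tokens(text):
    """Lowercase alphanumeric terms with a few common stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def common_terms(notes, min_notes=2):
    """Terms that appear in at least min_notes separate notebook entries."""
    counts = Counter()
    for note in notes:
        counts.update(set(tokens(note)))
    return {term for term, n in counts.items() if n >= min_notes}


def rank_items(seeds, items):
    """Order new items by shared seed terms, dropping items that share none."""
    scored = [(len(seeds & set(tokens(item))), item) for item in items]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [item for score, item in ranked if score > 0]


notes = [
    "Docked ligand against kinase target",
    "Kinase assay for ligand binding failed",
    "Ordered pipette tips",
]
new_items = [
    "New kinase inhibitor dataset",
    "Ligand binding to a kinase: workflow",
    "Conference travel policy",
]
morning_list = rank_items(common_terms(notes), new_items)
```

A real system would need far better text analysis than term overlap, but the shape is the same: seeds from the day's notes, a search over newly available material, and a ranked list by morning.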
Will End With …
• Infrastructure:
– Science, Nature, Cell and megajournals all
“open access”
– An array of coupled institutional repositories
– A central repository – PubMed Central
– Open software in full support of the research
lifecycle
– The research lifecycle in the cloud
Will End With …
• Sociologically:
– An end to “build it and they will come”
– Alternative metrics accepted by the
community
– Alternative reward systems that recognize the
realities of today’s scholarship, namely:
• Open data availability
• Software availability
• Collaborative research
We Have a Way to Go
• Good News
– We have NCBI/EBI
– Publishers are starting
to embrace data
– Workflows in support
of the research
lifecycle are catching
on
• Bad News
– Data are organized by
type not by questions
asked (silos)
– Tenure committees
are still in the dark
ages
What Can You Do?
Think Globally, Act Locally
• Support emergent community portals
• Be involved in the support and
development of metadata standards
• Contribute to workflow development etc. to
drive an open research lifecycle
• Educate your mentors on the importance
of open science and scholarly
communication
• Write software thinking of an App model
What Do We Need to Do to
Get There? An App+ Store?
• The App model
– Think of it operating on a content base
rather than a mobile device
– Simple and consistent user interface
– Needs to pass some quality control
– Has a reward
• The App+ Model
– Apps interoperate through a generic
workflow interface
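One reading of the App+ idea is that every app accepts and returns the same kind of content record, so any app can be chained after any other. The sketch below assumes exactly that; the `Content` and `App` types, the two example apps, and `run_workflow` are hypothetical names used for illustration only.

```python
from typing import Any, Callable, Dict, List

# Assumption for this sketch: content is a plain dictionary, and an app is
# any function that takes one content record and returns an enriched one.
Content = Dict[str, Any]
App = Callable[[Content], Content]


def word_count(content: Content) -> Content:
    """Example app: add the number of words in the text."""
    return {**content, "words": len(content["text"].split())}


def flag_long(content: Content) -> Content:
    """Example app: mark the text as long, using the count from word_count."""
    return {**content, "long": content["words"] > 5}


def run_workflow(apps: List[App], content: Content) -> Content:
    """Generic workflow interface: pass the content through each app in order."""
    for app in apps:
        content = app(content)
    return content


result = run_workflow([word_count, flag_long], {"text": "open data and open software"})
```

Because each app only depends on the shared record, quality control and reward can attach to individual apps while the workflow layer stays generic.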
In Summary
• We have at hand the means to accelerate
the rate of discovery
• To do so we need to place more value on
the data, the individuals that produce it
and the institutions that maintain it
• We are all stakeholders in this endeavor
• Here is one way to get involved…
Get Involved: FORCE11
• Tools and resources
catalog
• Article database in
Mendeley
• Discussion Forum via
Google
• Blogs courtesy of blog
sites and RSS feeds
• Web site via Drupal
• Announcements via
Twitter
http://force11.org
pbourne@ucsd.edu
• Force11 Manifesto
• Fourth Paradigm: Data Intensive Scientific
Discovery
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
