Presented at the Feb 2015, NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
This is a presentation I gave at the Library of Congress as part of a NFAIS/FLICC/CENDI meeting as outlined here: http://www.chemspider.com/blog/making-the-web-work-for-science-presentation-at-the-library-of-congress.html
The presentation provides an overview of some of the challenges publishers face moving forward, how they are responding to them, how InChI is an enabling technology, and how quality is important.
The Path to Open Science with Illustrations from Computational Biology - A presentation made at the Microsoft 2011 Latin America Faculty Summit, Cartagena, Colombia, May 18, 2011.
Altmetrics attempts to provide timely measures of impact through the use of metrics from HTML views and downloads of scholarly articles, blog posts, tweets, bookmarks, etc. Publishers of scientific research have enabled altmetrics on their articles, open source applications are available for platforms to display altmetrics on scientific research, and subscription models have been created to measure the use that research articles receive online. This presentation reviews some of the current models for providing altmetrics, along with information on a selection of the providers that have made altmetrics available for general use.
Recommendations for infrastructure and incentives for open science, presented to the Research Data Alliance 6th Plenary. Presenter: William Gunn, Director of Scholarly Communications for Mendeley.
Link resolver failures, erroneous URLs, EZproxy
configuration errors and inaccurate metadata in e-resource
records are commonplace problems reported by users in
pursuit of e-resource access. This presentation describes
the categorisation and analysis of data generated from the
troubleshooting process over the period of an academic
year. The process is designed to be pre-emptive, seeking to
anticipate e-resource problems that users may encounter,
and productive, providing insight to inform user instruction
and trigger mechanisms to create enhanced electronic
access for users.
Geraldine O Beirn, Queen’s University Belfast
Brace for Impact: New Means for Measuring Research Metrics (Mary Ellen Sloane)
As open access journals and repositories gain a foothold in scholarly communication, researchers are finding that the traditional impact factor and citation count metrics only reflect a portion of the dissemination of scholarly works.
New technology, research, and citation tools aid our ability to measure the influence of research. A matrix of tools and initiatives, like PLoS Article-Level Metrics, BePress' Author Dashboard, Mendeley, Altmetrics, and ImpactStory, is providing a more robust picture of scholarly communication today.
This presentation provides an overview of the impact factor system and new tools for gathering metrics and their relevance for librarians and researchers.
Presentation given at the Library Information Technology Association (LITA) Forum in Louisville, KY, in November 2013.
Presentation about OHSL's new initiative, Mycroft Cognitive Assistant®, which is intended to streamline the operational aspects of research using IBM Watson cognitive computing capabilities.
This concept can be applied to the wisdom of clinicians inside healthcare institutions. By gathering and sharing course content and tools between care facilities, hospitals can be connected to more than just the technical cloud. They can be connected to the wisdom of the cloud.
Linked Data: Opportunities for Entrepreneurs (3 Round Stones)
Multidisciplinary engineer and entrepreneur David Wood discusses the reasons, approaches and success stories for structured data on the World Wide Web. Linked Data is placed in context with the rest of the Web and that context is used to suggest some areas ripe for entrepreneurial innovation.
Global Perspective on Open Research: A Bird's Eye View (Leslie Chan)
Presentation at the University of Cape Town, Aug. 5, 2011. This talk was part of the OpenUCT initiative and the Scholarly Communication in Africa Programme. It provides an overview of the changing research landscape and the particular importance of open access and other forms of open collaboration for solving some of the pressing problems of development research. The presentation argues for the importance of policy development in support of research collaboration and the development of enriched metrics for evaluating the development impact of research.
Altmetrics Day Workshop - Internet Librarian International 2014 (Andy Tattersall)
Altmetrics in the Academy - Implementing strategies in the library for better academic engagement, dissemination and measurement
Workshop abstract:
Altmetrics are increasingly gaining support and interest as an alternative way of disseminating and measuring scholarly output. Championed by early career researchers, librarians and information professionals, Altmetrics are to research as MOOCs are to learning: like MOOCs, most still do not understand their potential or how they could fit with or replace existing modes of delivery and assessment.
The first half of the workshop will help delegates gain an understanding of what Altmetrics are and how they can fit within academic library services. The second half of the session will deliver case studies, tools and techniques to help LIS professionals encourage better usage of Altmetrics.
10:00: What do you want from the day? What are your experiences of Altmetrics?
10:40: Altmetrics: an overview, or where are we now?
A history, a roadmap, and how it all fits in
11:00: Altmetrics within institutions: data, IR integration, other tools, library catalogue integration
What data is there? Coverage of articles, datasets, and other research outputs; Mendeley demographic data
Case studies of uses
Examples of IR integration and motivations
Primo, Summon, and others
Altmetric for Institutions: integration with existing platforms
Free explorer (we'll explore the data using this later)
11:30: Break
12:00: Altmetrics in the Academy: getting academics and librarians on board
12:40: Brainstorming session on the value of Altmetrics: what questions do people have around this, and what are their biggest concerns?
13:00: Lunch
14:00: Getting familiar with the tools: a half-hour practical session experimenting with the Altmetric Explorer (set tasks, e.g. create a list, pull out the most interesting mentions)
Good practice, guidelines, and tips
14:45: At the coal face: experiences of a researcher using Altmetrics in practice
15:30: Break
15:45: Getting mobile: how using mobile apps can help you engage more with Altmetrics
16:05: What's on the horizon? What does the future hold for scholarly dissemination and impact?
16:40: Wrap-up and questions
Altmetrics: the movement, the tools, and the implications (KR_Barker)
The October 2015 iteration of the class created and taught by Andrea Denton and Kimberley R. Barker, both of the UVA Claude Moore Health Sciences Library.
The Right Metrics for Generation Open [Open Access Week 2014] (Impactstory Team)
The traditional way to understand and demonstrate your impact–through citation counts–doesn’t meet the needs of today’s researchers. What Generation Open needs is altmetrics.
In this presentation, we cover:
- what altmetrics are and the types of altmetrics today’s researchers can expect to receive,
- how you can track and share those metrics to get all the credit you deserve, and
- real life examples of scientists who used altmetrics to get grants and tenure
Big Data in Biomedicine – An NIH Perspective (Philip Bourne)
Keynote at the IEEE International Conference on Bioinformatics and Biomedicine, Washington DC, November 10, 2015.
https://cci.drexel.edu/ieeebibm/bibm2015/
Open Access and Research Communication: The Perspective of Force11 (Maryann Martone)
Presentation at the National Federation of Advanced Information Services Workshop: Open Access to Published Research: Current Status and Future Directions, Philadelphia, PA USA November 22, 2013
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO... (Robert H. McDonald)
This is the slidedeck for my ACRL 2015 TechConnect Presentation with Nicole Vasilevsky (OHSU). For more on the program see http://bit.ly/1xcQbCr.
A discussion of the role of taxonomies in developing tools to organize and discover information about people. Presented by Bert Carelli as part of the Special Libraries Association’s “Leveraging Your Taxonomy” series.
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
Presentation to the J. Craig Venter Institute, Dec. 2014 (Mark Wilkinson)
This is largely a compilation of various other talks that I have posted here - a summary of the past 3+ years of work on SADI/SHARE. It includes the (now well-worn!!) slides about SHARE, as well as some of the more contemporary stuff about how we extended GALEN clinical classes with richer semantic descriptions, and then used them to do automated clinical phenotype analysis. Also includes the slide-deck related to automated Measurement Unit conversion (related to our work on semantically representing Framingham clinical risk assessment rules)
So... for anyone who regularly follows my uploads, there isn't much "new" in here, but at least it's all in one place now! :-)
"Linked Data, an opportunity to mitigate complexity in pharmaceutical research and development" - A poster accepted for the First International Workshop on Linked Web Data Management in Uppsala, 25 March 2011.
Open science curriculum for students, June 2019 (Dag Endresen)
Living Norway seminar on Open Science in Trondheim 12th June 2019.
https://livingnorway.no/2019/04/26/living-norway-seminar-2019/
https://www.gbif.no/events/2019/living-norway-seminar.html
Research process and research data management. Many universities are looking at how they can better serve the needs of researchers. Ken Chad Consulting worked with the University of Westminster to look at the needs and attitudes of researchers and admin staff in terms of research data management (RDM). The result led the University to look first at the whole lifecycle and workflows of research administration. This in turn led to the innovative, rapid development of a system to support researchers and admin staff. Presented by Suzanne Enright (University of Westminster) and Ken Chad at the annual UKSG conference in April 2014.
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot... (mhaendel)
Presented at AMIA TBI CRI 2018.
Rare disease patients are experts in their own medical history; not only are they some of the most engaged patients, they can themselves provision data for use in clinical evaluation. We therefore created a lay-person version of our clinical deep phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO, and debut a new software tool for patient-led deep phenotyping.
The Software and Data Licensing Solution: Not Your Dad's UBMTA (mhaendel)
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it's quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We'll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we'll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals recommend? We'll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholder (mhaendel)
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string-matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, prior rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight the discrepancies in conceptual definitions for manual review. This is especially useful for data sources subject to concept drift and definitional differences, such as diseases. The syntactic issue is that there are so many variations of the same identifier, making data joins difficult. We present a framework to reconcile and provide authoritative, integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes, and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for the infrastructure to handle identifiers in a consistent fashion. Finally, this architecture also allows resources to be self-describing "beacons" with respect to their identifiers.
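The prefix-map idea behind CURIE reconciliation can be sketched in a few lines. This is a minimal illustration, loosely in the spirit of a JSON-LD @context; the prefix-to-IRI bindings below are illustrative assumptions, not an authoritative registry.

```python
# Hypothetical prefix map binding CURIE prefixes to base IRIs.
PREFIX_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "doi": "https://doi.org/",
}

def expand(curie: str) -> str:
    """Expand a prefixed identifier (CURIE) into a full IRI."""
    prefix, _, local_id = curie.partition(":")
    if prefix not in PREFIX_MAP:
        raise KeyError(f"Unknown prefix: {prefix}")
    return PREFIX_MAP[prefix] + local_id

def contract(iri: str) -> str:
    """Contract a full IRI back into CURIE form, if a known prefix matches."""
    for prefix, base in PREFIX_MAP.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    raise ValueError(f"No registered prefix for: {iri}")

print(expand("HP:0000118"))  # http://purl.obolibrary.org/obo/HP_0000118
print(contract("https://doi.org/10.1000/xyz"))  # doi:10.1000/xyz
```

With a shared, versioned prefix registry, two sources that emit "HP:0000118" and the full OBO PURL can be joined on the same canonical form.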
Building (and traveling) the data-brick road: A report from the front lines ... (mhaendel)
The NIH Data Commons must treat the data it will contain not unlike the mortar and stones of a road. To help our fellow scientists travel the road, we must engineer for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and manipulated into common data models. This requires versioned access to the data, equivalency determination of identifiers within the data or minting of new ones, and manipulation of the data according to common data models (e.g. a genotype-to-phenotype association in one source may relate a variant to a disease, where in another it may be a set of alleles associated with a set of phenotypes; each source models the data differently). Inclusion of the data in the Commons must meet all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to perform the Extract-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality assurance over time. For records that disappear, there needs to be a way to keep an archive of them. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data requires the data to be self-reporting: loading the ontologies used in the data for indexing and providing faceted query over these and other attributes, sophisticated text mining tools, relevance ranking, and equivalency and similarity determination among different providers. Once found, users need vehicles to drive upon the road. These are their workspaces, the place where they design and implement the operations they need in order to get where they want to go.
Unimaginable scientific emeralds are to be found at the end of the road: the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
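The common-data-model step above can be sketched as two small adapters that normalize differently shaped sources into one association record. The field names and example records here are hypothetical, not the Commons' actual schema.

```python
# Hypothetical ETL adapters: two sources model genotype-to-phenotype data
# differently; both are flattened into a common association record.
def from_variant_disease(rec):
    # Source A links a single variant to a single disease.
    return [{"subject": rec["variant"], "predicate": "associated_with",
             "object": rec["disease"], "source": "A"}]

def from_alleles_phenotypes(rec):
    # Source B links a set of alleles to a set of phenotypes (cross product).
    return [{"subject": a, "predicate": "associated_with",
             "object": p, "source": "B"}
            for a in rec["alleles"] for p in rec["phenotypes"]]

associations = (
    from_variant_disease({"variant": "rs123", "disease": "MONDO:0005148"})
    + from_alleles_phenotypes({"alleles": ["a1", "a2"],
                               "phenotypes": ["HP:0001250"]})
)
print(len(associations))  # 3
```

Once both sources share one record shape, indexing, faceted query, and similarity computation can be written once rather than per source.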
Reusable data for biomedicine: A data licensing odyssey (mhaendel)
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions hinder most publicly-funded, seemingly "open" biomedical data from being put to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those that aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, and this is complicated by the fact that on each side of the agreement there may be multiple legal entities involved, and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, where we have created a five-part rubric to evaluate biomedical data sources and their licensing information to determine the degree to which unnegotiated and unrestricted reuse and redistribution are provided. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, getting 2.5 stars or less. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project aims to promote standardization and ease of reuse licensing practices by data providers.
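A five-part, zero-to-five-star rubric like the one described above can be modeled as per-criterion scores summed into stars. The criterion names below paraphrase the licensing concerns in the abstract; the exact names and half-star weights are illustrative assumptions, not the project's published rubric.

```python
# Hypothetical five-criterion licensing rubric; each criterion scores
# 0.0 (fails), 0.5 (partial), or 1.0 (passes), summing to a 0-5 star rating.
CRITERIA = [
    "findable_license",          # is a license stated at all?
    "standard_license",          # is it a standard, recognized license?
    "permits_reuse",             # can the data be reused without negotiation?
    "permits_redistribution",    # can the data be redistributed?
    "no_restrictive_provisions", # free of field-of-use or similar restrictions?
]

def star_score(evaluation: dict) -> float:
    """Sum per-criterion scores into an overall star rating."""
    return sum(evaluation.get(c, 0.0) for c in CRITERIA)

resource = {"findable_license": 1.0, "standard_license": 1.0,
            "permits_reuse": 0.5, "permits_redistribution": 0.0,
            "no_restrictive_provisions": 0.0}
print(star_score(resource))  # 2.5
```

A resource like this one, failing on redistribution and restrictive provisions, lands in the "2.5 stars or less" bracket the abstract reports for roughly half of the sources tested.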
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery (mhaendel)
Architecture of language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
How open is open? An evaluation rubric for public knowledgebases (mhaendel)
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute, thereby maximizing impact and minimizing waste. The ever-increasing number of databases makes us necessarily question what our priorities are with respect to maintaining them, developing new ones, or senescing/subsuming ones that have completed their mission. Therefore, open biomedical data repositories should be carefully evaluated according to quality, accessibility, and value of the database resources over time and across the translational divide.
Traditional citation count and publication impact factors as a measure of success or value are known to be inadequate to assess the usefulness of a resource. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mention of the name and access date sufficient?).
We have developed a draft evaluation rubric for evaluating open science databases according to the commonly cited FAIR principles -- Findable, Accessible, Interoperable, and Reusable, but with three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet are critical to reuse of the knowledge contained within any given database. It is worth noting that FAIR principles apply not only to the resource as a whole, but also to their key components; this “fractal FAIRness” means that even the license, identifiers, vocabularies, APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input for how to further advance this rubric as a Biocuration community resource.
Deep phenotyping to aid identification of coding & non-coding rare disease v... (mhaendel)
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidates remain after removing common or non-pathogenic variants. We present Genomiser to prioritize coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis... (mhaendel)
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
Credit where credit is due: acknowledging all types of contributions (mhaendel)
This is an update for COASP (http://oaspa.org/conference/) on the representation of attribution beyond authorship of a publication. Publications are proxies for the projects and people that are actually engaged in the work, and represent the dissemination aspect. How can we better understand the individual contributions and their impact? The openRIF, openVIVO and FORCE11 Attribution WG efforts aim to represent scholarship in a computationally tractable manner so as to enable credit and evaluation of all types of scholarly contributions.
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities, i.e. "deep phenotyping", whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been utilized to great success for assisting computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added non-expert (i.e., layperson) synonyms. Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients, to determine the improvement gained from patient-driven phenotyping as well as how much the patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one's hard-to-diagnose phenotypic profile on the Web.
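The profile-comparison step above can be illustrated with the simplest possible similarity measure: set overlap (Jaccard) between two collections of HPO terms. Real phenotype matching uses ontology-aware semantic similarity (shared ancestor terms, information content), so plain set overlap is a deliberate simplification, and the HPO IDs below are arbitrary placeholders.

```python
# Toy comparison of a patient-recorded phenotypic profile against a
# clinician-recorded one, using Jaccard similarity over HPO term sets.
def jaccard(profile_a: set, profile_b: set) -> float:
    """Ratio of shared terms to all terms across both profiles."""
    if not profile_a and not profile_b:
        return 0.0
    return len(profile_a & profile_b) / len(profile_a | profile_b)

patient_profile = {"HP:0001250", "HP:0004322", "HP:0001263"}
clinical_profile = {"HP:0001250", "HP:0001263"}
print(round(jaccard(patient_profile, clinical_profile), 2))  # 0.67
```

In practice, an ontology-aware measure would also reward near-misses, e.g. when the patient records a more general term than the clinician's specific one.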
Why the world needs phenopacketeers, and how to be one (mhaendel)
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
On the frontier of genotype-2-phenotype data integration
Presented at the AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
A brief introduction to the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
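The interplay of cytoplasmic segregation and heteroplasmy can be illustrated with a small simulation: organelles are randomly partitioned at each division, so the fraction of mutant organellar genomes drifts between daughter-cell lineages. This is a deliberately simplified sketch (fixed copy number, one lineage followed), not a biological model.

```python
# Illustrative sketch: random segregation of a heteroplasmic organelle
# population across cell divisions. Simplified: copy number is held
# constant and only one daughter lineage is followed.
import random

def segregate(mutant: int, wildtype: int, rng: random.Random) -> tuple[int, int]:
    """Randomly give half the organelles to one daughter cell, then
    double its contents back to the original copy number."""
    pool = [1] * mutant + [0] * wildtype
    rng.shuffle(pool)
    half = pool[: len(pool) // 2]      # one daughter receives half the organelles
    m = sum(half) * 2                  # replication restores the copy number
    return m, len(pool) - m

rng = random.Random(42)
mutant, wildtype = 50, 50              # start at 50% heteroplasmy
for generation in range(20):
    mutant, wildtype = segregate(mutant, wildtype, rng)

print(f"after 20 divisions: {mutant} mutant / {wildtype} wild-type copies")
```

Running many such lineages shows the characteristic outcome: heteroplasmy levels spread out over generations, and some lineages drift toward pure mutant or pure wild-type (homoplasmy), which underlies variable expression of mitochondrial disease.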
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Nutraceutical market, scope and growth: Herbal drug technology
As consumer awareness of health and wellness rises, the nutraceutical market, which includes goods such as functional foods, beverages, and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly. Rising healthcare costs, an aging population, and increasing demand for natural and preventative health solutions are driving this rapid expansion. Innovations in product formulation and the use of advanced technology for personalized nutrition further drive market growth. With its worldwide reach, the nutraceutical industry is expected to keep growing and to offer significant opportunities for research and investment across categories including vitamins, minerals, probiotics, and herbal supplements.
Seminar on U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Force11: Enabling transparency and efficiency in the research landscape
1. Melissa Haendel, PhD
Oregon Health & Science University
Future of Research Communications and E-Scholarship
Enabling transparency and efficiency
in the research landscape
@force11rescomm @ontowonka
3. The Research Life Cycle
TECHNIQUE
COLLABORATION
PUBLICATION
DATASET
GRANT
4. Impetus for change: Is our
current method serving science?
47/50 major preclinical
published cancer studies
could not be replicated
“The scientific community assumes
that the claims in a preclinical
study can be taken at face value-
that although there might be some
errors in detail, the main message
of the paper can be relied on and
the data will, for the most part,
stand the test of time.
Unfortunately, this is not always
the case.”
Begley and Ellis, 29 MARCH 2012 | VOL 483 |
NATURE | 531
5. Not all content is available for
synthesis and discovery
Search PubMed: Spinal
Muscular Atrophy
6. The scientific corpus is
fragmented
~25 million articles
total, each covering
a fragment of the
biomedical space
Each publisher owns
a fragment of a
particular field
The current process
is inefficient and
slow
Wiley
Elsevier
Macmillan
Oxford
Spinal Muscular Atrophy
7. Committee on Academic
Promotions
What Counts
Money
Grants
Papers
Teaching
Service
What Does Not
Sharing data
Sharing software
Open access
Collaboration
Patents
Startups
Getting Ahead as a Computational Biologist in Academia, PLOS Comp Biol
doi:10.1371/journal.pcbi.1002001
8. Beyond the PDF
Conference/unconference
where all stakeholders come
together as equals to
discuss issues
– Publishers
– Technologists
– Scholars
– Library scientists
– Humanists
– Policy makers
– Funders
Incubator for change
What would you do to
change scholarly
communication?
San Diego, Jan 2011 ... Amsterdam, March 2013 ... Oxford, 2015
http://www.force11.org/beyondthepdf2
9. FORCE11
Future of Research Communications and E-Scholarship:
A grass roots effort to accelerate the pace and nature
of scholarly communications and e-scholarship through
technology, education and community
Why 11? We were born in 2011 in Dagstuhl,
Germany
Principles laid out in the FORCE11 Manifesto
FORCE11 launched in July 2012
www.force11.org
10. Promote community, cross-fertilization and interoperability
FORCE11 helps facilitate
communications across
disciplines and communities
Issues are not identical but we
can learn from each other
Community platform
– Meetings
– Discussions
– Tools and resources
– Blogs
– Event calendar
– Community projects
Working groups
– Data Citation
– Resource identification
initiative
– Attribution
– Data
standards/Biosharing
11. Data Citation Working Group
FORCE11 provides a neutral
space for bringing groups
together
35 individuals
representing > 20
organizations concerned
with data citation
Conducted a review of
current data citation
recommendations from
4 different organizations
Arrived at consensus
principles
http://www.force11.org/datacitation
12. Data Citation Principles
Consensus Data
Citation
principles ready
for comment
Designed to be
high level and
easy to
understand
1. Importance
2. Credit and
Attribution
3. Evidence
4. Unique
identifiers
5. Access
6. Persistence
7. Versioning
8. Interoperability
and flexibility
15. Challenge: Working with Web Data
Often have inadequate descriptions so we don’t know what they
are about or how they were constructed
Datasets change over time, but often don’t come with versioning
information
May have been constructed using other data, but it’s not clear
which version of data was used or whether these were modified
Data may be available in a variety of formats
There may be multiple copies of data from different providers,
but it’s unclear if they are exact copies or derivatives
Version of standard or vocabulary used not indicated
Data registries are not synchronized and can contain conflicting
information
16. W3C HCLS Dataset Description
Develop a guidance note for reusing existing
vocabularies to describe datasets with RDF
– Mandatory, recommended, optional descriptors
– Identifiers
– Versioning
– Attribution
– Provenance
– Content summarization
Recommend vocabulary-linked attributes and
value sets
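The idea of mandatory, recommended, and optional descriptors can be illustrated with a small completeness check. This is a hedged sketch, not the W3C reference tooling: the property names below follow common vocabularies (Dublin Core, PAV) discussed by the HCLS note, but the exact descriptor sets chosen here are an assumption for illustration.

```python
# Illustrative sketch (not the W3C HCLS reference validator): check that a
# dataset description carries a minimal set of descriptors. The mandatory
# and recommended sets below are assumed for illustration; the real note
# defines its own tiers per description level.

MANDATORY = {"dct:title", "dct:description", "dct:publisher", "dct:license"}
RECOMMENDED = {"pav:version", "dct:created", "dct:source"}

def check_description(metadata: dict) -> tuple[set, set]:
    """Return (missing mandatory, missing recommended) descriptor sets."""
    present = set(metadata)
    return MANDATORY - present, RECOMMENDED - present

# Hypothetical dataset description for illustration.
dataset = {
    "dct:title": "Example gene-disease associations",
    "dct:description": "Hypothetical dataset for illustration.",
    "dct:publisher": "https://example.org",
    "pav:version": "2.1",
}

missing_mandatory, missing_recommended = check_description(dataset)
print("missing mandatory:", sorted(missing_mandatory))      # missing license
print("missing recommended:", sorted(missing_recommended))  # created, source
```

A validator of this kind is exactly what the working group's "reference editor and validation" deliverable aims to standardize, so that versioning, attribution, and provenance gaps are caught before a dataset is published.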
Provide reference editor and validation
19. Journal guidelines for methods are often poor and
space is limited
“All companies from which materials were obtained should
be listed.” - A well-known journal
Reproducibility is dependent at a minimum, on
using the same resources. But…
24. Sample citation: Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114
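A citation like the one above follows a recognizable surface syntax (the `RRID:` prefix, a source authority code, an underscore, and an accession). A permissive format check can be sketched as below; this pattern is an assumption based on the example shown, and real validation should resolve the identifier against the RRID registry rather than trust syntax alone.

```python
import re

# Illustrative sketch: permissive pattern for RRID syntax as seen in
# citations like "RRID:AB_2140114". The exact grammar is an assumption;
# authoritative validation requires resolving against the RRID registry.
RRID_PATTERN = re.compile(r"^RRID:[A-Za-z]+_\w+$")

def looks_like_rrid(identifier: str) -> bool:
    """True if the string is syntactically shaped like an RRID."""
    return RRID_PATTERN.match(identifier) is not None

print(looks_like_rrid("RRID:AB_2140114"))  # True
print(looks_like_rrid("AB_2140114"))       # False: missing RRID: prefix
```

A check like this is the kind of lightweight gate a submission system could apply at step 4 of the workflow, before the identifier is indexed as a keyword.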
Publishing Workflow:
1. Researcher submits a manuscript for publication
2. Editor or Publisher asks for inclusion of RRID
3. Author goes to Research Identification Portal to locate RRID
4. RRID is included in Methods section and as keyword
25. What is the relationship of a
person to a publication?
26. Example Scenario
Melissa creates mouse1
David creates mouse2
Layne performs RNAseq analysis on
mouse1 and mouse2 to generate
dataset3, which he subsequently
curates and analyzes
Layne writes publication pmid:12345
about the results of his analysis
Layne explicitly credits Melissa as an
author but not David.
27. Credit is connected
=> Credit to Melissa is asserted, but credit to David can be inferred
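The inference in this scenario can be sketched as a traversal over a small provenance graph: direct attributions and derivation links are recorded, and indirect credit (David, via mouse2 and dataset3) falls out of the traversal. The entity names are the hypothetical ones from the scenario; a real system would use a provenance model such as PROV.

```python
# Illustrative sketch of transitive credit inference over the scenario's
# provenance graph. Entity names are the hypothetical ones from the slides;
# a production system would model this with PROV or a similar vocabulary.
from collections import deque

# artifact -> the artifacts it was derived from
derived_from = {
    "dataset3": ["mouse1", "mouse2"],
    "pmid:12345": ["dataset3"],
}
creator = {"mouse1": "Melissa", "mouse2": "David",
           "dataset3": "Layne", "pmid:12345": "Layne"}

def inferred_contributors(artifact: str) -> set[str]:
    """Everyone whose work the artifact transitively depends on."""
    people, queue = set(), deque([artifact])
    while queue:
        node = queue.popleft()
        if node in creator:
            people.add(creator[node])
        queue.extend(derived_from.get(node, []))
    return people

print(sorted(inferred_contributors("pmid:12345")))
# → ['David', 'Layne', 'Melissa']
```

Even though only Melissa was explicitly credited as an author, David's contribution is recoverable from the graph, which is the core argument for connected, machine-readable attribution.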
28. Attribution Working Group
https://www.force11.org/group/attributionwg
Project CredIT
VIVO-ISF ontology
PROV
the Becker model
Transitive credit
The Scholarly Contributions and Roles ontology
Goal is to catalyze rapid convergence on requirements, approaches, and
practical implementation of a system for tracking contributions to any
scholarly product.
29. The 1K Challenge
What would you do with £1k today to make
research communication better, anticipating
the increasing scale of people and
machines?
30. Starting at Ground Zero
CONSULTATIONS
Researcher + 2-3 from
Data Stewardship Team
31. Researchers DO need
assistance:
Finding and choosing data
standards
File versioning
Applying metadata to
facilitate data sharing
“Gummi Bear” themed
data management
exercise resonated well
with students
Lack of awareness of
services and expertise
offered by the Library
OHSU Library is
developing data
services for researchers
http://laughingsquid.com/the-anatomy-of-a-gummy-bear-by-jason-freeny/
Conclusions and new
directions
DOI:10.6083/M4QC0273
33. FORCE11 Vision
• Modern technologies enable vastly improved knowledge transfer and far wider
impact; freed from the restrictions of paper, numerous advantages appear
• We see a future in which scientific information and scholarly communication more
generally become part of a global, universal and explicit network of knowledge
• To enable this vision, we need to create and use new forms of scholarly
publication that work with reusable scholarly artifacts
• To obtain the benefits that networked knowledge promises, we have to put in
place reward systems that encourage scholars and researchers to participate and
contribute
• To ensure that this exciting future can develop and be sustained, we have to
support the rich, variegated, integrated and disparate knowledge offerings
that new technologies enable
What is the 21st century equivalent of the library?
Science used to be pretty linear, and slow.
Clone by phone.
Now science is a web of interconnected resources and activities, only a portion of which is the scientific literature.
Should science be reproducible? Can it be? How would we make it so? How will we evaluate reproducibility? What does the scholarly article need to be or connect to to make it a venue for reproducibility?
First 6 results in PubMed for SMA: can’t access; 3 different publishers. Only one is freely available.
This WG came out of the first one. Example here are recommendations having to do with allowing metadata identifier systems.
Paper is in preprint and will be out soon.
NIH funded BD2K initiative to develop recommendations for a data discovery index.
RRID Working group, has numerous publishers and journals that have implemented.
We are working on determining how to deal with this longer term: is this a new data citation that goes alongside the paper? Needs to be in the keywords so it is mineable.
Not all contributions to a work end up in the author list.
A graph representing this scenario. Note also that we intentionally attributed Melissa on the publication, but not David. David’s attribution could be inferred from the graph.
There are many contributors to the work presented.
Some of the slides in this deck are directly adapted or borrowed from the above people, thank you very much.
Maryann is currently the president of Force11.
Phil was instrumental in helping start Force11.
Michel is co-leading the HCLS data set description
Nicole did the research resource identification project
Stephanie keeps the Force in Force11