Scott Edmunds, HKU Open Access Week: Experiences from the front-line of Open Access & Open Data publishing.

ORCID: 0000-0001-6444-1436
@SCEdmunds
scott@gigasciencejournal.com
Experiences from the front-line of Open Access & Open Data publishing.

www.gigasciencejournal.com
Journal, data platform and database for large-scale data
Editor-in-Chief: Laurie Goodman
Executive Editor: Scott Edmunds
Commissioning Editor: Nicole Nogoy
Lead Curator: Chris Hunter
Data Platform: Peter Li
in conjunction with

What do publishers do?
Apologies: http://scholarlykitchen.sspnet.org/2014/10/21/updated-80-things-publishers-do-2014-edition/
the scholarly chicken
(tl;dr version)

Are publishers really adding value?
1. http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.1001747

Need to move beyond 350-year-old incentive systems
Buckheit & Donoho: “Scholarly articles are merely advertisement of scholarship. The actual scholarly artifacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible.”

Consequences: increasing number of retractions
>15-fold increase in the last decade
1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
2. Retracted Science and the Retraction Index
http://iai.asm.org/content/79/10/3855.abstract?

At the current rate of increase, by 2045 as many papers would be retracted as published.
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950

STAP paper demonstrates problems:
Nature Editorial, 2nd July 2014:
“We have concluded that we and the referees could
not have detected the problems that fatally
undermined the papers. The referees’ rigorous
reports quite rightly took on trust what was
presented in the papers.”
http://www.nature.com/news/stap-retracted-1.15488

STAP paper demonstrates problems:
Need:
…to publish protocols BEFORE analysis
…better access to supporting data
…more transparent & accountable review
…to publish replication studies

JIFBAIT Network: more GWAS
JIFBAIT NEWS
Arsenic life forms: will they take over the planet?
Which Overhyped, Unreproducible Experiment Are You?
Want rapid citations for two years only? Take this quiz.
You got: STAP Cells
Of course dipping cells in coffee will make them pluripotent. Even if the research gets discredited, it’ll still get hundreds of citations in two years.

Reward the commons instead?
Open Data, Open Source, Open Review, Open Access

HK: good with some parts of open…
http://hub.hku.hk/

Closed v Open Access [the HKU edition]
Ye Old Journal
Closed access, subject specific vs. open access, public engaging


What is impact?
Papers are very highly:
• Accessed (some >84,000)
• Cited (some >500)
• Altmetric scored (some >100)
• Influential, educational, reproducible & reused
• Covered in international media (Wired, LA Times, NYT, NBC…)
But no impact factor

What is the cost of the Journal Impact Factor?

1. http://dx.doi.org/10.1087/20110203
2. http://blog.thegrandlocus.com/2014/10/a-flurry-of-copycats-on-pubmed
3. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
JIF 2 = $10,000 USD
JIF 5 = $20,000 USD
Buy Sell
C/N/S = $30,000 USD
JIF 10 = $1,500 USD

1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentiv
This could never happen in Hong Kong, right?
“While we are rightly proud of Hong Kong’s highly regarded and
ranked universities system, we are not immune to the same
pressures. While funders in Europe have moved away from using
citation based metrics such as JIF in their research assessments, the
Hong Kong University Grants Committee states in their Research
Assessment Exercise guidelines that they may informally use it.”

1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentiv
This is happening in Hong Kong!
JIF 2 = $8,000 USD
JIF 5 = $15,000 USD
Buy

Specific things we should be rewarding:

• Review
• Data
• Software
• Models
• Pipelines
• Re-use…
= Credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to public
repositories with a means of accreditation.”
“An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.”
Nature Biotechnology 27, 579 (2009)
New incentives/credit

Not just carrots…
“The data discovery index (DDI) enabled through
bioCADDIE is to do for data what PubMed (and
PubMed Central) did for the literature.”

GigaSolution: deconstructing the paper
www.gigadb.org
Utilizes big-data infrastructure and expertise from:
Combines and integrates (with DOIs):
Open-access journal
Data Publishing Platform
Data Analysis Platform
Open Review Platform

Open peer review
1. Transparency

The only drawback?
End reviewer 3 Downfall parody videos, now!

Reward open & transparent review
Data from open- and closed-review journals of similar scope in the BMC Series show it is ~5-10% harder to get referees for open review (data from Tim Sands at BMC).
• Good data showing no difference in acceptance/rejection rates, but
better quality reviews.
• Does take marginally longer to find reviewers (and for them to return
reports).
BMC Series
Medical Journals

Publons + AcademicKarma
= credit for reviewers’ efforts
http://publons.com/
1. Transparency/open peer review
http://academickarma.org/
NOW WITH DOIs

arXiv + blogged reviews = real-time open-review
1. Transparency

Reward pre-prints

http://tmblr.co/ZzXdssfOMJfy

Data Publishing: nothing new…
[Figure: research cycle (area of interest/question; data & metadata collection/experiments; analysis/hypothesis/analysis; conclusions), with a timeline from 1839 to 1859, 20 years.]

Data Publishing: Can be Life or Death
Climate change, global hunger, pollution,
cancer, disease outbreaks…
http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966

To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public
domain under a CC0 license. Until the publication of research papers on
the assembly and whole-genome analysis of this isolate we would ask you
to cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001
Our first DOI: http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.

Downstream consequences:
“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the
Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he
knew that it might take days for the lawyers at his company — Pacific Biosciences — to parse the
agreements governing how his team could use data collected on the strain. Luckily, one team had
released its data under a Creative Commons licence that allowed free use of the data, allowing
Kasarskis and his colleagues to join the international research effort and publish their work without
wasting time on legal wrangling.”
1. Citations (~300)
2. Therapeutics (primers, antimicrobials)
3. Platform comparisons
4. Example for faster & more open science

1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastrointestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.

IRRI GALAXY
Beneficiaries/users of our work

IRRI GALAXY
Rice 3K project: 3,000 rice genomes, 13.4TB public data
Feed The World With (Big) Data

OMERO: providing access to imaging data
Already used by JCB.
View, filter, measure raw
images with direct links
from journal article.
See all image data, not just
cherry picked examples.
Download and reprocess.
Need for better handling of imaging data

The alternative...
...look but don't touch

Rewarding the …
[Figure: the research process from idea and study to answer, with data, metadata, methods, analysis software (pipelines) and workflows/environments each published alongside the publication, with DOIs, etc.]

Software
https://github.com/gigascience
Transparent
Open & able to build upon
Taking citeable snapshots
@jeejkang

gigagalaxy.net
Workflows
Reward Sharing of Workflows

Visualisations & DOIs for workflows
http://www.gigasciencejournal.com/series/Galaxy

Facilitate reproducibility, reuse & sharing & publish outputs of:
Knitr, Sweave, Jupyter/iPython Notebook, etc.
Open Documents
Reward Open/Dynamic Workbooks

E.g.
http://www.gigasciencejournal.com/content/3/1/3

E.g.
Reviewer (Christophe Pouzat):
“It took me a couple of hours to get the data, the few
custom developed routines, the “vignette” and to
REPRODUCE EXACTLY the analysis presented in the
manuscript. With few more hours, I was able to modify
the authors’ code to change their Fig. 4. In addition to
making the presented research trustworthy, the
reproducible research paradigm definitely makes the
reviewer’s job much more fun!”

Virtual Machines
• Downloadable as a virtual hard disk/available as an Amazon Machine Image
• Now publishing container (docker) submissions

Taking a microscope to the publication process

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612

Lessons Learned
• It is possible to push button(s) & recreate a result from a paper
• Most published research findings are false, or at least have errors
• Reproducibility is COSTLY. How much are you willing to spend?
• Much easier to do this before rather than after publication

The cost of staying with the status quo?
• Ioannidis estimates that 85% of research resources are wasted.
• ~US$28B per year unnecessarily spent on preclinical research in the US.
• Each retraction estimated to cost $400,000.
http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001747
http://elifesciences.org/content/3/e02956
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165

The cost to Hong Kong (and your career) of staying with the status quo?
• Estimated citation impact lost through not being OA = 50% ($8.75B?) [2]
• Hong Kong ranked 54th in the Global Open Data Index
• How much are YOU losing through missing out on potential collaborations, wider engagement & unrepeatable work?
HK UGC grant budget = $17.5 Billion HKD/yr (4% of Gov spending)
Taking the lowest reported reproducibility rate (11%) = >$15 billion wasted [1]
1. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
2. http://www.ecs.soton.ac.uk/~harnad/Temp/research-australia.doc
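Both cost figures on this slide are simple percentages of the UGC budget. As a back-of-envelope check, here is a minimal Python sketch, assuming (the slide does not spell this out) that both percentages are applied to the HKD 17.5 billion/yr budget, that the 11% figure is the share of studies that reproduce, and that the derived "$" figures are therefore in HKD. The function names are hypothetical, chosen for illustration only.

```python
# Back-of-envelope sketch of the slide's arithmetic (hypothetical helper names).
# Assumption: both estimates are percentages of the annual UGC grant budget,
# so the results are in billions of HKD.

UGC_BUDGET_HKD_BN = 17.5  # annual UGC grant budget, billions of HKD (from the slide)


def wasted_on_irreproducible(budget_bn: float, reproducibility_rate: float) -> float:
    """Portion of the budget spent on work that does not reproduce."""
    if not 0.0 <= reproducibility_rate <= 1.0:
        raise ValueError("reproducibility_rate must be between 0 and 1")
    return budget_bn * (1.0 - reproducibility_rate)


def value_lost_to_closed_access(budget_bn: float, citation_impact_loss: float) -> float:
    """Budget-equivalent value lost if closed access cuts citation impact by this fraction."""
    if not 0.0 <= citation_impact_loss <= 1.0:
        raise ValueError("citation_impact_loss must be between 0 and 1")
    return budget_bn * citation_impact_loss


if __name__ == "__main__":
    # 11% reproducibility means roughly 89% of the budget funds irreproducible work.
    waste = wasted_on_irreproducible(UGC_BUDGET_HKD_BN, 0.11)
    # A 50% citation-impact loss applied to the same budget.
    oa_loss = value_lost_to_closed_access(UGC_BUDGET_HKD_BN, 0.50)
    print(f"Irreproducible-work estimate: HKD {waste:.1f} billion/yr")
    print(f"Closed-access impact estimate: HKD {oa_loss:.2f} billion/yr")
```

Under these assumptions the sketch gives roughly 15.6 and 8.75 billion, consistent with the ">$15 billion" and "$8.75B?" on the slide; the estimates are only as good as the single percentages they rest on.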

Death to the Publication. Long live the Research Object!
Manifesto for a reproducible publisher:
The era of the 1665-style publication is over
Open is the new black
Credit FAIR data, not JIF-bait narrative
Reward replication not advertising
We need a recognizable mark/badge/scores for replication
?

Ruibang Luo (BGI/HKU)
Shaoguang Liang (BGI-SZ)
Tin-Lap Lee (CUHK)
Qiong Luo (HKUST)
Senghong Wang (HKUST)
Yan Zhou (HKUST)
Thanks to:
@gigascience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/
Peter Li
Chris Hunter
Jesse Si Zhe
Rob Davidson
Nicole Nogoy
Laurie Goodman
Amye Kenall (BMC)
Marco Roos (LUMC)
Mark Thompson (LUMC)
Jun Zhao (Lancaster)
Susanna Sansone (Oxford)
Philippe Rocca-Serra (Oxford)
Alejandra Gonzalez-Beltran (Oxford)
www.gigadb.org
gigagalaxy.net
CBIIT
Funding from:
Our collaborators:team: (Case study)

Come to our next Open Science meetup:
Where: MakerBay, Yau Tong, Kowloon
When: Monday, October 26th, 7:30pm
https://opendatahk.com/
