Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing platform

Updates to the GigaDB open
access data publishing platform
Jesse Xiao
Jesse@gigasciencejournal.com
ORCID ID:0000-0003-3408-2852

About the Journal
GigaScience is an open access, open data,
open peer-review journal focusing on ‘big
data’ research from the life and biomedical
sciences

What is the point of publishing?
• To disseminate
information/knowledge/ideas.
• To present material so it can be
reasonably assessed for its level of
quality (and interest).
• To gain credit for career advancement.

Kahn, Goodman, & Mittleman. Dragging Scientific Publishing into the 21st Century 2014
http://genomebiology.com/2014/15/12/556
From Journal Delivery to PDF Delivery

Lack of Data and Software Availability
Impacts Reproducibility
1. Ioannidis et al., (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8)
Out of 18 microarray papers, results
from 10 could not be reproduced

Deconstructing a paper into accessible,
useable, trackable, interlinked units
Need to provide credit to
reward sharing and proper
organization of:
• Narrative
• Data/Metadata
availability/curation
• Software availability
• Interoperability
• Availability of workflows
• Transparent analyses
Data/
MetaData
Software
Methods
Narrative

Deconstructing a paper into accessible,
useable, trackable, interlinked units
Currently we provide credit
for this:
• Narrative
• Data/Metadata
availability/curation
• Software availability
• Interoperability
• Availability of workflows
• Transparent analyses
Data/
MetaData
Software
Methods
Narrative
Sometimes we publish these
as Methods Papers

Beyond the Narrative
Data And Tools

Getting past…
…look but don't touch

Data publishing
http://gigasciencejournal.com/
Launched July 2012. Publishes “Data Notes” for CC0 data. Uses ISA-Tab.

Data publishing
APC covers 1TB storage in GigaDB

FAIR DATA in GigaDB
Findable Accessible Interoperable Reusable

Findable
We have 373 published datasets in GigaDB,
& around 30 TB data. Every dataset has a DOI
and the individual dataset page.
Provides powerful search engine
and API search function
e.g.
http://gigadb.org/api/search

Accessible
All data in GigaDB can be accessed in the public ftp server.
We provide three stable ftp sites in 2 geographic locations (HK & Shenzhen)
1. ftp://penguin.genomics.cn // The main ftp server
2. ftp://ftp.cngb.org/pub/gigadb/ // The mirror ftp server in the cloud
3. ftp://ftp2.cngb.org/pub/gigadb// The mirror ftp server in the cloud
Download Speed
We are working with China National Gene Bank and will to use UDP protocol software
(Data Expedition) to provide faster data download speed.
The source code for all software and tools published in GigaDB can access in the Github
https://github.com/gigascience

Accessible via API
We provide a REST API to allow user retrieve and search all metadata held in GigaDB.
The current API returns result in XML (the XML file based on the database schema), and
we plan to have the option to also return results in JSON or ISA2.0-JSON in our next
version

Accessible via API
The website
http://www.gigadb.org/site/help#0
.1_API provides detailed
instructions on how to use the
GigaDB API

Interoperable and reusable
Integrating tools (inc Jbrowse genome browser …) to visualize data

First journal with deep integration with
Launched 2nd June 2016
Reward better handling of “wet” protocols…
• Create, share, modify forkeable protocols in repo.
• Download & run on smartphone app.
• Get discoverability, credit, DOIs for sharing methods.
• Create your own, or let us set up & you claim.
http://protocols.io/

The GigaDB dataset page embeds
the protocol.io in the iframe.
e.g. RNA extraction protocol

GigaDB provides an online submission wizard and excel spreadsheet to help
users curate their own metadata

https://codeocean.com/
Cloud-based executable research platform
Browse, share & run code on AWS
Creates compute capsule: encapsulation of
the data, code, and computation
environment
Integration into the paper, share via DOIs
First examples published in GigaScience
Integrated plugin into GigaDB
Share your code this way!

gigagalaxy.net
Reward Sharing of Workflows

http://www.gigasciencejournal.com/content/3/1/23
http://www.gigasciencejournal.com/content/4/1/19
Virtual Machines/containers
• Downloadable as virtual harddisk/available as Amazon Machine Image
• Now publishing many container (docker) submissions

How FAIR can we get?
Data sets
Analyses
Open-Paper
Open-Review
DOI:10.1186/2047-217X-1-18
>50,000 accesses
& >1000 citations
Open-Code
7 reviewers tested data in ftp server & named reports published
DOI:10.5524/100044
Open-Pipelines
Open-Workflows
DOI:10.5524/100038
Open-Data
78GB CC0 data
Code in sourceforge under GPLv3: http://soapdenovo2.sourceforge.net/
>40,000 downloads
Enabled code to being picked apart by bloggers in wiki
http://homolog.us/wiki/index.php?title=SOAPdenovo2

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612
Quantifying how FAIR can we get

Methods
Answer
Metadata
softwareAnalysis
(Pipelines)
Workflows/
Environments
Idea
Study
Rewarding the
DOI, etc.
Publication
Publication
Publication
Data

www.gigasciencejournal.com
Give us your data, papers
& pipelines
Help GigaPanda
make it happen!
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Contact us:

Thanks to:
Laurie Goodman, Editor in Chief
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Xiao (Jesse) Si Zhe, Database Developer
Chen Qi, Shenzhen Office.
All of BGI
@GigaScience
facebook.com/GigaScience
gigasciencejournal.com/blog/
Follow us:
www.gigasciencejournal.com
www.gigadb.org
+
Weibo
& WeChat

Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing platform

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing platform

Similar to Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing platform (20)

More from GigaScience, BGI Hong Kong

More from GigaScience, BGI Hong Kong (20)

Recently uploaded

Recently uploaded (20)

Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing platform

Editor's Notes