Democratising biodiversity and genomics research: open and citizen science to build trust and fill the data gaps.

Scott Edmunds
CNGB
18th December 2018

Scientists need to convince the public and politicians

https://www.nature.com/articles/s41538-018-0018-4
“China’s Ministry of Agriculture and the science community generally expressed a positive attitude
toward GM food, but the percentage of respondents that trusted the government and scientists was
only 11.7 and 23.2%, respectively.”

1. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
Paying for research ≠ Science
How not to regain trust?
An abyss of lost trust?

https://www.ft.com/content/680ea354-5251-11e7-bfb8-997009366969
“One possible reason for the higher rate [in China] is the large bonuses paid to researchers who publish in prestigious journals,” said Ivan Oransky, co-founder of Retraction Watch.
Paying for research ≠ Science
An abyss of lost trust?

How to regain trust?
Areas we need to tackle to allow citizens to trust us:
• Citizen Science: involve the public in the scientific process
• Open Science: increase transparency & fill the data gaps
• Open Access: change incentive systems away from dead-tree advertising to reproducibility

How to build a community genome project using local pride

Context: what we need to know (21st Century Edition)
We need genetic literacy to make decisions on:
• Health
• Starting a family
• Shopping

A solution: appeal to local pride?

Solve the Bauhinia Mystery?
HK Botanical & Afforestation Dept., 1903:
"The mysterious origin of the tree & its magnificent flowers at once arrest the interest. So far, all efforts to identify them with any foreign species have failed"

Courtesy of: Archives des Missions Étrangères de Paris

http://igg.me/at/bauhinia
http://bauhiniagenome.hk
Crowdfunding

http://v.youku.com/v_show/id_XMjc4MzM5NDc2NA==.html
Awareness building by…

Taking genomics to 7-year-olds

Results: answering scientific questions with students
B. purpurea = mother; B. variegata = father

http://www.scmp.com/lifestyle/article/2017906/biohackers-diy-biologists-out-barcode-all-hong-kongs-plants-insects-and

Nothing new: Citizen Science
http://sabap2.adu.org.za/
http://www.hkbws.org.hk/

Need to fill biodiversity gaps
Figure panels: expert predictions of species richness; completeness of biodiversity records
https://www.nature.com/articles/ncomms9221

HK Citizens far outpacing academic research grade GBIF observations
https://www.gbif.org/country/HK/summary
…
• Much higher eBird (146,113) & iNaturalist (39,152) research grade observations than HKU Herbarium (1,061)
• Korean International School made 10,792 iNaturalist observations during Inter-schools Challenge, and CFSS saw 931 species
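
To put the counts on this slide in proportion, here is a minimal sketch that expresses each citizen-science source as a multiple of the HKU Herbarium count. The figures are the ones quoted above; the function name `ratio_to_baseline` is only illustrative.

```python
# Research-grade GBIF observation counts for Hong Kong, as quoted on the slide.
observations = {
    "eBird": 146_113,
    "iNaturalist": 39_152,
    "HKU Herbarium": 1_061,
}


def ratio_to_baseline(counts, baseline):
    """Return each source's count divided by the baseline source's count."""
    base = counts[baseline]
    return {name: n / base for name, n in counts.items() if name != baseline}


ratios = ratio_to_baseline(observations, "HKU Herbarium")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the HKU Herbarium count")
```

On these numbers, eBird holds roughly 138 times and iNaturalist roughly 37 times as many research grade observations as the herbarium.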

Beyond biodiversity…
Can citizen science take on world problems?

Rumour fills an information vacuum
An abyss of lost trust?
https://www.independent.co.uk/news/world/asia/japan-cracks-down-on-leaks-after-scandal-of-fukushima-nuclear-power-plant-8965296.html

Citizen monitoring success story: SafeCast

Made in China: Knowflow
https://publiclab.org/notes/shanlter/06-08-2017/knowflow-automatic-water-meter

http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966
Example: Disease outbreaks
• Genome sequences from the West Africa outbreak of Ebola were first made
publicly available in April 2014
• Datasets were released sporadically when this became a hot research topic
• This led to gaps in the data
An abyss of lost trust?

Zika: a “data gap” issue.
https://www.washingtonpost.com/world/the_americas/brazil-considers-reforming-biosecurity-law-amid-criticism/2016/02/05/ba2108ba-cc80-11e5-b9ab-26591104bb19_story.html

Vector tracking: Hong Kong
http://www.fehd.gov.hk/english/safefood/dengue_fever/
52 locations = >98% of HK not covered.

Citizens to the rescue: Mosquito Alert
http://www.mosquitoalert.com/en/

http://www.mosquitoalert.com/en/the-first-mini-mosquito-alert-army-is-on-the-march-in-hong-kong/

HK children far outpacing academic research mosquito observations
https://www.gbif.org/dataset/1fef1ead-3d02-495e-8ff1-6aeb01123408

Regaining trust…open science

Buckheit & Donoho: Scholarly articles are merely advertisement of
scholarship. The actual scholarly artifacts, i.e. the data and
computational methods, which support the scholarship, remain largely
inaccessible.
An abyss of lost trust?

Provide evidence not advertising
Transparency or bust
Show me the peer reviews
Give me the data/code/protocols
Let me publish replication studies
Let the evidence speak

GigaScience Ethos/Policies: ‘Impact’ is subjective. Data is quantitative.
Reward evidence (data), not advertising
• Data
• Software
• Models
• Pipelines
• Reviews
• Re-use…
= Credit

Data Publishing: nothing new…
Figure labels: Area of Interest/Question + Data & Metadata Collection/Experiments; Analysis/Hypothesis/Analysis; Conclusions
Timeline: 1839 to 1859 (20 yrs.)

Rewarding open data & code
http://gigasciencejournal.com/
Since July 2012. Publishes “Data Notes” for CC0 data, “Tech Notes” for OSI software.

Integrated GigaDB repository. DataCite DOIs. No size limits, APC covers storage.
http://gigadb.org/

http://gigasciencejournal.com/blog/shortcut-from-biorxiv-to-gigascience/
Now with bioRxiv integration
GigaScience embraces

Publons + PrePrint.Space
= credit for reviewers' efforts
http://publons.com/
Credit transparency/open peer review
http://preprint.space/byjournal/gigascience

Visualisations & DOIs for workflows
http://www.gigasciencejournal.com/series/Galaxy
Rewarding & enabling interaction

Workflows/Virtual Machines/containers
• Downloadable as virtual hard disk/available as Amazon Machine Image
• Now publishing container (Docker) submissions
• CodeOcean widgets for code, “compute capsule” run on AWS

First journal with deep integration with
Launched 2nd June 2016
Reward better handling of “wet” protocols…
• Create, share, modify forkable protocols in repo.
• Download & run on smartphone app.
• Widgets embedded in GigaDB
• Get discoverability, credit, DOIs for sharing methods.
• Create your own, or let us set up & you claim.
https://www.protocols.io/groups/gigascience-journal

Rewarding & enabling interaction
Building tools (inc. JBrowse for genomes, Sketchfab for 3D images) on top of datasets…

Democratising Data at GigaScience
• From Big Data to usable Data
• Example: WebTools for easy browsing and visualisation
• Pan-and-zoom map browser as a visual aid to allow the end user to find datasets
• 3D viewer allows users to interact with and explore image data prior to data download
• 3D models are CC0, can be downloaded, and are printable
https://sketchfab.com/GigaDB

• Widening the target audience
• Bioinformaticians and ‘Big Data’ scientists are a primary target audience
• Plugins and visualisations make access easier for the less technically inclined
• Democratises access through education potential and ease of use
https://www.thingiverse.com/GigaScience/designs

Transparency to the rescue
Only openness and transparency can repair the damage.
Example 1

To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001
Our first DOI: http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
Open Data to the rescue…
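
Because the dataset on this slide has a DOI, its citation can in principle be retrieved by machine rather than copied by hand. The sketch below only builds the HTTP request. It assumes the public doi.org resolver supports content negotiation with the `text/x-bibliography` media type for this DOI; the helper name `build_citation_request` and the `apa` style are illustrative choices and do not come from the talk.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def build_citation_request(doi, style="apa"):
    """Build (but do not send) a request asking for a formatted citation.

    Assumption: the resolver honours DOI content negotiation and returns
    a plain-text bibliography entry for this Accept header.
    """
    accept = f"text/x-bibliography; style={style}"
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": accept})


request = build_citation_request("10.5524/100001")
print(request.full_url)
print(request.get_header("Accept"))

# To actually fetch the citation (needs network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The network call is left commented out so the example runs offline; whether the service returns a citation for a given DOI depends on the registration agency.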

Downstream consequences:
“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew it might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
1. Many Citations
2. Therapeutics (primers, antimicrobials)
3. Platform Comparisons
4. Example for faster & more open science

1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastro-
intestinal infection in Hamburg in Germany in May 2011. This
spread through several European countries and the US,
affecting about 4000 people and resulting in over 50 deaths. All
tested positive for an unusual and little-known Shiga-toxin–
producing E. coli bacterium. The strain was initially analysed by
scientists at BGI-Shenzhen in China, working together with
those in Hamburg, and three days later a draft genome was
released under an open data licence. This generated interest
from bioinformaticians on four continents. 24 hours after the
release of the genome it had been assembled. Within a week
two dozen reports had been filed on an open-source site
dedicated to the analysis of the strain. These analyses
provided crucial information about the strain’s virulence and
resistance genes – how it spreads and which antibiotics are
effective against it. They produced results in time to help
contain the outbreak. By July 2011, scientists published papers
based on this work. By opening up their early sequencing
results to international collaboration, researchers in Hamburg
produced results that were quickly tested by a wide range of
experts, used to produce new knowledge and ultimately to
control a public health emergency.

Example 2

Oxford Nanopore in the spotlight, Sept 2014. Does it work?
https://doi.org/10.1111/1755-0998.12324
http://omicsomics.blogspot.com/2014/09/oxford-takes-some-flak-fires-back.html

• Nanopore MinION E. coli genome released via GigaDB 10-Sep-2014
• Curated & converted to ISA-tab, & worked with EBI to get raw data there
• Data Note submitted & preprint version out 26-Sept-2014
• Peer reviewed & published 20-Oct-2014
http://dx.doi.org/10.5524/100102
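
The turnaround on this slide is easier to appreciate as elapsed days. A minimal sketch using only the three dates quoted above (the milestone labels are paraphrased):

```python
from datetime import date

# Milestones and dates as given on the slide.
milestones = [
    ("Data released via GigaDB", date(2014, 9, 10)),
    ("Data Note preprint out", date(2014, 9, 26)),
    ("Peer reviewed & published", date(2014, 10, 20)),
]

release_day = milestones[0][1]
days_since_release = {
    label: (day - release_day).days for label, day in milestones
}

for label, days in days_since_release.items():
    print(f"{label}: day {days}")
```

On these dates the preprint followed the data release by 16 days, and the peer-reviewed Data Note by 40 days.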

Example 3

Would you trust a BGI sequencer?

Try before you buy: inspect ALL the data yourselves
https://doi.org/10.1093/gigascience/gix024
• Comparisons with Illumina for PE50, 100 & 150
• Raw sequencing data in NCBI SRA
• FASTQ files in GigaDB
• Raw image files also shared

Open, transparent and peer reviewed benchmarking
Open Review:
http://dx.doi.org/10.5524/review.100698
http://dx.doi.org/10.5524/review.100699

Example 4

Need to expand wildlife forensics

Transparency saves wildlife
User-friendly pipeline for the rapid identification of CITES-listed
species in forensic samples using Illumina data.
• International validation trial by 16 laboratories.
• All input sequence data + results available in GigaDB.
• SOPs available in protocols.io.

Example 5

• Challenges of Food security
• Rice, Oryza sativa L., is the staple food for half the world’s population
• By 2030, rice production must increase by at least 25% to keep pace with population growth
• 80% of countries face a serious burden of malnutrition, especially in Africa and SE Asia

Rice 3K project
• 3,000 rice genomes
• 13.4TB public data
• 6 months to copy data to Sequence Read Archive (SRA)
• Data published 4 years before analysis published
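
The "6 months to copy" figure implies a very low effective transfer rate. A rough back-of-the-envelope sketch, assuming decimal terabytes and 30-day months (neither is stated on the slide) and treating the copy as one continuous transfer:

```python
DATA_TB = 13.4   # public data volume quoted on the slide
MONTHS = 6       # time taken to copy it to the SRA

BYTES_PER_TB = 10**12               # assumption: decimal terabytes
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day months


def effective_throughput_mb_per_s(terabytes, months):
    """Average transfer rate in MB/s if the copy ran continuously."""
    total_bytes = terabytes * BYTES_PER_TB
    total_seconds = months * SECONDS_PER_MONTH
    return total_bytes / total_seconds / 10**6


mb_per_s = effective_throughput_mb_per_s(DATA_TB, MONTHS)
print(f"{mb_per_s:.2f} MB/s, or about {mb_per_s * 8:.1f} Mbit/s")
```

Under these assumptions the average works out to under 1 MB/s. The real bottleneck was presumably not a continuous transfer, so this is only an illustration of scale.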

• Orphan Crops
• The African Orphan Crop Consortium (AOCC) is developing genomic resources for 101 crops that represent a significant part of African/Asian diets.
• To date, the AOCC is working on 69 genomes, the first 5 of which were just published in GigaScience.
Hyacinth bean
https://doi.org/10.1093/gigascience/giy152

• Each AOCC genome is a single GigaDB dataset (with DOI)

From Big Data to usable(ish) Data
• Although the 13TB of data in GigaDB was open (CC0), after analysis on the Tianhe supercomputer the processed rice3K data came to 100TB
• AWS hosted for free, but expensive to process
https://aws.amazon.com/public-data-sets/3000-rice-genome/

Processed data finally published 1st May 2018, Nature v557, p43–49
https://www.nature.com/articles/s41586-018-0063-9

• Example: Easy-to-use plug and play RiceGalaxy
• GUI means plant breeders can utilise genetic data without coding skills
• Funded to run at low cost (<100 USD/month) via AWS Singapore & local servers (2 vCPUs, 8GB RAM, 2 mounted volumes, 200GB total storage)
• CGIAR Excellence in Plant Breeding Platform/model will roll out to other crops

Other beneficiaries: you!
Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
Every 10 datasets collected contributes to at least 4 papers in the following 3 years.
Piwowar HA, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473(7347): 285. doi:10.1038/473285a

Open Science = Science
• Science is needed more than ever to tackle grave environmental challenges and fight disease
• Stand on the shoulders of giants, and allow others to stand on yours
• Choose evidence not branding
• Being closed provokes distrust, prevents downstream use, and ultimately harms science
• Being open helps science, your immediate community, and ultimately your career
• Preempt new EU Open Science and MOST rules on “strengthening research integrity”…
http://most.gov.cn/mostinfo/xinxifenlei/fgzc/gfxwj/gfxwj2018/201805/t20180531_139731.htm

Help GigaScience make it happen: open up the whole research process
Give us your data, pipelines & papers
www.gigasciencejournal.com
Contact us:
scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com

Thanks to:
Laurie Goodman, Editor in Chief
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Hongling Zhao, Assistant Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Chris Armit, Data Scientist
Mary Ann Tulli, Data Editor
Xiao (Jesse) Si Zhe, Database Developer
Chen Qi, Shenzhen Office.
Follow us:
@GigaScience
facebook.com/GigaScience
http://gigasciencejournal.com/blog/
www.gigadb.org
+ Weibo & WeChat

Questions?
