Big Data Publishing,
Handling, & Reuse
Laurie Goodman, PhD
Editor-in-Chief, GigaScience
Laurie@gigasciencejournal.com
ORCID ID: 0000-0001-9724-5976
Beyond Data Release Mandates
What is the point of publishing?
• To disseminate
information/knowledge/ideas.
• To present material so it can be reasonably
assessed for its level of quality (and interest).
• To gain credit for career advancement.
What goes into a research article?
+ Area of Interest/
Question
Data & Metadata Collection
Analysis/Hypothesis/Analysis
Conclusions
Scientific Communication
Via Publication
• Scholarly articles are merely advertisement of scholarship.
The actual scholarly artefacts, i.e. the data and
computational methods, which support the
scholarship, remain largely inaccessible. (Jon B.
Buckheit and David L. Donoho, WaveLab and Reproducible
Research, 1995)
• Core scientific statements or assertions are intertwined and
hidden in the conventional scholarly narratives
• Lack of transparency, lack of credit for anything other than
“regular” dead tree publication
Kahn, Goodman, & Mittleman. Dragging Scientific Publishing into the 21st Century 2014
http://genomebiology.com/2014/15/12/556
From Journal Delivery to PDF Delivery
Lack of Data and Software Availability
Impacts Reproducibility
1. Ioannidis et al., (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8)
Out of 18 microarray papers, results
from 10 could not be reproduced
Retractions are on the Rise
>15X increase in last decade
1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract?
Deconstructing a paper into accessible,
useable, trackable, interlinked units
Need to provide credit to
reward sharing and proper
organization of:
• Narrative
• Data/Metadata
availability/curation
• Software availability
• Interoperability
• Availability of workflows
• Transparent analyses
Units: Data/Metadata, Software, Methods, Narrative
Deconstructing a paper into accessible,
useable, trackable, interlinked units
Currently we provide credit
for this:
• Narrative
• Data/Metadata
availability/curation
• Software availability
• Interoperability
• Availability of workflows
• Transparent analyses
Units: Data/Metadata, Software, Methods, Narrative
Sometimes we publish these
as Methods Papers
Beyond the Narrative
Data And Tools
Promoting Data Release
Data Citation
But publishing ‘Data’ is “Salami Slicing”!!
What is Salami Slicing?
• Publishing research in several different papers that
should form a single cohesive paper
Why is it ‘unethical’?
• It fragments the scientific literature, wasting
researchers’ time as they try to get all the information
related to a very specific topic/dataset/method
• It can give the appearance (given there are multiple
publications) that there is large support for a particular
hypothesis
• It pads a researcher’s publication record unfairly
Publishing ‘Data’ is “Salami Slicing”!
Baloney
1. Those guidelines were developed prior to the year 2000:
• More than 15 years ago: at a time when data set sizes and data
types collected in the life sciences by a single research group
were relatively small and primarily suitable for a single or narrow
range of disciplines or hypotheses.
• Most journals were not online (which allows easier identification
and access to closely related articles) until the late ’90s.
2. In 2005, COPE* ruled that a paper that had data that had been used
and described, at least in part, in a previous publication was not
unethical.
3. Data collection can be (should be!!) a scholarly pursuit:
• Data that is broadly reusable requires care, thought, training,
time, and money to be properly collected, curated, stored, and
shared.
*Committee on Publication Ethics. http://www.publicationethics.org/case/salami-publication
Contrary to popular belief…
There are very few
—if any—
‘push-a-button-and-get-it’
reusable data resources
You’re not supposed to just collect samples!
*Collect ALL available metadata*
Help Develop a Digital Data Curation Team at your
Institution’s Library (they may already have one…)
Back to Darwin
Data & Metadata Collection/Experiments
Analysis/Hypothesis/Analysis
Conclusions
+ Area of Interest/
Question
1839
1859
20 Yrs.
Say… was this a Data Publication?
The most curious fact is the perfect gradation in the size of the beaks in the
different species of Geospiza, from one as large as that of a hawfinch to
that of a chaffinch, and (if Mr. Gould is right in including his sub-group,
Certhidea, in the main group) even to that of a warbler. The largest beak
in the genus Geospiza is shown in Fig. 1, and the smallest in
Fig. 3; but instead of there being only one intermediate
species, with a beak of the size shown in Fig. 2, there are no
less than six species with insensibly graduated beaks.
(Chapter 17)
DataCite and DOIs
• Aims to “increase acceptance of research data as
legitimate, citable contributions to the scholarly
record”.
• “data generated in the course of research are
just as valuable to the ongoing academic
discourse as papers and monographs”.
Citing Data Isn’t New
The Physical Sciences have been doing this for a while…
What we’re doing:
Mandating and Aiding Data Release
Requiring all data supporting work to be freely available in a
publicly available repository
– How we’re helping to do this:
• Journal-dedicated data and software repository GigaDB
that hosts ALL data types.
• Have Biocurators to aid in handling Metadata
• All Datasets are provided a Digital Object Identifier
(DOI) making them citable and countable
• All Material in GigaDB is available under a CC0 Waiver
• Data types that have a community-approved public database must be
submitted there as well
• Provide Direct links to all associated information
What we’re doing:
Mandating and Aiding Software Release
Requiring all software and work to be freely available in a
publicly available repository
– How we’re promoting this:
• All software created by authors must be 100% OSI
compliant
• Journal-Dedicated repository GigaDB hosts software so
it can be downloaded.
• Software and Workflows are provided a DOI making
them citable and countable (reward)
• Journal-dedicated Galaxy Platform to run tools
• Have a Data Manager and Data Scientist to wrap and
deploy software tools
• Have our own GitHub Repository
How we view publishing at GigaScience
Paper in GigaScience (Narrative + Methods): Open-access journal
Data Sets in GigaDB: Data Publishing Platform
Analyses/Workflows in GigaGalaxy: Data Computation Analysis Platform
Making the Data Itself Citable
We provide a database linked to the journal. It links the data
directly to our papers to ease reproducibility, makes the data available at the
time of review, and gives authors a place to submit data that have no
sustainable ‘home’.
Note: there are many community-available databases, so in principle
any journal can do this by taking advantage of such
resources.
These include the usual suspects: EBI, NCBI, DDBJ, etc.
Databases that take all data types and provide Data DOIs: Dryad,
FigShare, etc.
There are also numerous smaller community databases specific to
different fields or data types.
Some of the Journals Currently
Doing Data Publication
http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList
Citing Data in the
References Allows Tracking
This rewards authors for making data
available AND makes it easier to find
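As a rough illustration of what a trackable data citation can look like, here is a minimal Python sketch that renders a reference-list entry in a "Creator (Year): Title. Publisher. Identifier" pattern, along the lines of what DataCite recommends. The DOI, title, and year are taken from the GigaDB example dataset shown later in this deck; the creator names are placeholders and the publisher string is an assumption, since the slides do not list them. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal metadata needed to cite a dataset (hypothetical structure)."""
    creators: Tuple[str, ...]
    year: int
    title: str
    publisher: str
    doi: str


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable https://doi.org link."""
    return "https://doi.org/" + doi.strip()


def format_data_citation(record: DatasetRecord) -> str:
    """Render 'Creator (Year): Title. Publisher. Identifier'."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.year}): {record.title}. "
        f"{record.publisher}. {doi_url(record.doi)}"
    )


example = DatasetRecord(
    creators=("Creator A", "Creator B"),  # placeholders, not the real authors
    year=2011,
    title=("Genomic data from the crab-eating macaque/cynomolgus monkey "
           "(Macaca fascicularis)"),
    publisher="GigaDB",  # assumption: publisher string as it might appear
    doi="10.5524/100003",
)

print(format_data_citation(example))
```

Because the identifier is a DOI, the same string that appears in the reference list also resolves to the dataset, which is what makes counting citations to the data feasible.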
But is this being done? Yes.
Is Cited Data Being Tracked? Yes.
Improving Quality as
Well as Availability
How Hard is Data and Software Review?
Not really that much harder than narrative
review.
1. Submission: Submitter logs in to GigaDB website and uploads Excel submission or uses online wizard.
2. Fail: submitter is provided error report. Pass: dataset is uploaded to GigaDB.
3. Curator Review: Curator contacts submitter with DOI citation and to arrange file transfer (and resolve any other questions/issues). DOI assigned.
4. Files: Submitter provides files by ftp or Aspera.
5. DataCite XML file: XML is generated and registered with DataCite.
6. Curator makes dataset public (can be set as future date if required).
Public GigaDB dataset: DOI 10.5524/100003, Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011)
Data must be available for review with the manuscript
(and at the very least get a sanity check…)
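The submission workflow on this slide (validate the submission, curator review and DOI assignment, file transfer, DataCite registration, release) can be sketched as a small state machine. This is only an illustration under stated assumptions: the stage names, the required metadata fields, and the functions are hypothetical and do not describe GigaDB's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    SUBMITTED = "submitted"
    REJECTED = "rejected"              # validation failed, error report returned
    UPLOADED = "uploaded"              # validation passed
    DOI_ASSIGNED = "doi_assigned"      # curator review complete
    FILES_RECEIVED = "files_received"  # files sent by ftp or Aspera
    REGISTERED = "registered"          # metadata XML registered with DataCite
    PUBLIC = "public"                  # curator has released the dataset


# Hypothetical required fields; the real submission sheet is not in the slides.
REQUIRED_FIELDS = ("title", "submitter", "data_type")

# Each stage that can move forward, and where it moves to.
NEXT_STAGE = {
    Stage.UPLOADED: Stage.DOI_ASSIGNED,
    Stage.DOI_ASSIGNED: Stage.FILES_RECEIVED,
    Stage.FILES_RECEIVED: Stage.REGISTERED,
    Stage.REGISTERED: Stage.PUBLIC,
}


@dataclass
class Submission:
    metadata: dict
    stage: Stage = Stage.SUBMITTED
    errors: List[str] = field(default_factory=list)
    doi: Optional[str] = None


def validate(sub: Submission) -> Submission:
    """Pass/fail check: a failing submission gets an error report."""
    sub.errors = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not sub.metadata.get(name)
    ]
    sub.stage = Stage.REJECTED if sub.errors else Stage.UPLOADED
    return sub


def advance(sub: Submission, doi: Optional[str] = None) -> Submission:
    """Move a validated submission one step along the workflow."""
    if sub.stage not in NEXT_STAGE:
        raise ValueError(f"cannot advance from stage '{sub.stage.value}'")
    target = NEXT_STAGE[sub.stage]
    if target is Stage.DOI_ASSIGNED:
        if not doi:
            raise ValueError("a DOI is required at curator review")
        sub.doi = doi
    sub.stage = target
    return sub
```

Modelling the steps explicitly makes the pass/fail gate visible: a rejected submission cannot be advanced until it is resubmitted and validated again.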
Reviewing Data in More Detail
Issue: We can’t ask our reviewers to do that!
Our finding: Reviewers don’t mind
Reviewer Dr. Christophe Pouzat on neuroscience
manuscript:
“In addition to making the presented research
trustworthy, the reproducible research
paradigm definitely makes the reviewers job
more fun!”
Can also use specific Data Reviewers (we have)
Reviewing Data AND Software
Code in sourceforge under GPLv3:
http://soapdenovo2.sourceforge.net/
>5000 downloads
http://homolog.us/wiki/index.php?title=SOAPdenovo2
Data sets
Analyses
Open-Paper
Open-Review
DOI:10.1186/2047-217X-1-18
>35,000 accesses
Open-Code
8 reviewers tested data in ftp server & named reports published
DOI:10.5524/100044
Open-Pipelines
Open-Workflows
DOI:10.5524/100038
Open-Data
78GB CC0 data
Enabled code to be picked apart by bloggers in wiki
8 Reviewers! Holy Cow- that must have
taken forever!!
Submission
July 24
Final review
Aug 28
These were reviewing teams from different labs,
assessing the materials at multiple levels
Is this really worth the effort?
Beyond Reproducibility:
REUSE
Data Availability and Tools
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
These data were released THREE YEARS before
publication of the analysis article
The polar bear DATA were released prepublication in 2011.
They were used and cited in the following studies before the main paper on the
sequencing was published:
Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct
bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.
Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting
theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345.
doi:10.1371/journal.pgen.1003345.
Morgan, CC et al., Heterogeneous models place the root of the placental mammal
phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.
Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus
maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from
Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133.
Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene
Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
Even though the data had been released over 2 years
earlier and cited in other papers, the main analysis
paper was published in Cell.
Cell Press journals had indicated that
releasing a dataset prior to publication
could be considered prior publication.
• New Sequencing technology
• MinION (Oxford Nanopore)
• New Sequence Data Type
• EBI and NCBI Databases not ready
• High community interest for testing
data
• >100 GB of data
Real time use during the
publication process
• Uploaded prior to publication
• Deployed on Amazon CloudFront
• Ongoing
testing/comparison/information
sharing prior to publication
• When EBI was ready for the data, it used our
cloud to upload them
• EBI transferred the data to NCBI when
they were ready
Getting past…
…look but don't touch
Reproduce and Reuse Needs Much More
• Data: GigaDB
• Software: GitHub
• Workflows
– Galaxy
– Executable Docs
– VMs
• Images: OMERO
• Cloud storage, tools, and
compute power…
• Need this to reach the smaller
labs
github.com/gigascience/gigadb-cogini
More Journals have or are starting to introduce
these and other tools: More is needed…
Currently… it feels like this…
Well…
…because it is like this
If we want to move forward, we need to go
through that to reach this:
It will require researchers, institutions, publishers,
and funders working together.
Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager
Contact us:
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog
www.gigasciencejournal.com
www.gigadb.org

Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
GigaScience, BGI Hong Kong
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
GigaScience, BGI Hong Kong
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
GigaScience, BGI Hong Kong
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
GigaScience, BGI Hong Kong
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
GigaScience, BGI Hong Kong
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
GigaScience, BGI Hong Kong
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
GigaScience, BGI Hong Kong
 


Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse

  • 1. Big Data Publishing, Handling, & Reuse Laurie Goodman, PhD Editor-in-Chief, GigaScience Laurie@gigasciencejournal.com ORCID ID: 0000-0001-9724-5976 Beyond Data Release Mandates
  • 2. What is the point of publishing? • To disseminate information/knowledge/ideas. • To present material so it can be reasonably assessed for its level of quality (and interest). • To gain credit for career advancement.
  • 3. What goes into a research article? + Area of Interest/ Question
  • 4. What goes into a research article? + Area of Interest/ Question Data & Metadata Collection Analysis/Hypothesis/Analysis Conclusions
  • 5. What goes into a research article? Analysis/Hypothesis/Analysis Conclusions + Area of Interest/ Question Data & Metadata Collection
  • 6. Scientific Communication Via Publication • Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible --- Jon B. Buckheit and David L. Donoho, WaveLab and reproducible research, 1995 • Core scientific statements or assertions are intertwined and hidden in the conventional scholarly narratives • Lack of transparency, lack of credit for anything other than “regular” dead tree publication
  • 7. Kahn, Goodman, & Mittleman. Dragging Scientific Publishing into the 21st Century 2014 http://genomebiology.com/2014/15/12/556 From Journal Delivery to PDF Delivery
  • 8. Lack of Data and Software Availability Impacts Reproducibility 1. Ioannidis et al., (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8) Out of 18 microarray papers, results from 10 could not be reproduced
  • 9. Retractions are on the Rise >15X increase in last decade 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Retracted Science and the Retraction Index ▿ http://iai.asm.org/content/79/10/3855.abstract?
  • 10. Deconstructing a paper into accessible, useable, trackable, interlinked units Need to provide credit to reward sharing and proper organization of: • Narrative • Data/Metadata availability/curation • Software availability • Interoperability • Availability of workflows • Transparent analyses Data/ MetaData Software Methods Narrative
  • 11. Deconstructing a paper into accessible, useable, trackable, interlinked units Currently we provide credit for this: • Narrative • Data/Metadata availability/curation • Software availability • Interoperability • Availability of workflows • Transparent analyses Data/ MetaData Software Methods Narrative Sometimes we publish these as Methods Papers
  • 14. But- publishing ‘Data’ is “Salami Slicing”!! What is Salami Slicing? • Publishing research in several different papers that should form a single cohesive paper Why is it ‘unethical’? • It fragments the scientific literature, wasting researchers’ time as they try to get all the information related to a very specific topic/dataset/method • It can give the appearance (given there are multiple publications) that there is large support for a particular hypothesis • It pads a researcher’s publication record unfairly
  • 15. Publishing ‘Data’ is “Salami Slicing”! Baloney 1. Those guidelines were developed prior to the year 2000: • More than 15 years ago: at a time when data set sizes and data types collected in the life sciences by a single research group were relatively small and primarily suitable for a single or narrow range of disciplines or hypotheses. • Most journals were not online (which allows easier identification and access to closely related articles) until the late ‘90s. 2. In 2005, COPE* ruled that a paper that had data that had been used and described, at least in part, in a previous publication was not unethical *Committee on Publication Ethics. http://www.publicationethics.org/case/salami-publication 3. Data collection can be (should be!!) a scholarly pursuit: • Data that is broadly reusable requires care, thought, training, time, and money to be properly collected, curated, stored, and shared.
  • 16. Contrary to popular belief… There are very few, if any, ‘push-a-button-and-get-it’ reusable data resources
  • 17. You’re not supposed to just collect samples! *Collect ALL available metadata* Help Develop a Digital Data Curation Team at your Institution’s Library (they may already have one…)
  • 18. Back to Darwin Data & Metadata Collection/Experiments Analysis/Hypothesis/Analysis Conclusions + Area of Interest/ Question 1839 1859 20 Yrs.
  • 19. Say… was this a Data Publication? Data & Metadata Collection/Experiments Analysis/Hypothesis/Analysis Conclusions + Area of Interest/ Question 1839 1859 The most curious fact is the perfect gradation in the size of the beaks in the different species of Geospiza, from one as large as that of a hawfinch to that of a chaffinch, and (if Mr. Gould is right in including his sub-group, Certhidea, in the main group) even to that of a warbler. The largest beak in the genus Geospiza is shown in Fig. 1, and the smallest in Fig. 3; but instead of there being only one intermediate species, with a beak of the size shown in Fig. 2, there are no less than six species with insensibly graduated beaks. (Chapter 17)
  • 20. DataCite and DOIs • Aims to “increase acceptance of research data as legitimate, citable contributions to the scholarly record”. • “data generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs”. Citing Data Isn’t New The Physical Sciences have been doing this for a while…
  • 21.
  • 22. What we’re doing: Mandating and Aiding Data Release Requiring all data supporting work to be Freely available in a publicly available repository – How we’re helping to do this: • Journal-dedicated data and software repository GigaDB that hosts ALL data types. • Have Biocurators to aid in handling Metadata • All Datasets are provided a Digital Object Identifier (DOI) making them citable and countable • All Material in GigaDB is available under a CC0 Waiver • Data for which a publicly approved database exists must be submitted there as well • Provide Direct links to all associated information
  • 23. Requiring all software and work to be Freely available in a publicly available repository – How we’re promoting this: • All software created by authors must be 100% OSI compliant • Journal-Dedicated repository GigaDB hosts software so it can be downloaded. • Software and Workflows are provided a DOI making them citable and countable (reward) • Journal-dedicated Galaxy Platform to run tools • Have a Data Manager and Data Scientist to wrap and deploy software tools • Have our own Github Repository What we’re doing Mandating and Aiding Software Release
  • 24. Data Sets in GigaDB Analyses/ Workflows in GigaGalaxy Paper in GigaScience (Narrative + Methods) Open-access journal Data Publishing Platform Data Computation Analysis Platform How we view publishing at GigaScience
  • 25. Making the Data Itself Citable We provide a linked journal database- this is done to link the data directly to our papers to ease reproducibility, make it available at the time of review, and provide authors a place to submit data with no sustainable ‘home’. Note: there are many community available databases- so in principle- any journal can do this by taking advantage of such available resources. These include the usual suspects: EBI, NCBI, DDBJ etc. Databases that take all data types and provide Data DOIs: Dryad, FigShare, etc. There are also numerous smaller community databases specific to different fields or data types.
  • 26. Some of the Journals Currently Doing Data Publication http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList
  • 27. Citing Data in the References Allows Tracking This rewards authors for making data available AND makes it easier to find But is this being done?
  • 28.
  • 29. Yes:
  • 30.
  • 31. Yes:
  • 33. Improving Quality as Well as Availability How Hard is Data and Software Review? Not really that much harder than narrative review.
  • 34. Fail – submitter is provided error report Pass – dataset is uploaded to GigaDB. Curator makes dataset public (can be set as future date if required) DataCite XML file Submission Submitter logs in to GigaDB website and uploads Excel submission or uses online wizard DOI assigned Files Submitter provides files by ftp or Aspera XML is generated and registered with DataCite Curator Review Curator contacts submitter with DOI citation and to arrange file transfer (and resolve any other questions/issues). DOI 10.5524/100003 Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011) Public GigaDB dataset Data must be available for review with the manuscript (and at the very least get a sanity check…)
  • 35. Reviewing Data in More Detail Issue: We can’t ask our reviewers to do that! Our finding: Reviewers don’t mind Reviewer Dr. Christophe Pouzat on neuroscience manuscript: “In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewers job more fun!” Can also use specific Data Reviewers (we have)
  • 36. Reviewing Data AND Software Code in sourceforge under GPLv3: http://soapdenovo2.sourceforge.net/ >5000 downloads http://homolog.us/wiki/index.php?title=SOAPdenovo2 Data sets Analyses Open-Paper Open-Review DOI:10.1186/2047-217X-1-18 >35,000 accesses Open-Code 8 reviewers tested data in ftp server & named reports published DOI:10.5524/100044 Open-Pipelines Open-Workflows DOI:10.5524/100038 Open-Data 78GB CC0 data Enabled code to be picked apart by bloggers in wiki
  • 37. 8 Reviewers! Holy Cow- that must have taken forever!! Submission July 24 Final review Aug 28 These were reviewing teams from different labs, assessing the materials at multiple levels
  • 38. Is this really worth the effort? Beyond Reproducibility: REUSE Data Availability and Tools
  • 40. The polar bear DATA were released –prepublication- in 2011 They were used and cited in the following studies- before the main paper on the sequencing was published Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424. Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345. Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117. Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133. Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109 http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
  • 41. Even though the data had been released over 2 years earlier and cited in other papers- the main analysis paper was published in Cell
  • 42. Cell Press Journals had indicated publishing a dataset prior to publication could be considered as prior publication
  • 43. • New Sequencing technology • MinION (Oxford Nanopore) • New Sequence Data Type • EBI and NCBI Databases not ready • High community interest for testing data • >100 GB of data Real time use during the publication process • Uploaded prior to publication • Deployed on Amazon CloudFront • Ongoing testing/comparison/information sharing prior to publication • When EBI was ready for the data, it used our cloud to upload the data • EBI transferred the data to NCBI when they were ready
  • 45. Reproduce and Reuse Needs Much More • Data: GigaDB • Software: Github • Workflows – Galaxy – Executable Docs – VMs • Images: OMERO • Cloud storage, tools, and compute power… • Need this to reach the smaller labs github.com/gigascience/gigadb-cogini More Journals have or are starting to introduce these and other tools: More is needed…
  • 46. Currently… it feels like this… Well… …because it is like this
  • 47. If we want to move forward, we need to go through that to reach this: It will require researchers, institutions, publishers, and funders working together.
  • 48. Thanks to: Scott Edmunds, Executive Editor Nicole Nogoy, Commissioning Editor Peter Li, Lead Data Manager Chris Hunter, Lead BioCurator Rob Davidson, Data Scientist Xiao (Jesse) Si Zhe, Database Developer Amye Kenall, Journal Development Manager editorial@gigasciencejournal.com database@gigasciencejournal.com @GigaScience facebook.com/GigaScience blogs.openaccesscentral.com/blogs/gigablog Contact us: Follow us: www.gigasciencejournal.com www.gigadb.org
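
The GigaDB submission workflow on slide 34 (submission check, curator review, DOI assignment, file transfer, public release) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Submission` record, the validation rules, and the citation layout are hypothetical and are not GigaDB's actual code or official citation format. The DOI prefix 10.5524 and the example dataset come from the slide, and https://doi.org/ is the standard DOI resolver.

```python
from dataclasses import dataclass
from typing import List, Optional

DOI_PREFIX = "10.5524"  # prefix of the example DOI shown on slide 34


@dataclass
class Submission:
    """Hypothetical record for one dataset submission."""
    title: str
    year: int
    doi: Optional[str] = None
    files_received: bool = False
    public: bool = False


def validate(sub: Submission) -> List[str]:
    """Return an error report; an empty list means the submission passes."""
    errors = []
    if not sub.title.strip():
        errors.append("title is missing")
    if sub.year < 1900:
        errors.append("publication year looks wrong")
    return errors


def assign_doi(sub: Submission, dataset_id: str) -> str:
    """Curator step: mint a DOI string once the submission has passed."""
    if validate(sub):
        raise ValueError("fix the error report before a DOI is assigned")
    sub.doi = f"{DOI_PREFIX}/{dataset_id}"
    return sub.doi


def make_public(sub: Submission) -> None:
    """Final step: release only when a DOI exists and files have arrived."""
    if sub.doi is None or not sub.files_received:
        raise ValueError("dataset needs a DOI and its files before release")
    sub.public = True


def citation(sub: Submission) -> str:
    """Illustrative citation string; not GigaDB's official format."""
    return f"{sub.title} ({sub.year}). GigaDB. https://doi.org/{sub.doi}"
```

Keeping each step as a separate function mirrors the slide: a failed check returns an error report to the submitter, and the dataset cannot go public until both the DOI and the files are in place.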

Editor's Notes

  1. Isn’t hyperbole fun?
  2. Raw data has been submitted to the SRA, the assembly submitted to GenBank (no number), SV data to dbVar (it’s the first plant data they’ve received). Complements the traditional public databases by having all these “extra” data types, it’s all in one place, and it’s citable.