SlideShare a Scribd company logo
1 of 48
Big Data Publishing,
Handling, & Reuse
Laurie Goodman, PhD
Editor-in-Chief, GigaScience
Laurie@gigasciencejournal.com
ORCID ID: 0000-0001-9724-5976
Beyond Data Release Mandates
What is the point of publishing?
• To disseminate
information/knowledge/ideas.
• To present material so it can be reasonably
assessed for its level of quality (and interest).
• To gain credit for career advancement.
What goes into a research article?
+ Area of Interest/
Question
What goes into a research article?
+ Area of Interest/
Question
Data & Metadata Collection
Analysis/Hypothesis/Analysis
Conclusions
What goes into a research article?
Analysis/Hypothesis/Analysis
Conclusions
+ Area of Interest/
Question
Data & Metadata Collection
Scientific Communication
Via Publication
• Scholarly articles are merely advertisement of scholarship .
The actual scholarly artefacts, i.e. the data and
computational methods, which support the
scholarship, remain largely inaccessible --- Jon B.
Buckheit and David L. Donoho, WaveLab and reproducible
research, 1995
• Core scientific statements or assertions are intertwined and
hidden in the conventional scholarly narratives
• Lack of transparency, lack of credit for anything other than
“regular” dead tree publication
Kahn, Goodman, & Mittleman. Dragging Scientific Publishing into the 21st Century 2014
http://genomebiology.com/2014/15/12/556
From Journal Delivery to PDF Delivery
Lack of Data and Software Availability
Impacts Reproducibility
1. Ioannidis et al., (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8)
Out of 18 microarray papers, results
from 10 could not be reproduced
Retractions are on the Rise
>15X increase in last decade
1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
2. Retracted Science and the Retraction Index ▿ http://iai.asm.org/content/79/10/3855.abstract?
Deconstructing a paper into accessible,
useable, trackable, interlinked units
Need to provide credit to
reward sharing and proper
organization of:
• Narrative
• Data/Metadata
availability/curation
• Software availability
• Interoperability
• Availability of workflows
• Transparent analyses
Data/
MetaData
Software
Methods
Narrative
Deconstructing a paper into accessible,
useable, trackable, interlinked units
Currently we provide credit
for this:
• Narrative
• Data/Metadata
availability/curation
• Software availability
• Interoperability
• Availability of workflows
• Transparent analyses
Data/
MetaData
Software
Methods
Narrative
Sometimes we publish these
as Methods Papers
Beyond the Narrative
Data And Tools
Promoting Data Release
Data Citation
But- publishing ‘Data’ is “Salami Slicing”!!
What is Salami Slicing?
• Publishing research in several different papers that
should form a single cohesive paper
Why is it ‘unethical’
• It fragments the scientific literature, wasting
researcher’s time as they try to get all the information
related to a very specific topic/dataset/method
• It can give the appearance (given there are multiple
publications) that there is large support for a particular
hypothesis
• It pads a researcher’s publication record unfairly
Publishing ‘Data’ is “Salami Slicing”!
Baloney
1. Those guidelines were developed prior to the year 2000:
• More than 15 years ago: at a time when data set sizes and data
types collected in the life sciences by a single research group
were relatively small and primarily suitable for a single or narrow
range of disciplines or hypotheses.
• Most journals were not online (which allows easier identification
and access to closely related articles ) until the late ‘90s.
2. In 2005, COPE* ruled that a paper that had data that had been used
and described, at least in part, in a previous publication was not
unethical *Council of Publication Ethics. http://www.publicationethics.org/case/salami-publication
3. Data collection can be (should be!!) a scholarly pursuit:
• Data that is broadly reusable requires care, thought, training,
time, and money to be properly collected, curated, stored, and
shared.
Contrary to popular belief…
There are very few
—if any—
‘push-a-button-and-get-it’
reuseable data resources
Your not supposed to just collect samples!
*Collect ALL available metadata*
Help Develop a Digital Data Curation Team at your
Institution’s Library (they may already have one…)
Back to Darwin
Data & Metadata Collection/Experiments
Analysis/Hypothesis/Analysis
Conclusions
+ Area of Interest/
Question
1839
1859
20 Yrs.
Say… was this a Data Publication?
Data & Metadata Collection/Experiments
Analysis/Hypothesis/Analysis
Conclusions
+ Area of Interest/
Question
1839
1859
The most curious fact is the
perfect gradation in the
size of the beaks in the
different species of
Geospiza, from one as large
as that of a hawfinch to
that of a chaffinch, and (if
Mr. Gould is right in including his sub-group, Certhidea, in
the main group) even to that of a warbler. The largest beak
in the genus Geospiza is shown in Fig. 1, and the smallest in
Fig. 3; but instead of there being only one intermediate
species, with a beak of the size shown in Fig. 2, there are no
less than six species with insensibly graduated beaks.
(Chapter 17)
DataCite and DOIs
• Aims to “increase acceptance of research data as
legitimate, citable contributions to the scholarly
record”.
• “data generated in the course of research are
just as valuable to the ongoing academic
discourse as papers and monographs”.
Citing Data Isn’t New
The Physical Sciences have been doing this for a while…
What we’re doing:
Mandating and Aiding for Data Release
Requiring all data supporting work to be Freely available in a
publically available repository
– How we’re helping to do this:
• Journal-dedicated data and software repository GigaDB
that hosts ALL data types.
• Have Biocurators to aid in handling Metadata
• All Datasets are provided a Digital Object Identifier
(DOI) making them citable and countable
• All Material in GigaDB is available under a CC0 Waiver
• Data with a publically approved database must be
submitted there as well
• Provide Direct links to all associated information
Requiring all software and work to be Freely available in a
publically available repository
– How we’re promoting this:
• All software created by authors must be 100% OSI
compliant
• Journal-Dedicated repository GigaDB hosts software so
it can be downloaded.
• Software and Workflows are provided a DOI making
them citable and countable (reward)
• Journal-dedicated Galaxy Platform to run tools
• Have a Data Manager and Data Scientist to wrap and
deploy software tools
• Have our own Github Repository
What we’re doing
Mandating and Aiding Software Release
Data Sets in
GigaDB
Analyses/
Workflows in
GigaGalaxy
Paper in
GigaScience
(Narrative + Methods)
Open-access journal Data Publishing Platform
Data Computation Analysis Platform
How we view publishing at GigaScience
Making the Data Itself Citable
We provide a linked journal database- this is done to link the data
directly to our papers to ease reproducibility, make it available at the
time of review, and provide authors a place to submit data with no
sustainable ‘home’.
Note: there are many community available databases- so in principle-
any journal can do this by taking advantage of such available
resources.
These include the usual suspects: EBI, NCBI, DDBJ etc.
Databases that take all data types and provide Data DOIs: Dryad,
FigShare, etc.
There are also numerous smaller community databases specific to
different fields or data types.
Some of the Journals Currently
Doing Data Publication
http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList
Citing Data in the
References Allows Tracking
This rewards authors for making data
available AND makes it easier to find
But is this being done?
Yes:
Yes:
Is Cited
Data
Being
Tracked?
Yes:
Improving Quality as
Well as Availability
How Hard is Data and Software Review?
Not really that much harder than narrative
review.
Fail – submitter is
provided error report
Pass – dataset is
uploaded to
GigaDB.
Curator makes dataset public
(can be set as future date if
required)
DataCite
XML file
Submission
Submitter logs in to
GigaDB website and
uploads Excel submission
or uses online wizard
DOI
assigned
Files
Submitter provides
files by ftp or
Aspera
XML is generated and
registered with DataCite
Curator Review
Curator contacts submitter with
DOI citation and to arrange file
transfer (and resolve any other
questions/issues).
DOI 10.5524/100003
Genomic data from the
crab-eating
macaque/cynomolgus
monkey (Macaca
fascicularis) (2011)
Public GigaDB dataset
Data must be available for review with the manuscript
(and at the very least get a sanity check…)
Reviewing Data in More Detail
Issue: We can’t ask our reviewers to do that!
Our finding: Reviewers don’t mind
Reviewer Dr. Christophe Pouzat on neuroscience
manuscript:
“In addition to making the presented research
trustworthy, the reproducible research
paradigm definitely makes the reviewers job
more fun!”
Can also use specific Data Reviewers (we have)
Reviewing DataAND Software
Code in sourceforge under GPLv3:
http://soapdenovo2.sourceforge.net/>5000 downloads
http://homolog.us/wiki/index.php?title=SOAPdenovo2
Data sets
Analyses
Open-Paper
Open-Review
DOI:10.1186/2047-217X-1-18
>35,000 accesses
Open-Code
8 reviewers tested data in ftp server & named reports published
DOI:10.5524/100044
Open-Pipelines
Open-Workflows
DOI:10.5524/100038
Open-Data
78GB CC0 data
Enabled code to being picked apart by bloggers in wiki
8 Reviewers! Holy Cow- that must have
taken forever!!
Submission
July 24
Final review
Aug 28
These were
reviewing
teams from
different labs,
assessing the
materials at
multiple levels
Is this really worth the effort?
Beyond Reproducibility:
REUSE
Data Availability and Tools
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-
data-the-polar-bear/
These data
were released
THREE YEARS
before
publication of
the analysis
article
The polar bear DATA were released –prepublication- in 2011
They were used and cited in the following studies- before the main paper on the
sequencing was published
Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct
bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.
Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting
theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345.
doi:10.1371/journal.pgen.1003345.
Morgan, CC et al., Heterogeneous models place the root of the placental mammal
phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.
Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus
maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from
Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133.
Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene
Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
Even though the data had
been released over 2 years
earlier and cited in other
papers- the main analysis
paper was published in Cell
Cell Press Journals had indicated
publishing a dataset prior to publication
could be considered as prior publication
• New Sequencing technology
• minION Oxford-Nanopore
• New Sequence Data Type
• EBI and NCBI Databases not ready
• High community interest for testing
data
• >100 GB of data
Real time use during the
publication process
• Uploaded prior to publication
• Deployed on Amazon Cloud Front
• Ongoing
testing/comparison/information
sharing prior to publication
• When ready for data EBI used our
cloud to upload data
• EBI transferred the data to NCBI when
they were ready
Getting past…
…look but don't touch
Reproduce and Reuse Needs Much More
• Data: GigaDB
• Software: Github
• Workflows
– Galaxy
– Executable Docs
– VMs
• Images: OMERO
• Cloud storage, tools, and
compute power…
• Need this to reach the smaller
labs
github.com/gigascience/gigadb-cogini
More Journals have or are starting to introduce
these and other tools: More is needed…
Currently… it feels like this…
Well…
…because it is like this
If we want to
move
forward, we
need to go
through that
to reach this:
It will require
researchers,
institutions,
publishers,
and funders
working
together.
Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager
editorial@gigasciencejournal.com
database@gigasciencejournal.com
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog
Contact us:
Follow us:
www.gigasciencejournal.com
www.gigadb.org

More Related Content

What's hot

2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning WorkshopLizzy_Rolando
 
DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceMerce Crosas
 
20160414 23 Research Data Things
20160414 23 Research Data Things20160414 23 Research Data Things
20160414 23 Research Data ThingsKatina Toufexis
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environmentphilipdurbin
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceCarole Goble
 
Preparing your data for sharing and publishing
Preparing your data for sharing and publishingPreparing your data for sharing and publishing
Preparing your data for sharing and publishingVarsha Khodiyar
 
DataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE
 
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinaiDataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinaidatascienceiqss
 
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSBROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSMicah Altman
 
dkNET Poster ENDO 2016
dkNET Poster ENDO 2016 dkNET Poster ENDO 2016
dkNET Poster ENDO 2016 dkNET
 
David Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordJisc
 
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsCase Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsBeth Plale
 
Publishing perspectives on data management & future directions
Publishing perspectives on data management & future directionsPublishing perspectives on data management & future directions
Publishing perspectives on data management & future directionsARDC
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...Projeto RCAAP
 
DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?DataONE
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsBeth Plale
 

What's hot (20)

2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop
 
DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data Citation
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with Confidence
 
20160414 23 Research Data Things
20160414 23 Research Data Things20160414 23 Research Data Things
20160414 23 Research Data Things
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
Preparing your data for sharing and publishing
Preparing your data for sharing and publishingPreparing your data for sharing and publishing
Preparing your data for sharing and publishing
 
DataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data Sharing
 
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinaiDataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
 
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALSBROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
BROWN BAG TALK WITH MICAH ALTMAN INTEGRATING OPEN DATA INTO OPEN ACCESS JOURNALS
 
295 king
295 king295 king
295 king
 
dkNET Poster ENDO 2016
dkNET Poster ENDO 2016 dkNET Poster ENDO 2016
dkNET Poster ENDO 2016
 
David Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published recordDavid Shotton - Research Integrity: Integrity of the published record
David Shotton - Research Integrity: Integrity of the published record
 
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital TextsCase Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
Case Study Big Data: Socio-Technical Issues of HathiTrust Digital Texts
 
McGeary Data Curation Network: Developing and Scaling
McGeary Data Curation Network: Developing and ScalingMcGeary Data Curation Network: Developing and Scaling
McGeary Data Curation Network: Developing and Scaling
 
Publishing perspectives on data management & future directions
Publishing perspectives on data management & future directionsPublishing perspectives on data management & future directions
Publishing perspectives on data management & future directions
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...
 
DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?DataONE Education Module 01: Why Data Management?
DataONE Education Module 01: Why Data Management?
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure Commons
 

Similar to Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse

HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 Scott Edmunds
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Anita de Waard
 
Human Genome and Big Data Challenges
Human Genome and Big Data ChallengesHuman Genome and Big Data Challenges
Human Genome and Big Data ChallengesPhilip Bourne
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...The University of Edinburgh
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseAnita de Waard
 
How and Why to Share Your Data
How and Why to Share Your DataHow and Why to Share Your Data
How and Why to Share Your Datakfear
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorialJosh Young
 
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...
Laurie Goodman at #SSPBoston: Article+Data+ToolsReproducibility, Reuse, & Ra...Laurie Goodman at #SSPBoston: Article+Data+ToolsReproducibility, Reuse, & Ra...
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...GigaScience, BGI Hong Kong
 
Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6ARDC
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...GarethKnight
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Datacunera
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumMerce Crosas
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and LibariesRob Grim
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data ManagementAnita de Waard
 

Similar to Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse (20)

Research-Data-Management-and-your-PhD
Research-Data-Management-and-your-PhDResearch-Data-Management-and-your-PhD
Research-Data-Management-and-your-PhD
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
 
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"Some Ideas on Making Research Data: "It's the Metadata, stupid!"
Some Ideas on Making Research Data: "It's the Metadata, stupid!"
 
Human Genome and Big Data Challenges
Human Genome and Big Data ChallengesHuman Genome and Big Data Challenges
Human Genome and Big Data Challenges
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Where's the Data?
Where's the Data?Where's the Data?
Where's the Data?
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
How and Why to Share Your Data
How and Why to Share Your DataHow and Why to Share Your Data
How and Why to Share Your Data
 
Intro to RDM
Intro to RDMIntro to RDM
Intro to RDM
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial
 
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...
Laurie Goodman at #SSPBoston: Article+Data+ToolsReproducibility, Reuse, & Ra...Laurie Goodman at #SSPBoston: Article+Data+ToolsReproducibility, Reuse, & Ra...
Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Ra...
 
Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
 
Critical infrastructure to promote data synthesis
Critical infrastructure to promote data synthesis Critical infrastructure to promote data synthesis
Critical infrastructure to promote data synthesis
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and Libaries
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data Management
 

More from GigaScience, BGI Hong Kong

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...GigaScience, BGI Hong Kong
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteGigaScience, BGI Hong Kong
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...GigaScience, BGI Hong Kong
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...GigaScience, BGI Hong Kong
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...GigaScience, BGI Hong Kong
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...GigaScience, BGI Hong Kong
 
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...GigaScience, BGI Hong Kong
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...GigaScience, BGI Hong Kong
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixGigaScience, BGI Hong Kong
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserGigaScience, BGI Hong Kong
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...GigaScience, BGI Hong Kong
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceGigaScience, BGI Hong Kong
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...GigaScience, BGI Hong Kong
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...GigaScience, BGI Hong Kong
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveGigaScience, BGI Hong Kong
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...GigaScience, BGI Hong Kong
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...GigaScience, BGI Hong Kong
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...GigaScience, BGI Hong Kong
 

More from GigaScience, BGI Hong Kong (20)

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByte
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
 
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...
 
Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
 

Recently uploaded

Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body Areesha Ahmad
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxCherry
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLkantirani197
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Cherry
 
Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Cherry
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxANSARKHAN96
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.Cherry
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Cherry
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsbassianu17
 
Genome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptxGenome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptxCherry
 

Recently uploaded (20)

Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Plasmid: types, structure and functions.
Plasmid: types, structure and functions.
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
Genome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptxGenome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptx
 

Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse

  • 1. Big Data Publishing, Handling, & Reuse Laurie Goodman, PhD Editor-in-Chief, GigaScience Laurie@gigasciencejournal.com ORCID ID: 0000-0001-9724-5976 Beyond Data Release Mandates
  • 2. What is the point of publishing? • To disseminate information/knowledge/ideas. • To present material so it can be reasonably assessed for its level of quality (and interest). • To gain credit for career advancement.
  • 3. What goes into a research article? + Area of Interest/ Question
  • 4. What goes into a research article? + Area of Interest/ Question Data & Metadata Collection Analysis/Hypothesis/Analysis Conclusions
  • 5. What goes into a research article? Analysis/Hypothesis/Analysis Conclusions + Area of Interest/ Question Data & Metadata Collection
  • 6. Scientific Communication Via Publication • Scholarly articles are merely advertisement of scholarship . The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible --- Jon B. Buckheit and David L. Donoho, WaveLab and reproducible research, 1995 • Core scientific statements or assertions are intertwined and hidden in the conventional scholarly narratives • Lack of transparency, lack of credit for anything other than “regular” dead tree publication
  • 7. Kahn, Goodman, & Mittleman. Dragging Scientific Publishing into the 21st Century 2014 http://genomebiology.com/2014/15/12/556 From Journal Delivery to PDF Delivery
  • 8. Lack of Data and Software Availability Impacts Reproducibility 1. Ioannidis et al., (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8) Out of 18 microarray papers, results from 10 could not be reproduced
  • 9. Retractions are on the Rise >15X increase in last decade 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Retracted Science and the Retraction Index ▿ http://iai.asm.org/content/79/10/3855.abstract?
  • 10. Deconstructing a paper into accessible, useable, trackable, interlinked units Need to provide credit to reward sharing and proper organization of: • Narrative • Data/Metadata availability/curation • Software availability • Interoperability • Availability of workflows • Transparent analyses Data/ MetaData Software Methods Narrative
  • 11. Deconstructing a paper into accessible, useable, trackable, interlinked units Currently we provide credit for this: • Narrative • Data/Metadata availability/curation • Software availability • Interoperability • Availability of workflows • Transparent analyses Data/ MetaData Software Methods Narrative Sometimes we publish these as Methods Papers
  • 14. But- publishing ‘Data’ is “Salami Slicing”!! What is Salami Slicing? • Publishing research in several different papers that should form a single cohesive paper Why is it ‘unethical’ • It fragments the scientific literature, wasting researcher’s time as they try to get all the information related to a very specific topic/dataset/method • It can give the appearance (given there are multiple publications) that there is large support for a particular hypothesis • It pads a researcher’s publication record unfairly
  • 15. Publishing ‘Data’ is “Salami Slicing”! Baloney 1. Those guidelines were developed prior to the year 2000: • More than 15 years ago: at a time when data set sizes and data types collected in the life sciences by a single research group were relatively small and primarily suitable for a single or narrow range of disciplines or hypotheses. • Most journals were not online (which allows easier identification and access to closely related articles ) until the late ‘90s. 2. In 2005, COPE* ruled that a paper that had data that had been used and described, at least in part, in a previous publication was not unethical *Council of Publication Ethics. http://www.publicationethics.org/case/salami-publication 3. Data collection can be (should be!!) a scholarly pursuit: • Data that is broadly reusable requires care, thought, training, time, and money to be properly collected, curated, stored, and shared.
  • 16. Contrary to popular belief… There are very few —if any— ‘push-a-button-and-get-it’ reuseable data resources
  • 17. Your not supposed to just collect samples! *Collect ALL available metadata* Help Develop a Digital Data Curation Team at your Institution’s Library (they may already have one…)
  • 18. Back to Darwin Data & Metadata Collection/Experiments Analysis/Hypothesis/Analysis Conclusions + Area of Interest/ Question 1839 1859 20 Yrs.
  • 19. Say… was this a Data Publication? Data & Metadata Collection/Experiments Analysis/Hypothesis/Analysis Conclusions + Area of Interest/ Question 1839 1859 The most curious fact is the perfect gradation in the size of the beaks in the different species of Geospiza, from one as large as that of a hawfinch to that of a chaffinch, and (if Mr. Gould is right in including his sub-group, Certhidea, in the main group) even to that of a warbler. The largest beak in the genus Geospiza is shown in Fig. 1, and the smallest in Fig. 3; but instead of there being only one intermediate species, with a beak of the size shown in Fig. 2, there are no less than six species with insensibly graduated beaks. (Chapter 17)
  • 20. DataCite and DOIs • Aims to “increase acceptance of research data as legitimate, citable contributions to the scholarly record”. • “data generated in the course of research are just as valuable to the ongoing academic discourse as papers and monographs”. Citing Data Isn’t New The Physical Sciences have been doing this for a while…
  • 21.
  • 22. What we’re doing: Mandating and Aiding for Data Release Requiring all data supporting work to be Freely available in a publically available repository – How we’re helping to do this: • Journal-dedicated data and software repository GigaDB that hosts ALL data types. • Have Biocurators to aid in handling Metadata • All Datasets are provided a Digital Object Identifier (DOI) making them citable and countable • All Material in GigaDB is available under a CC0 Waiver • Data with a publically approved database must be submitted there as well • Provide Direct links to all associated information
  • 23. Requiring all software and work to be Freely available in a publically available repository – How we’re promoting this: • All software created by authors must be 100% OSI compliant • Journal-Dedicated repository GigaDB hosts software so it can be downloaded. • Software and Workflows are provided a DOI making them citable and countable (reward) • Journal-dedicated Galaxy Platform to run tools • Have a Data Manager and Data Scientist to wrap and deploy software tools • Have our own Github Repository What we’re doing Mandating and Aiding Software Release
  • 24. Data Sets in GigaDB Analyses/ Workflows in GigaGalaxy Paper in GigaScience (Narrative + Methods) Open-access journal Data Publishing Platform Data Computation Analysis Platform How we view publishing at GigaScience
  • 25. Making the Data Itself Citable We provide a linked journal database- this is done to link the data directly to our papers to ease reproducibility, make it available at the time of review, and provide authors a place to submit data with no sustainable ‘home’. Note: there are many community available databases- so in principle- any journal can do this by taking advantage of such available resources. These include the usual suspects: EBI, NCBI, DDBJ etc. Databases that take all data types and provide Data DOIs: Dryad, FigShare, etc. There are also numerous smaller community databases specific to different fields or data types.
  • 26. Some of the Journals Currently Doing Data Publication http://proj.badc.rl.ac.uk/preparde/blog/DataJournalsList
  • 27. Citing Data in the References Allows Tracking This rewards authors for making data available AND makes it easier to find But is this being done?
  • 28.
  • 29. Yes:
  • 30.
  • 31. Yes:
  • 33. Improving Quality as Well as Availability How Hard is Data and Software Review? Not really that much harder than narrative review.
  • 34. Fail – submitter is provided error report Pass – dataset is uploaded to GigaDB. Curator makes dataset public (can be set as future date if required) DataCite XML file Submission Submitter logs in to GigaDB website and uploads Excel submission or uses online wizard DOI assigned Files Submitter provides files by ftp or Aspera XML is generated and registered with DataCite Curator Review Curator contacts submitter with DOI citation and to arrange file transfer (and resolve any other questions/issues). DOI 10.5524/100003 Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011) Public GigaDB dataset Data must be available for review with the manuscript (and at the very least get a sanity check…)
  • 35. Reviewing Data in More Detail Issue: We can’t ask our reviewers to do that! Our finding: Reviewers don’t mind Reviewer Dr. Christophe Pouzat on neuroscience manuscript: “In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewers job more fun!” Can also use specific Data Reviewers (we have)
  • 36. Reviewing DataAND Software Code in sourceforge under GPLv3: http://soapdenovo2.sourceforge.net/>5000 downloads http://homolog.us/wiki/index.php?title=SOAPdenovo2 Data sets Analyses Open-Paper Open-Review DOI:10.1186/2047-217X-1-18 >35,000 accesses Open-Code 8 reviewers tested data in ftp server & named reports published DOI:10.5524/100044 Open-Pipelines Open-Workflows DOI:10.5524/100038 Open-Data 78GB CC0 data Enabled code to being picked apart by bloggers in wiki
  • 37. 8 Reviewers! Holy Cow- that must have taken forever!! Submission July 24 Final review Aug 28 These were reviewing teams from different labs, assessing the materials at multiple levels
  • 38. Is this really worth the effort? Beyond Reproducibility: REUSE Data Availability and Tools
  • 40. The polar bear DATA were released –prepublication- in 2011 They were used and cited in the following studies- before the main paper on the sequencing was published Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424. Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345. Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117. Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133. Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109 http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
  • 41. Even though the data had been released over 2 years earlier and cited in other papers- the main analysis paper was published in Cell
  • 42. Cell Press Journals had indicated publishing a dataset prior to publication could be considered as prior publication
  • 43. • New Sequencing technology • minION Oxford-Nanopore • New Sequence Data Type • EBI and NCBI Databases not ready • High community interest for testing data • >100 GB of data Real time use during the publication process • Uploaded prior to publication • Deployed on Amazon Cloud Front • Ongoing testing/comparison/information sharing prior to publication • When ready for data EBI used our cloud to upload data • EBI transferred the data to NCBI when they were ready
  • 45. Reproduce and Reuse Needs Much More • Data: GigaDB • Software: Github • Workflows – Galaxy – Executable Docs – VMs • Images: OMERO • Cloud storage, tools, and compute power… • Need this to reach the smaller labs github.com/gigascience/gigadb-cogini More Journals have or are starting to introduce these and other tools: More is needed…
  • 46. Currently… it feels like this… Well… …because it is like this
  • 47. If we want to move forward, we need to go through that to reach this: It will require researchers, institutions, publishers, and funders working together.
  • 48. Thanks to: Scott Edmunds, Executive Editor Nicole Nogoy, Commissioning Editor Peter Li, Lead Data Manager Chris Hunter, Lead BioCurator Rob Davidson, Data Scientist Xiao (Jesse) Si Zhe, Database Developer Amye Kenall, Journal Development Manager editorial@gigasciencejournal.com database@gigasciencejournal.com @GigaScience facebook.com/GigaScience blogs.openaccesscentral.com/blogs/gigablog Contact us: Follow us: www.gigasciencejournal.com www.gigadb.org

Editor's Notes

  1. Isn’t hyperbole fun?
  2. Raw data has been submitted to the SRA, the assembly submitted to GenBank (no number), SV data to dbVar (it’s the first plant data they’ve received). Complements the traditional public databases by having all these “extra” data types, it’s all in one place, and it’s citable.
  3. Raw data has been submitted to the SRA, the assembly submitted to GenBank (no number), SV data to dbVar (it’s the first plant data they’ve received). Complements the traditional public databases by having all these “extra” data types, it’s all in one place, and it’s citable.
  4. 44
  5. 45