Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways.

FAIRDOM
Jo McEntyre
Europe PMC, EMBL-EBI
www.ebi.ac.uk
Life Science Data
Familiar Complexity!
Article | ‘Package’ | External resources
Article:
• Dataset list
• Reference list
• Fig (caption + graphic)
‘Package’:
• Supp info tables/data: file, URL|DOI
• Fig source data: file, URL|DOI
External resources:
• “Recognized” data repos: file|structured record, Accession|DOI|API + Accession
• Institutional repos: file|structured record, URL|DOI|API + Accession
• Author database|‘website’: file|structured record, URL|DOI|API + Accession
Links shown: cross-reference; ref to external resource
Adapted from Thomas Lemberger, EMBO
Europe PMC literature database
Europe PMC
• Abstracts: 30 million
• Full-text articles: 3 million
• Article citation counts
• Grants
• ORCIDs
• Semantic annotation
• Data citations
• Data integration
Europe PMC is a member of the PMC International Collaboration.
Funded by 28 European funders of life science research
About EMBL-EBI
• Part of the European Molecular Biology Laboratory
• International, non-profit research institute
• Europe’s hub for biological data services and research
Making data discoverable
Labs around the world deposit data and we…
Archive it
Classify it
Share it with other data providers
Analyse, add value and integrate it
…provide tools to help researchers use it
A collaborative enterprise
Journal Data Publishing
Data Citation in Europe PMC full text
Literature*
Added-Value
Submitted
*OMIM, Clinical trials, GO
Submission statements vs reuse?
260K
Data Citation Principles Engender Two Big Ideas
"sound, reproducible scholarship rests upon a foundation of robust, accessible data"
"data should be considered legitimate, citable products of research"
These slides are adapted from:
http://www.slideshare.net/joanstarr/data-citation-a-joint-declaration-
1 Importance
2 Credit and Attribution
3 Evidence
4 Unique Identification
5 Access
6 Persistence
7 Specificity and Verifiability
8 Interoperability and flexibility
Full Principles: https://www.force11.org/datacitation
Joint Declaration on Data Citation Principles
1. Importance
Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
2. Credit and Attribution
Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
3. Evidence
In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
4. Unique identification
A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
etc. !!!
5. Access
Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
6. Persistence
Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe.
7. Specificity and Verifiability
Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.
8. Interoperability and flexibility
Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.
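Principle 7 asks that a citation carry enough provenance and fixity information to confirm that the data retrieved later is the same as what was originally cited. A minimal sketch of one way to check this, assuming the citation metadata records a SHA-256 checksum. The principles do not prescribe an algorithm; the function names and the sample bytes below are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a dataset's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_cited_version(retrieved: bytes, cited_checksum: str) -> bool:
    """True if the retrieved bytes hash to the checksum recorded at citation time."""
    return sha256_of(retrieved) == cited_checksum.lower()


# Illustrative data only: record the checksum when citing, check it on retrieval.
original = b"sample_id,value\nA,1\nB,2\n"
cited_checksum = sha256_of(original)
```

A checksum of this kind only covers a whole file; citing a time slice or a granular portion of a larger dataset would need the repository to expose a stable way to re-create that subset.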
Many organizational endorsements
An implementation example
Principle 2: Credit and Attribution
Principles 4, 5, 6: Unique ID, Access, Persistence
Principle 7: Specificity and Verifiability
Principle 8: Interoperability and flexibility
Creators, Year, Dataset Title, DOI, Data Repository, version
(Resolves to landing page with access to metadata, docs, and data)
Slide from Mercè Crosas, Ph.D., Harvard University
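The citation elements listed on this slide can be assembled mechanically. A minimal sketch, assuming a simple period-separated layout: the element order follows the slide, but the punctuation, the `DataCitation` and `format_data_citation` names, and the example values are illustrative assumptions rather than a prescribed style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    creators: List[str]
    year: int
    title: str
    doi: str
    repository: str
    version: str


def format_data_citation(c: DataCitation) -> str:
    """Render: Creators (Year). Title. DOI URL. Repository, version."""
    creators = "; ".join(c.creators)
    # https://doi.org/ is the DOI resolver; it redirects to the landing page.
    return (f"{creators} ({c.year}). {c.title}. "
            f"https://doi.org/{c.doi}. {c.repository}, {c.version}.")


# Hypothetical example values, not a real dataset.
example = DataCitation(
    creators=["Doe, J.", "Roe, R."],
    year=2015,
    title="Example survey data",
    doi="10.5555/example",
    repository="Example Repository",
    version="V1",
)
print(format_data_citation(example))
```

Keeping the elements as separate fields, rather than a single string, leaves room for different communities to render the same citation in their own style.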
http://europepmc.org/articles/PMC3089613
Large dataset:
http://europepmc.org/articles/PMC3535838
http://europepmc.org/articles/PMC3766260
http://europepmc.org/articles/PMC3704603
http://europepmc.org/articles/PMC3710810
Fig. 2
!! 2469 references !!
http://europepmc.org/articles/PMC2672098
Examples of Implementations of Data Citations
in Reference Lists
http://europepmc.org/articles/PMC3661987
Tagged in reference list as: <mixed-citation publication-type="other">
[Screenshots: occurrence in text; occurrence in reference list]
http://europepmc.org/articles/PMC3646594
Tagged in reference list as: <mixed-citation publication-type="thesis">
[Screenshots: occurrence in text; occurrence in reference list]
http://europepmc.org/articles/PMC3722494
Tagged in reference list as: <mixed-citation publication-type="webpage">
Also in this reference list: a non-DOI data citation
[Screenshots: occurrence in text; occurrence in reference list]
http://europepmc.org/articles/PMC3626513
Tagged in reference list as: <mixed-citation publication-type="journal">
[Screenshots: occurrence in text; occurrence in reference list]
Cite data generated in the course of the work described?
JATS support for data citation
<mixed-citation publication-type='data'>
  <name><surname>Heinz</surname><given-names>D.W.</given-names></name>,
  <name><surname>Baase</surname><given-names>W.A.</given-names></name>,
  <etal>et al.</etal>
  <data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>.
  <source>PDB Europe</source>,
  accession <pub-id pub-id-type='accession' assigning-authority='pdb'
    xlink:href='http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L'>102l</pub-id>.
  <pub-id pub-id-type='doi'
    xlink:href='http://dx.doi.org/10.2210/pdb102l/pdb'>10.2210/pdb102l/pdb</pub-id>
</mixed-citation>
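Tagging a reference as `publication-type='data'` makes it straightforward for software to pick out the identifiers. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the fragment is given an explicit `xmlns:xlink` declaration (in a full JATS article this would normally come from an ancestor element). The `extract_data_citation` function and the shape of its return value are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# The slide's example, with an xlink namespace declaration added so that
# the fragment parses on its own.
JATS_FRAGMENT = """\
<mixed-citation xmlns:xlink="http://www.w3.org/1999/xlink" publication-type="data">
  <name><surname>Heinz</surname><given-names>D.W.</given-names></name>,
  <name><surname>Baase</surname><given-names>W.A.</given-names></name>,
  <etal>et al.</etal>
  <data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>.
  <source>PDB Europe</source>,
  accession <pub-id pub-id-type="accession" assigning-authority="pdb"
    xlink:href="http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L">102l</pub-id>.
  <pub-id pub-id-type="doi"
    xlink:href="http://dx.doi.org/10.2210/pdb102l/pdb">10.2210/pdb102l/pdb</pub-id>
</mixed-citation>
"""


def extract_data_citation(xml_text):
    """Pull the machine-actionable parts out of one <mixed-citation> element."""
    root = ET.fromstring(xml_text)
    ids = {}
    for pub_id in root.iter("pub-id"):
        ids[pub_id.get("pub-id-type")] = {
            "value": (pub_id.text or "").strip(),
            "href": pub_id.get(f"{{{XLINK}}}href"),
        }
    return {
        "publication_type": root.get("publication-type"),
        "title": " ".join((root.findtext("data-title") or "").split()),
        "source": root.findtext("source"),
        "authors": [n.findtext("surname") for n in root.iter("name")],
        "ids": ids,
    }


citation = extract_data_citation(JATS_FRAGMENT)
```

The same loop would return nothing useful for the earlier examples tagged as "other", "thesis", "webpage", or "journal" unless the identifier happened to be marked up, which is the practical argument for a dedicated data type.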
Minimal, maximal & extensible citation
Resource name | ID
Resource name | Resolution ‘template’ | ID
Author list | Resource name | Resolution ‘template’ | ID | Time
? | Author list | Resource name | Resolution ‘template’ | ID | Time | ?
For example: new data vs pre-existing data
For example: version
Thomas Lemberger, EMBO
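The progression from minimal to maximal citation can be modelled as one record type whose only required fields are the resource name and the ID. A minimal sketch under that assumption: the `ResourceCitation` class, its field names, and the `{id}` placeholder syntax are hypothetical, and the `extensions` value is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceCitation:
    # Minimal form: a resource name plus an identifier.
    resource_name: str
    identifier: str
    # Optional layers added in the fuller forms on the slide.
    resolution_template: Optional[str] = None  # e.g. "https://doi.org/{id}"
    authors: List[str] = field(default_factory=list)
    time: Optional[str] = None
    # Open-ended slot for the "?" boxes (e.g. version, new vs pre-existing data).
    extensions: Dict[str, str] = field(default_factory=dict)

    def resolve(self) -> Optional[str]:
        """Fill the identifier into the template, if one was given."""
        if self.resolution_template is None:
            return None
        return self.resolution_template.format(id=self.identifier)


minimal = ResourceCitation("PDB", "102l")
fuller = ResourceCitation(
    resource_name="PDB",
    identifier="10.2210/pdb102l/pdb",
    resolution_template="https://doi.org/{id}",
    authors=["Heinz D.W.", "Baase W.A."],
    extensions={"data_status": "pre-existing"},  # illustrative value
)
```

Here `minimal.resolve()` returns `None` because no template was supplied, while `fuller.resolve()` yields a DOI resolver URL.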
Integrated Research
Reused from: seier+seier, Flickr
Reused from: Images Money, Flickr
Articles
Data
People
Institutions
Funders
4. Unique identification
A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
etc.
1. Discoverability through accessibility
• Deposit in a public/open database
• Where possible, structured archive (e.g. PDB, ENA) >> unstructured archive (e.g. Zenodo, Figshare)
• Uniquely identify it: PID, Accession number, DOI, ROI
• Give it context: metadata (and more)
• All of the above = citable =
2. Discoverability through structured data
Structured data is one of the true enablers of life science
- Discovery of homology between genes across species
- Predicting function based on protein folds
• Structured data can be cross-analysed, compared by algorithm, and encourages development of new products and tools
Structured data is good value for money
Annual cost of generating new protein structure data in labs around the world
Annual cost of maintaining it in a central database
Degrees of Data
Unstructured/semi-structured
Structured
Added Value
Metadata
A picture of a graph
A spreadsheet of my results
A record in a DNA sequence database
A graphical display of a genome
A narrative with citations, pictures and attachments
Article
Metadata – critical to discoverability
Generic: title, submitters, date, file format, version.
• citation
• basic search
Wagner F.F., 23-APR-2002, TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene, INSDC, 14-NOV-2006 (Rel. 89, Last updated, Version 7). BN000065
Specific: organism, tissue, assay, page number …
• deep search
• analysis
• computation
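The split between generic and specific metadata maps onto two kinds of query. A minimal sketch, assuming each record keeps the two groups in separate dictionaries: the first record reuses the INSDC example from the slide (with the organism read off its title), the second record is entirely hypothetical, and the function names are illustrative.

```python
from typing import Dict, List

Record = Dict[str, Dict[str, str]]

RECORDS: List[Record] = [
    {   # Generic fields taken from the INSDC example on the slide.
        "generic": {"accession": "BN000065", "submitter": "Wagner F.F.",
                    "title": "TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene",
                    "version": "7"},
        # Organism is read off the title; a real record would carry more fields.
        "specific": {"organism": "Homo sapiens"},
    },
    {   # Entirely hypothetical record, for contrast.
        "generic": {"accession": "EX000001", "submitter": "Doe J.",
                    "title": "Example expression data", "version": "1"},
        "specific": {"organism": "Mus musculus", "tissue": "liver"},
    },
]


def basic_search(records: List[Record], term: str) -> List[str]:
    """Free-text match against generic titles only."""
    term = term.lower()
    return [r["generic"]["accession"] for r in records
            if term in r["generic"]["title"].lower()]


def deep_search(records: List[Record], **criteria: str) -> List[str]:
    """Exact match on domain-specific fields such as organism or tissue."""
    return [r["generic"]["accession"] for r in records
            if all(r["specific"].get(k) == v for k, v in criteria.items())]
```

Generic fields are enough to build a citation and support keyword lookup; questions such as "all liver samples from mouse" can only be answered when the specific fields have been captured.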
BioStudyEBI
BioStudy database for unstructured data
Study
Publications
Ontologies
Data files
Other DBs
Metadata
Other DBs
Elixir: An international distributed infrastructure for
• Data
• Standards
• Tools
• Compute
• Training
• Industry
THE END
Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways.

  • 1. Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways Jo McEntyre Europe PMC, EMBL-EBI www.ebi.ac.uk
  • 3. Familiar Complexity! Article ‘Package’ External Resources “Recognized” data repos: file|structured record, Accession|DOI|API+Accession Institutional repos: file|structured record, URL|DOI|API+Accession Author database|‘website’: file|struct record, URL|DOI|API+Accession Supp info tables/data: file, URL|DOI Cross-reference Dataset list Ref to external res Ref to external res Reference list Fig Source data: file, URL|DOI Fig (caption + graphic) Cross-reference Ref to external resource Adapted from Thomas Lemberger, EMBO
  • 4. Europe PMC literature database Europe PMC • Abstracts: 30 million • Full-text articles: 3 million • Article citation counts • Grants • ORCIDs • Semantic annotation • Data citations • Data integration Europe PMC is a member of the PMC International Collaboration. Funded by 28 European funders of life science research
  • 5. About EMBL-EBI • Part of the European Molecular Biology Laboratory • International, non-profit research institute • Europe’s hub for biological data services and research
  • 6. Making data discoverable Labs around the world deposit data and we… Archive it Classify it Share it with other data providers Analyse, add value and integrate it …provide tools to help researchers use it A collaborative enterprise
  • 8. Data Citation in Europe PMC full text Literature* Added-Value Submitted *OMIM, Clinical trials, GO Submission statements vs reuse? 260K
  • 9. Data Citation Principles Engender Two Big Ideas "sound, reproducible scholarship rests upon a foundation of robust, accessible data" "data should be considered legitimate, citable products of research" These slides are adapted from: http://www.slideshare.net/joanstarr/data-citation-a-joint-declaration-
  • 10. 1 Importance 2 Credit and Attribution 3 Evidence 4 Unique Identification 5 Access 6 Persistence 7 Specificity and Verifiability 8 Interoperability and flexibility Full Principles: https://www.force11.org/datacitation Joint Declaration on Data Citation Principles
  • 11. Joint Declaration Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications. 1. Importance
  • 12. Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data. 2. Credit and Attribution Joint Declaration
  • 13. In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited. 3. Evidence Joint Declaration
  • 14. A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community. 4. Unique identification etc.. !!! Joint Declaration
  • 15. Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data. 5. Access Joint Declaration
  • 16. Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe. 6. Persistence Joint Declaration
  • 17. Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited. 7. Specificity and Verifiability Joint Declaration
  • 18. Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities. 8. Interoperability and flexibility Joint Declaration
  • 20. An implementation example Principle 2: Credit and Attribution Principle 4, 5, 6: Unique ID Access Persistence Principle 7: Specificity and Verifiability Principle 8: Interoperability and flexibility Creators, Year, Dataset Title, DOI, Data Repository, version (Resolves to landing page with access to metadata, docs, and data) Slide from Mercè Crosas, Ph.D. Harvard University
  • 26. !! 2469 references !! http://europepmc.org/articles/PMC2672098
  • 27. Examples of Implementations of Data Citations in Reference Lists
  • 28. http://europepmc.org/articles/PMC3661987 <mixed-citation publication-type="other"> Occurrence in reference list: Occurrence in text: Tagged in reference list as:
  • 29. http://europepmc.org/articles/PMC3646594 <mixed-citation publication-type="thesis"> Occurrence in text: Occurrence in reference list: Tagged in reference list as:
  • 30. http://europepmc.org/articles/PMC3722494 <mixed-citation publication-type="webpage"> Also in this reference list: a non-DOI data citation Occurrence in text: Occurrence in reference list: Tagged in reference list as:
  • 31. http://europepmc.org/articles/PMC3626513 <mixed-citation publication-type="journal"> Occurrence in text: Occurrence in reference list: Tagged in reference list as: Cite data generated in the course of the work described?
  • 32. JATS support for data citation <mixed-citation publication-type='data'> <name><surname>Heinz</surname><given-names>D.W.</given-names></name>, <name><surname>Baase</surname><given-names>W.A.</given-names></name>, <etal>et. al.</etal> <data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>. <source>PDB Europe</source>, accession <pub-id pub-id-type='accession' assigning-authority='pdb' xlink:href='http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L'>102l</pub-id>. <pub-id pub-id-type='doi' xlink:href='http://dx.doi.org/10.2210/pdb102l/pdb'>10.2210/pdb102l/pdb</pub-id> </mixed-citation>
  • 33. Minimal, maximal & extensible citation Resource name ID Resource name Resolution ‘template’ ID Author list Resource name Resolution ‘template’ ID Time? Author list Resource name Resolution ‘template’ ID Time? For example: new data vs pre-existing data For example: version Thomas Lemberger, EMBO
  • 34. Integrated Research Reused from: seier+seier, Flickr Reused from: Images Money, Flickr Articles Data People Institutions Funders
  • 35. A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community. 4. Unique identification etc.. Joint Declaration
  • 36. 1. Discoverability through accessibility • Deposit in a public/open database • Where possible, structured archive (e.g. PDB, ENA) >> unstructured archive (e.g. Zenodo, Figshare) • Uniquely identify it: PID, Accession number, DOI, ROI • Give it context: metadata (and more) • All of the above = citable =
  • 37. 2. Discoverability through structured data Structured data is one of the true enablers of life science: - Discovery of homology between genes across species - Predicting function based on protein folds • Structured data can be cross-analysed, compared by algorithm, and encourages development of new products and tools
  • 38. Structured data is good value for money Annual cost of generating new protein structure data in labs around the world Annual cost of maintaining it in a central database
  • 39. Degrees of Data Unstructured/semi-structured Structured Added Value Metadata A picture of a graph A spreadsheet of my results A record in a DNA sequence database A graphical display of a genome A narrative with citations, pictures and attachments Article
  • 40. Metadata – critical to discoverability Generic: title, submitters, date, file format, version. citation basic search Wagner F.F., 23-APR-2002, TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene, INSDC, 14-NOV-2006 (Rel. 89, Last updated, Version 7). BN000065 Specific: organism, tissue, assay, page number … deep search analysis computation
  • 41. BioStudyEBI BioStudy database for unstructured data Study Publications Ontologies Data files Other DBs Metadata Other DBs
  • 42. Elixir: An international distributed infrastructure for • Data • Standards • Tools • Compute • Training • Industry
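
The JATS data citation on slide 32 is machine readable, which is what lets a service pull data citations out of reference lists. The sketch below is only an illustration of that idea, not how Europe PMC does it: it parses a copy of the slide's example with Python's standard library and collects the authors, title, source and identifiers. The function name `summarize_data_citation` is hypothetical, and the `xmlns:xlink` declaration is added here (it is not on the slide) so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Copy of the slide 32 example. Assumption: the xlink namespace is declared
# on the root element so this fragment is well-formed in isolation.
JATS_EXAMPLE = """<mixed-citation xmlns:xlink="http://www.w3.org/1999/xlink" publication-type="data">
<name><surname>Heinz</surname><given-names>D.W.</given-names></name>,
<name><surname>Baase</surname><given-names>W.A.</given-names></name>,
<etal>et. al.</etal>
<data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>.
<source>PDB Europe</source>, accession
<pub-id pub-id-type="accession" assigning-authority="pdb">102l</pub-id>.
<pub-id pub-id-type="doi" xlink:href="http://dx.doi.org/10.2210/pdb102l/pdb">10.2210/pdb102l/pdb</pub-id>
</mixed-citation>"""


def summarize_data_citation(xml_text):
    """Hypothetical helper: collect the main fields of one <mixed-citation>."""
    root = ET.fromstring(xml_text)
    # Each <name> holds a <surname> and <given-names> pair.
    authors = [
        f"{name.findtext('surname')} {name.findtext('given-names')}"
        for name in root.findall("name")
    ]
    # Map identifier type (accession, doi, ...) to its value.
    ids = {
        pub_id.get("pub-id-type"): (pub_id.text or "").strip()
        for pub_id in root.findall("pub-id")
    }
    return {
        "type": root.get("publication-type"),
        "authors": authors,
        "title": root.findtext("data-title"),
        "source": root.findtext("source"),
        "ids": ids,
    }


summary = summarize_data_citation(JATS_EXAMPLE)
print(summary["type"], summary["ids"])
```

Because the citation is tagged with `publication-type="data"` and typed `pub-id` elements, a consumer can tell a data citation from a journal reference without guessing from the text.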

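Principle 7 (slide 17) asks that citation metadata carry enough fixity information to verify that the data retrieved later is the same as what was cited. One common way to do this, offered here as a minimal sketch rather than anything the Joint Declaration prescribes, is to record a cryptographic digest at citation time and compare it on retrieval. The function names `fingerprint` and `matches_cited` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest to store alongside the citation."""
    return hashlib.sha256(data).hexdigest()


def matches_cited(data: bytes, cited_digest: str) -> bool:
    """Check retrieved bytes against the digest recorded when the data was cited."""
    return fingerprint(data) == cited_digest.strip().lower()


# Illustrative data only: a tiny CSV as it was cited, and an altered later copy.
cited_bytes = b"id,value\n1,0.5\n"
recorded = fingerprint(cited_bytes)

print(matches_cited(cited_bytes, recorded))            # same bytes as cited
print(matches_cited(b"id,value\n1,0.6\n", recorded))   # a changed version
```

A digest only detects that the bytes changed; identifying which version or time slice was cited still relies on the version and provenance fields in the citation itself.
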
Editor's Notes

  1. Image: https://www.flickr.com/photos/svenwerk/506579282 #1: Importance
  2. Image: http://www.flickr.com/photos/ggunson/16900719 #2: Credit and Attribution
  3. Image: http://www.flickr.com/photos/8395214@N06/2441779856 #3: Evidence
  4. Image: http://www.doi.org/ #4: Unique Identification
  5. Image: http://www.flickr.com/photos/mag3737/8755090129 #5: Access
  6. Image: http://www.flickr.com/photos/azwegers/6691014193 #6: Persistence
  7. Image: by Joan Starr #7: Specificity and Verifiability
  8. Image: http://www.flickr.com/photos/29261875@N05/6410305335 #8: Interoperability and flexibility
  9. Image: http://www.doi.org/ #4: Unique Identification