Joint Declaration of Data Citation Principles
© 2015 Massachusetts General Hospital
and FORCE11.org
Tim Clark, Ph.D.
Assistant Professor of Neurology
Massachusetts General Hospital & Harvard Medical School
June 9, 2015
Reproducibility crisis
Non-reproducibility
11%
Begley CG and Ellis LM, Nature 2012, 483(7391):531-533
Transparency and Reproducibility
• Transparency is the basis of reproducibility
• What we are aiming for is robust science
• Validation from multiple orthogonal viewpoints
• Focus on transparent communication of results
Joint Declaration of Data Citation Principles
endorsed by over 90 scholarly organizations
The Brief JDDCP
1. Importance. Data are first-class objects.
2. Credit. Support citing all contributors to the data.
3. Evidence. Assertions must be traceable to evidence.
4. Unique ID. Cited datasets must have resolvable IDs.
5. Access. Data must be robustly archived.
6. Persistence. Metadata must persist even after data is gone.
7. Specificity & Verifiability. Get same dynamic time-slice.
8. Interoperable & flexible. Give cross-community support.
How to implement JDDCP?
[Diagram labels: JDDCP; archival, identification and retrieval; document model; common APIs; workflows; metadata.]
[Diagram labels: repositories, social science, biomedicine, earth science, climatology, scholarly publishing, web standards, scientific data standards, astronomy, physics, academic libraries, data science, software technology; archival and retrieval.]
Human and machine accessibility of
cited data in scholarly publications
or, how to store and access cited
data to radically improve scholarly
transparency - and so that BOTH
humans and machines are happy.
PeerJ Computer Science 1:e1. https://dx.doi.org/10.7717/peerj-cs.1
Basic guidelines
1. Cite data as you would cite publications.
2. Deposit data in an archival-quality repository.
3. Use an identifier scheme meeting JDDCP criteria.
4. Identifiers should resolve to a landing page, not directly to the data.
5. Landing pages describe the data in both human and machine readable form.
Basic guidelines (contd.)
6. Landing page & data retention may differ.
7. Repositories should provide specific guarantee of landing page persistence.
8. Landing pages should provide both human and machine interpretable information.
9. Provide web service accessibility.
10. Stakeholder responsibilities for ecosystem.
1. Cite data as you would cite publications
• Strongly preferred:
  • Use the NISO JATS revision 1.1d2 XML schema.
• Interim (less good) alternative:
  • Use your own XML schema, but do what JATS does.
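As a rough illustration of citing data "as you would cite publications", the sketch below builds a JATS-style reference entry for a dataset with Python's standard library. The element and attribute names (`element-citation` with `publication-type="data"`, `data-title`, `source`, `version`, `pub-id`) follow the data citation tagging commonly described for JATS 1.1; treat them as assumptions to be checked against the schema version in use. The author, title, repository and DOI values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_data_citation(ref_id, surname, given_names, title,
                        repository, year, doi, version=None):
    """Return a <ref> element holding a JATS-style data citation (sketch)."""
    ref = ET.Element("ref", {"id": ref_id})
    citation = ET.SubElement(ref, "element-citation",
                             {"publication-type": "data"})
    # Credit the data creator in the same way as an article author.
    group = ET.SubElement(citation, "person-group",
                          {"person-group-type": "author"})
    name = ET.SubElement(group, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given_names
    ET.SubElement(citation, "data-title").text = title
    ET.SubElement(citation, "source").text = repository
    ET.SubElement(citation, "year").text = str(year)
    if version is not None:
        ET.SubElement(citation, "version").text = version
    # The persistent identifier of the dataset (hypothetical DOI below).
    ET.SubElement(citation, "pub-id", {"pub-id-type": "doi"}).text = doi
    return ref


ref = build_data_citation(
    ref_id="data1",
    surname="Doe",
    given_names="Jane",
    title="Example measurements",        # hypothetical dataset title
    repository="Example Repository",     # hypothetical repository name
    year=2015,
    doi="10.1234/example.dataset",       # hypothetical DOI
    version="1.0",
)
xml_text = ET.tostring(ref, encoding="unicode")
print(xml_text)
```

The same structure can be written by hand in the article XML; the point is only that a dataset gets a reference-list entry with creator, title, source, date, version and identifier, like any cited paper.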
2. Deposit data in archival-quality repositories
Examples:
• NIH and EBI bioscience repositories;
• Standard earth/space/physical science repositories;
• Dataverse, Dryad, Figshare, Zenodo; etc.
Unacceptable:
• “Available on my laboratory website”.
3. Use an ID scheme that meets JDDCP criteria (4-6)
Any currently available identifier scheme that is:
• machine actionable,
• globally unique,
• widely used by a community, and
• backed by a long-term commitment to persistence.
Best practice:
• Use a scheme that is cross-discipline, such as DOI.
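To make "machine actionable" concrete, here is a minimal sketch that turns a DOI written in several common forms into a single resolvable URL. The regular expression is a widely used heuristic, not the formal definition of DOI syntax, and the function name `doi_to_url` is an illustrative choice.

```python
import re
from urllib.parse import quote

# Heuristic shape check only: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalize a DOI string to an https://doi.org/ URL (sketch)."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" separator.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("https://dx.doi.org/10.7717/peerj-cs.1"))
print(doi_to_url("doi:10.7717/peerj-cs.1"))
```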
Machine accessibility
Machine accessibility in this context means:
“access by well-documented Web services—preferably
RESTful Web services—to data and metadata stored in
a robust repository, independently of integrated browser
access by humans.”
Commitment to persistence
• If a resolving authority is required, that authority has demonstrated a reasonable chance to be present and functional in the future;
• The owner of the domain or the resolving authority has made a credible commitment to ensure that its identifiers will always resolve.
A useful survey of persistent identifier schemes appears in Hilse & Kothe (2006).
• Digital Object Identifiers (DOIs)
4. Identifiers should resolve to a landing page, not directly to data
Because:
• Data may be de-accessioned, like books, but the description of the thing cited should remain;
• Data may be restricted (e.g. Protected Health Information, specially licensed data, etc.);
• Data may be VERY large, and the user needs to be able to decide whether to download or not;
• A landing page allows content negotiation for machine access!
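Content negotiation means that the same identifier URL can return an HTML landing page to a browser and structured metadata to a program, depending on the HTTP `Accept` header. The sketch below shows the client side with Python's standard library. It assumes the resolver or repository honours the CSL JSON media type, which depends on the DOI registration agency; the function names are illustrative.

```python
import json
import urllib.request

# Media type for citation metadata; support varies by registration agency.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi_url, media_type=CSL_JSON):
    """Prepare a request that asks for metadata instead of the HTML page."""
    return urllib.request.Request(doi_url, headers={"Accept": media_type})


def fetch_metadata(doi_url, media_type=CSL_JSON, timeout=10):
    """Resolve the identifier and parse the returned JSON (needs network)."""
    request = build_metadata_request(doi_url, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network.
request = build_metadata_request("https://doi.org/10.7717/peerj-cs.1")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch_metadata(...)` with the same URL would perform the actual lookup; a browser visiting that URL without the header would be sent to the human-readable page instead.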
5. Landing pages describe the data
Best practices:
• Identifier, title, description, creator, publisher/contact, publication/release date, version.
Additional:
• Creator identifier (e.g. ORCID), license.
Content encoding:
• HTML; plus…
• At least one non-proprietary machine-readable format, e.g. XML, JSON/JSON-LD, RDF, microformats, microdata, RDFa,…
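One possible machine-readable encoding of the metadata elements listed above is JSON-LD. The sketch below maps them onto schema.org `Dataset` properties; the choice of schema.org is an assumption made for illustration (the guidelines name JSON-LD but no particular vocabulary), and every value shown is a placeholder.

```python
import json


def dataset_jsonld(identifier, title, description, creator_name,
                   creator_orcid, publisher, date_published,
                   version, license_url):
    """Map the recommended landing-page metadata to JSON-LD (sketch)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "identifier": identifier,
        "name": title,
        "description": description,
        "creator": {
            "@type": "Person",
            "@id": creator_orcid,       # creator identifier, e.g. ORCID
            "name": creator_name,
        },
        "publisher": {"@type": "Organization", "name": publisher},
        "datePublished": date_published,
        "version": version,
        "license": license_url,
    }


metadata = dataset_jsonld(
    identifier="https://doi.org/10.1234/example.dataset",    # hypothetical
    title="Example measurements",
    description="Placeholder description of the dataset.",
    creator_name="Jane Doe",
    creator_orcid="https://orcid.org/0000-0002-1825-0097",   # sample iD
    publisher="Example Repository",
    date_published="2015-06-09",
    version="1.0",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)
jsonld_text = json.dumps(metadata, indent=2)
print(jsonld_text)
```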
Serving the landing pages
“To enable automated agents to extract the metadata
these landing pages should include an HTML <link>
element specifying a machine readable form of the
page as an alternative.”
“For those that are capable of doing so, we
recommend also using Web Linking (Nottingham,
2010) to provide this information from all of the
alternative formats.”
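The two mechanisms quoted above can be sketched as follows: an HTML `<link>` element placed in the landing page, and the equivalent HTTP `Link` header in the Web Linking style. The `rel="alternate"` relation and the `application/ld+json` media type are reasonable choices for this purpose but are assumptions here, and the URL is hypothetical.

```python
from html import escape


def html_link_element(href, media_type):
    """Return an HTML <link> pointing at a machine-readable alternative."""
    return '<link rel="alternate" type="{}" href="{}">'.format(
        escape(media_type, quote=True), escape(href, quote=True)
    )


def http_link_header(href, media_type):
    """Return the value of an HTTP Link header in Web Linking syntax."""
    return '<{}>; rel="alternate"; type="{}"'.format(href, media_type)


ALTERNATE_URL = "https://repo.example.org/dataset/42.jsonld"  # hypothetical

print(html_link_element(ALTERNATE_URL, "application/ld+json"))
print("Link: " + http_link_header(ALTERNATE_URL, "application/ld+json"))
```

An automated agent that fetches the HTML page can then follow the `<link>` element, or read the header without parsing the page at all.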
6. Landing page retention may differ from data retention
Because:
• Repositories cannot commit to keeping arbitrary and possibly very large volumes of data forever!
• But when data is de-accessioned, the citation identifier must not give a 404 error.
• Retain awareness of what was cited even if it is not currently extant in a particular repository.
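The behaviour described above can be sketched as a small resolver: an identifier that was once issued always leads to a landing page, even after the data are gone, and only identifiers that never existed produce a 404. Returning status 200 for the retained landing page is an assumption of this sketch (a repository might choose a different non-404 response), and the record structure is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Record:
    identifier: str
    title: str
    data_url: Optional[str]  # None once the data have been de-accessioned


def resolve(identifier: str, records: Dict[str, Record]) -> Tuple[int, str]:
    """Return (HTTP status, landing page text) for a cited identifier."""
    record = records.get(identifier)
    if record is None:
        # Never issued by this repository: a 404 is appropriate here only.
        return 404, "Unknown identifier"
    if record.data_url is None:
        # Data removed, but the description of what was cited remains.
        return 200, (f"{record.title}: metadata retained; "
                     "the data are no longer held in this repository.")
    return 200, f"{record.title}: data available at {record.data_url}"


records = {
    "ds-1": Record("ds-1", "Example survey", "https://repo.example.org/ds-1"),
    "ds-2": Record("ds-2", "Retired survey", None),
}
print(resolve("ds-1", records))
print(resolve("ds-2", records))
print(resolve("ds-9", records))
```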
7. Repositories should provide a specific guarantee of persistence for landing pages
Model guarantee language:
“[Organization/Institution Name] is committed to maintaining persistent identifiers in [Repository Name] so that they will continue to resolve to a landing page providing metadata describing the data, including elements of stewardship, provenance, and availability.
[Organization/Institution Name] has made the following plan for organizational persistence and succession: [plan].”
8. Landing pages should provide both human and machine interpretable information
Because:
• Mash-ups and distributed search.
• Apps that you haven’t yet thought of.
• Web services.
Examples of machine interpretable info:
• RDF, RDFa, XML, microformats, JSON-LD, etc.
9. Provide web service accessibility
Because:
• Service composition, new apps, etc.
Best practice:
• RESTful web services, because this is a data-oriented application and REST provides the required functionality.
Much less good practice:
• SOAP, because SOAP is process-oriented.
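A minimal sketch of what RESTful access to landing-page metadata might look like is given below as a plain WSGI application: each dataset is a resource at its own URL, and the representation (HTML for people, JSON for programs) is chosen from the `Accept` header. The `/datasets/<id>` URL layout, the in-memory `DATASETS` table and the helper `call_app` are all hypothetical.

```python
import json
from html import escape

# Hypothetical in-memory metadata store standing in for a repository.
DATASETS = {
    "42": {"identifier": "doi:10.1234/example.42", "title": "Example dataset"},
}


def app(environ, start_response):
    """WSGI app: GET /datasets/<id> returns HTML or JSON metadata."""
    path = environ.get("PATH_INFO", "")
    accept = environ.get("HTTP_ACCEPT", "text/html")
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        start_response("405 Method Not Allowed", [("Allow", "GET")])
        return [b""]
    record = None
    if path.startswith("/datasets/"):
        record = DATASETS.get(path[len("/datasets/"):])
    if record is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown dataset"]
    if "application/json" in accept:
        body = json.dumps(record).encode("utf-8")
        content_type = "application/json"
    else:
        page = "<h1>{}</h1><p>{}</p>".format(
            escape(record["title"]), escape(record["identifier"]))
        body = page.encode("utf-8")
        content_type = "text/html; charset=utf-8"
    start_response("200 OK", [("Content-Type", content_type),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(path, accept):
    """Invoke the app in-process, without opening a network socket."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path,
               "HTTP_ACCEPT": accept}
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


print(call_app("/datasets/42", "application/json"))
print(call_app("/datasets/42", "text/html"))
```

To try it over HTTP, the same `app` can be passed to `wsgiref.simple_server.make_server` from the standard library; a production repository would of course sit behind a real web server.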
10. Stakeholder responsibilities
• Archives and repositories: IDs, resolution, landing page metadata, dataset description, and data access methods conform to these recommendations.
• Registries of repositories: Document conformance.
• Researchers: Treat data as first-class objects.
• Funders, scholarly societies, academic institutions: Strongly encourage conformance to best practices.
Summary
• Use NISO JATS 1.1d2 to publish & archive documents.
• Cite datasets as if they were publications and deposit datasets in archival repositories.
• Follow human & machine accessibility guidelines as presented above in points 3 through 9.
• Adhere to stakeholder responsibilities as in point 10.
• Welcome to the future of scholarly publishing!
Acknowledgements
• Joan Starr, California Digital Library
• other co-authors of the “Achieving Human and Machine Accessibility” publication
• FORCE11 Data Citation Implementation Group
• Maryann Martone, UCSD & FORCE11
• John Kunze, California Digital Library
• Harry Hochheiser, University of Pittsburgh
• Phil Bourne, NIH Data Science Directorate
Questions?

Data Citation Implementation Guidelines By Tim Clark
