
Linking Data to Publications through Citation and Virtual Archives

Prepared for the 2011 SSP 33rd Annual Meeting June 2011

  • This work “Trustworthy Repositories, Organizations & Infrastructure”, by Micah Altman (http://redistricting.info) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  • Transcript of "Linking Data to Publications through Citation and Virtual Archives"

    1. Linking Data to Publications through Citation and Virtual Archives
       Micah Altman, Institute for Quantitative Social Science, Harvard University
       Prepared for the 2011 SSP 33rd Annual Meeting, June 2011
    2. Collaborators*
       • Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrara, Akio Sone, Bob Treacy
       • Research Support
           • Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-09-0041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
       * And co-conspirators
    3. Related Work
       • Reprints available from: http://maltman.hmdc.harvard.edu
       • M. Altman, M. Adams, J. Crabtree, D. Donakowski, M. Maynard, A. Pienta, and C. Young. 2009. “Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences.” The American Archivist 72(1): 169-182.
       • M. Altman and G. King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine 13(3/4), March/April.
       • M. Altman. 2008. “A Fingerprint Method for Verification of Scientific Data.” In Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007). Springer Verlag.
       • M. Crosas. 2011. “The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data.” D-Lib Magazine 17(1/2).
       • G. King. 2007. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing.” Sociological Methods and Research 36(2): 173-199.
    4. Roadmap
       • Motivations
       • Elements of data management
       • Citing Data
       • Virtual Archives
    5. Data Access is Key to Science
       • Science is not (only) about being scientific
       • Scientific progress requires community: competition and collaboration in the pursuit of common goals
       • Without access to the same materials, no community exists … data is the nucleus of scientific collaboration
       • The value of an article that can’t be replicated: ?
       • Scholarly articles are summaries, not the actual research results
       • Experimental data are expensive to reproduce; observational data are impossible to reproduce
       • Hard for journal editors to verify: if you find the data, how do you know it’s the same?
       • Replication projects show that many published articles cannot be replicated … data is needed for scientific replication
    6. Data is Key to Democracy
       • Statistics = state-istics
       • The state tax authority: counting people, estimating wealth
       • Reformers use data to assess the performance of the state
       • Science continually informs public policy
       • In a modern democracy, the public needs a direct source of information
       Image source: “Propaganda” http://www.media-studies.ca/articles/images/berlin_wall.jpg
    7. Open Data Enables New Forms of Science and Education
       • Data-intensive science
           • Increased opportunities for interdisciplinarity
           • Science modeling reality across multiple scales
           • Continuous, complete, fine-grained information on physical processes, systems, and human behavior
       • Open data democratizes science
           • Citizen scientists
           • Developing countries
           • Institutions outside the inner circle of research
       • Education
           • Open data eases the transition from education to research
       • In addition, sharing data is associated with higher citation rates [Gleditsch 2003; Wilson 2008; Piwowar 2007]
       Visualization from multiple experiments using the Community Climate System Model, through the Earth System Grid. Source: “Beyond Being There”, National Science Foundation, 2008.
    8. Science Model
       • Unpublished data and personal communications: “Citations to unpublished data and personal communications cannot be used to support claims in a published paper.”
       • Data and materials availability: “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
    9. Some Formal Requirements
       • The Final NIH Statement on Sharing Research Data was published in the NIH Guide on February 26, 2003.
           • “Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.”
           • Data are expected to be shared no later than the acceptance for publication of the main findings from the final data set.
       • NSF: as of January 2011, all proposals must include a data management plan.
           • Specific requirements are vague, for the most part: they “will be determined by the community of interest through the process of peer review and program management.”
       • Wellcome Trust:
           • “will review data management and sharing plans, and any costs involved in delivering them, as an integral part of the funding decision”
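       The NIH rule quoted above has two explicit conditions, a receipt date and a per-year direct-cost threshold, and encoding them makes the "any single year" clause hard to misread. This is a minimal Python sketch assuming costs are given as whole dollars per budget year. The function name is hypothetical. It models only the 2003 statement quoted on the slide, not any later NIH policy, and it is not an official compliance check.

```python
from datetime import date

# Values taken from the 2003 NIH statement quoted on the slide.
POLICY_START = date(2003, 10, 1)   # "Starting with the October 1, 2003 receipt date"
DIRECT_COST_THRESHOLD = 500_000    # "$500,000 or more in direct costs in any single year"


def sharing_plan_expected(receipt_date: date, direct_costs_by_year: list[int]) -> bool:
    """Hypothetical helper: True if the quoted rule expects a data sharing plan
    (or a statement of why sharing is not possible) for this application."""
    if receipt_date < POLICY_START:
        return False
    # The threshold applies to any single year, not to the total across years.
    return any(cost >= DIRECT_COST_THRESHOLD for cost in direct_costs_by_year)


print(sharing_plan_expected(date(2004, 2, 1), [300_000, 500_000]))  # True
print(sharing_plan_expected(date(2004, 2, 1), [450_000, 450_000]))  # False: no single year reaches 500,000
```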
    10. Data Management Plans
        • Safeguarding data for internal use within the project
            • Documentation
            • Backup and recovery
            • Review
        • Treatment of confidential and rights-encumbered information
            • Consent to disclosure
            • Overview: http://www.icpsr.org/DATAPASS/pdf/confidentiality.pdf
            • Separation of identifying and sensitive information
            • Obtain a certificate of confidentiality and other legal safeguards
            • De-identification and public use files
            • Licensing
        • Dissemination
            • Archiving commitment (include letter of support)
            • Archiving timeline
            • Access procedures
            • Documentation
            • User vetting, tracking, and support
            • Licenses and restrictions
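        The plan elements listed above work naturally as a checklist. The Python sketch below reports which elements a draft plan has not yet addressed. The section and item labels are paraphrased from the slide, and the function and data layout are hypothetical, not part of any funder's tooling.

```python
# Checklist paraphrased from the plan elements on the slide (hypothetical layout).
DMP_ELEMENTS = {
    "Safeguarding data for internal use": [
        "Documentation", "Backup and recovery", "Review"],
    "Confidential and rights-encumbered information": [
        "Consent to disclosure",
        "Separation of identifying and sensitive information",
        "Certificate of confidentiality or other legal safeguards",
        "De-identification and public use files",
        "Licensing"],
    "Dissemination": [
        "Archiving commitment", "Archiving timeline", "Access procedures",
        "Documentation", "User vetting, tracking, and support",
        "Licenses and restrictions"],
}


def missing_elements(plan: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per section, the checklist items a draft plan does not yet address.
    Sections with nothing missing are omitted from the result."""
    gaps = {}
    for section, items in DMP_ELEMENTS.items():
        covered = plan.get(section, set())
        missing = [item for item in items if item not in covered]
        if missing:
            gaps[section] = missing
    return gaps


draft = {
    "Safeguarding data for internal use": {"Documentation", "Backup and recovery", "Review"},
    "Dissemination": {"Archiving commitment"},
}
for section, items in missing_elements(draft).items():
    print(section, "->", items)
```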
    11. Data Management Elements
    12. Access and Sharing
    13. Organization and Documentation
    14. DMP Operational Issues
    15. Rights and Responsibilities
    16. Why is Infrastructure for Data Sharing Necessary?
        • Accessibility:
            • Many large data sets are in public archives
            • Most data in published articles are not accessible; results are not replicable without the original author
            • Most data sets from federal grants are not publicly available
        • Problems with discovery and linking, even with professional archives:
            • Data in different archives have different identifiers
            • Archives change identifiers and links
            • Changes to data are made; identifiers are reused or removed; old data are lost
            • Locating, browsing, and extracting data require specialized tools and approaches
        • Sharing data requires exposing tacit knowledge:
            • Explicit documentation of data structure, collection process, and interpretation
            • Harmonizing with and linking to known ontologies, metadata schemas, and vocabularies
        • Data sets are not preserved like books:
            • Static data files (even if on the web) become unreadable after a few years
            • When storage methods change, some data sets are lost; others have altered content!
        • Why not a single centralized infrastructure?
            • Single point of failure
            • Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, legal access rules, etc.
            • Data producers want credit, control, and visibility
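        Changed or reused identifiers and silently altered content are what a content fingerprint is meant to catch. If a fixed version's fingerprint is recorded alongside its citation, anyone can recompute it later and check whether they hold the same data. The Python sketch below is only loosely modeled on the idea behind the Universal Numerical Fingerprint (UNF) from the fingerprint paper listed under Related Work. It is not an implementation of the UNF specification. The canonicalization choices (7 significant digits, a `<missing>` marker, SHA-256 truncated to 128 bits) are assumptions made for illustration.

```python
import base64
import hashlib
import math


def canonical(value, digits: int = 7) -> str:
    """Render one value in a fixed text form so that representation details
    (1 vs 1.0, tiny floating-point noise) do not change the fingerprint."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "<missing>"
    if isinstance(value, (int, float)):
        # Round to `digits` significant digits in exponential notation.
        return f"{float(value):.{digits - 1}e}"
    return str(value)


def fingerprint(column: list, digits: int = 7) -> str:
    """Hash the canonical forms of a column's values, in order."""
    h = hashlib.sha256()
    for value in column:
        encoded = canonical(value, digits).encode("utf-8")
        # Length-prefix each value so that value boundaries are unambiguous.
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    # Keep 128 bits and print as base64.
    return base64.b64encode(h.digest()[:16]).decode("ascii")


original = [1, 2.5, None, "yes"]
reloaded = [1.0, 2.50000001, float("nan"), "yes"]  # same data after a format migration
altered = [1, 2.6, None, "yes"]                     # content actually changed

print(fingerprint(original) == fingerprint(reloaded))  # True
print(fingerprint(original) == fingerprint(altered))   # False
```

        Rounding before hashing trades sensitivity for robustness. With too few digits, real changes slip through. With too many, harmless format conversions are flagged as alterations.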
    17. Core Requirements for Data Sharing Infrastructure
        • Stakeholder incentives
            • recognition; citation; payment; compliance; services
        • Dissemination
            • access to metadata; documentation; data
        • Access control
            • authentication; authorization; rights management
        • Provenance
            • chain of control; verification of metadata, bits, semantic content
        • Persistence
            • bits; semantic content; use
        • Legal protection
            • rights management; consent; record keeping; auditing
        • Usability
            • discovery; deposit; curation; administration; collaboration
        • Business model
        Sources: King 2007; ICSU 2004; NSB 2005
    18. Data Citation as a Leverage Point
        • Services
            • Identifiers to specific, fixed versions of data are needed to establish unambiguous chains of provenance
            • Identifiers that can be globally resolved to machine-understandable metadata and to the identified object are needed to build generalized access and analysis services
            • Persistent identifiers are needed to maintain long-term access
        • Incentives
            • Scholarly credit (intellectual attribution) is a large motivator for many researchers: citation creates an incentive for researchers to publish data
            • Scholars also comply with enforceable journal policies: requiring data citation is a lightweight method to make data access policies auditable
            • Impact and usage are motivators for public research funders: data citation provides a foundation for measures of usage and impact
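        The three service requirements above (fixed versions, resolvable identifiers, persistence) can be made concrete with a toy registry. In it, identifiers are never reused, and every earlier version stays resolvable. This is a hypothetical in-memory Python sketch. The `Registry` class, its methods, and the `hdl:0000/example` identifier are invented for illustration and do not describe the Handle System or any Dataverse API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    number: int        # 1, 2, 3, ... in order of deposit
    fingerprint: str   # content fingerprint that fixes this version
    title: str


class Registry:
    """Toy registry: an identifier is minted once, never reassigned,
    and every fixed version under it remains resolvable."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Version]] = {}

    def mint(self, identifier: str, fingerprint: str, title: str) -> Version:
        if identifier in self._versions:
            raise ValueError(f"identifier already assigned: {identifier}")
        first = Version(1, fingerprint, title)
        self._versions[identifier] = [first]
        return first

    def add_version(self, identifier: str, fingerprint: str, title: str) -> Version:
        versions = self._versions[identifier]  # KeyError if never minted
        new = Version(len(versions) + 1, fingerprint, title)
        versions.append(new)  # earlier versions are kept, not overwritten
        return new

    def resolve(self, identifier: str, version: Optional[int] = None) -> Version:
        versions = self._versions[identifier]
        if version is None:
            return versions[-1]  # latest version
        if not 1 <= version <= len(versions):
            raise KeyError(f"{identifier} has no version {version}")
        return versions[version - 1]


registry = Registry()
registry.mint("hdl:0000/example", "fp-v1", "Example Survey")
registry.add_version("hdl:0000/example", "fp-v2", "Example Survey (corrected)")
print(registry.resolve("hdl:0000/example", 1).fingerprint)  # fp-v1
print(registry.resolve("hdl:0000/example").number)          # 2
```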
    19. Emerging Practices for Data Citation
        • Publishers
        • Data archives
        • Standards bodies
        • Librarians
        • Discipline-specific standards
    20. ORCID Participant Meeting: Data Citations and The DataVerse Network® Common Principles
    21. Thanks to 37 Participants
    22. Seven Ways of Looking at Data (AKA “Supplementary” Data)
    23. Theory
    24. Theory +
        • Data citations should be first-class objects for publication: they should appear with other citations and be as easy to cite as other works
        • At minimum, all data necessary to understand, assess, and extend the conclusions in scholarly work should be cited
        • Citations should persist and enable access to a fixed version of the data for at least as long as the citing work
        • Data citation should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem
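        The principles above translate into a small data structure: a citation that cannot be built without attribution, a persistent identifier, and something that fixes the exact version. The Python sketch below uses the elements proposed in Altman and King (2007), listed under Related Work: authors, date, title, a global persistent identifier, a fingerprint such as a UNF, and optionally distributor and version. The class name, field names, punctuation, and placeholder values in the example are assumptions. They do not reproduce that standard's exact format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    authors: tuple[str, ...]
    year: int
    title: str
    identifier: str                    # persistent, globally resolvable identifier
    fingerprint: str                   # fixes the exact version being cited
    distributor: Optional[str] = None
    version: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the principles: credit to contributors, persistence, fixed version.
        if not self.authors:
            raise ValueError("a data citation needs at least one author")
        if not self.identifier or not self.fingerprint:
            raise ValueError("a data citation needs an identifier and a fingerprint")

    def format(self) -> str:
        text = (f'{" and ".join(self.authors)}, {self.year}, "{self.title}", '
                f"{self.identifier}, {self.fingerprint}")
        if self.distributor:
            text += f", {self.distributor} [distributor]"
        if self.version:
            text += f", version {self.version}"
        return text


citation = DataCitation(("Jane Doe", "John Roe"), 2011, "Example Survey",
                        "hdl:0000/example", "UNF:placeholder")
print(citation.format())
# Jane Doe and John Roe, 2011, "Example Survey", hdl:0000/example, UNF:placeholder
```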
    25. Theory + Practice
    26. Use Cases
    27. Use Cases
    28. Dataverse: For Organizations, For Scholars
        • For scholars:
            • Brand it like your own website.
            • Upload any type of data.
            • Establish a persistent data citation.
            • Facilitate data discovery.
            • Provide live analysis.
            • Receive permanent storage space.
        • For organizations:
            • Used by archives, libraries, journals, and schools.
            • Enable contributors to upload data.
            • Organize studies by collections.
            • Search across a universe of data.
            • Control access and terms of use.
            • Federate with catalogs and partners: OAI-PMH, LOCKSS, Z39.50, DDI.
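        Of the federation protocols listed, OAI-PMH is the simplest to illustrate: a harvester issues HTTP GET requests with a `verb` and a few arguments. The Python sketch below only builds `ListRecords` request URLs following the OAI-PMH 2.0 rules. `metadataPrefix` is required unless `resumptionToken` is given, and `resumptionToken` must be sent alone with the verb. The base URL and set name are hypothetical, and Dataverse's actual endpoint path and supported metadata formats are not described on this slide. `oai_dc` (unqualified Dublin Core) is used because every OAI-PMH repository is required to support it.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date  # e.g. "2011-01-01" (UTC date)
    return f"{base_url}?{urlencode(params)}"


base = "https://repository.example.org/oai"  # hypothetical endpoint
print(list_records_url(base, set_spec="journal-replication", from_date="2011-01-01"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=journal-replication&from=2011-01-01
print(list_records_url(base, resumption_token="page-2"))
# https://repository.example.org/oai?verb=ListRecords&resumptionToken=page-2
```

        A real harvester would fetch each URL, parse the XML response, and repeat the request with the returned `resumptionToken` until the response no longer supplies a non-empty token.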
    29. Federated Archive: National Research Portal
        • Preserve data
        • Provide access to local data
        • Organize the universe of data
    30. Virtual Archive: Library Catalog & Repository
        • Pathfinder
        • Virtual collection
        • Manages licensed works
        • Institutional repository
    31. Federated + Virtual: Data-PASS Union Catalog
        • Data-PASS uses Dataverse:
            • Creates a federated catalog
            • Manages content for some partners
            • Provides a simple way for organizations to participate in the partnership
        • Data-PASS uses SafeArchive:
            • Collaboration through mutual replication of partner content
            • Supports legal transfer agreements
        • SafeArchive + LOCKSS + Dataverse = policy-based replicated data archives
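        One reading of “policy-based replicated data archives” is: state a replication rule, then audit the partners' holdings against it. The Python sketch below is a hypothetical audit of one such rule, that each collection must be held by at least a minimum number of distinct partners. The rule, the threshold of 3, and the partner and collection names are assumptions. The sketch does not use SafeArchive's actual policy language or any LOCKSS interface.

```python
def under_replicated(collections: list[str],
                     copies: list[tuple[str, str]],
                     min_partners: int = 3) -> dict[str, int]:
    """Return {collection: number of distinct holders} for every collection
    held by fewer than `min_partners` distinct partners.

    `copies` lists (collection, partner) pairs, one per verified copy."""
    holders: dict[str, set[str]] = {c: set() for c in collections}
    for collection, partner in copies:
        if collection in holders:
            holders[collection].add(partner)  # duplicate copies at one partner count once
    return {c: len(p) for c, p in holders.items() if len(p) < min_partners}


collections = ["survey-a", "survey-b", "survey-c"]
copies = [("survey-a", "partner-1"), ("survey-a", "partner-2"), ("survey-a", "partner-3"),
          ("survey-b", "partner-1"), ("survey-b", "partner-1")]
print(under_replicated(collections, copies))  # {'survey-b': 1, 'survey-c': 0}
```

        Counting distinct partners rather than raw copies reflects the point of mutual replication. Two copies at one institution still share a single point of failure.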
    32. Journal Replication Archives
        • Support publication workflows
        • Permanent, branded supplementary materials repository
        • Treats data as first-class objects: provides identifiers and services
    33. Virtual Archive: Scholar Site
        • Scholar retains control over branding and dissemination
        • Preservation and long-term access are guaranteed
        • Dissemination and compliance with Data Management Plans are verifiable
        • Integrates with OpenScholar
    34. Dataverse Network: Designed for Research Data
    35. Summary
        • Data management includes:
            • Safeguarding data for use
            • Protecting rights and confidentiality
            • Short-term and long-term dissemination
            • Many technical issues
            • Many institutional models
        • Citation provides leverage for incentives and services:
            • Data citations should be first-class objects for publication: they should appear with other citations and be as easy to cite as other works
            • At minimum, all data necessary to understand, assess, and extend the conclusions in scholarly work should be cited
            • Citations should persist and enable access to a fixed version of the data for at least as long as the citing work
            • Data citation should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem
        • Virtual archiving is one successful model for both publication-related data archiving and primary data archiving
    36. Contact Us
        • Micah Altman
        • maltman.hmdc.harvard.edu
        • The Dataverse Network™
        • thedata.org