Linking Data to Publications through Citation and Virtual Archives

Prepared for the 2011 SSP 33rd Annual Meeting June 2011

  • This work “Trustworthy Repositories, Organizations & Infrastructure”, by Micah Altman (http://redistricting.info) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Transcript

  • 1. Linking Data to Publications through Citation and Virtual Archives Micah Altman, Institute for Quantitative Social Science, Harvard University Prepared for the 2011 SSP 33rd Annual Meeting June 2011
  • 2. Collaborators*
    • Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrara, Akio Sone, Bob Treacy
    • Research Support
      • Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-09-0041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
    * And co-conspirators
  • 3. Related Work
    • Reprints available from: http://maltman.hmdc.harvard.edu
    • M. Altman, M. Adams, J. Crabtree, D. Donakowski, M. Maynard, A. Pienta, and C. Young, 2009, “Digital Preservation through Archival Collaboration: The Data Preservation Alliance for the Social Sciences”, The American Archivist 72(1): 169-182.
    • M. Altman and G. King, 2007, “A Proposed Standard for the Scholarly Citation of Quantitative Data”, D-Lib Magazine 13(3/4).
    • M. Altman, 2008, “A Fingerprint Method for Verification of Scientific Data”, in Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007), Springer Verlag.
    • M. Crosas, 2011, “The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data”, D-Lib Magazine 17(1/2).
    • G. King, 2007, “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing”, Sociological Methods and Research 36(2): 173-199.
  • 4. Roadmap
    • Motivations
    • Elements of data management
    • Citing Data
    • Virtual Archives
  • 5. Data Access is Key to Science
    • Science is not (only) about being scientific
    • Scientific progress requires community: competition and collaboration in the pursuit of common goals
    • Without access to the same materials: no community exists … data is the nucleus of scientific collaboration
    • The value of an article that can’t be replicated: ?
    • Scholarly articles are summaries, not the actual research results
    • Experimental data are expensive to reproduce; observational data are impossible to reproduce
    • Hard for journal editors to verify -- If you find it, how do you know it’s the same?
    • Replication projects show: many published articles cannot be replicated … data is needed for scientific replication
  • 6. Data is Key to Democracy
    • Statistics = state-istics
    • The state tax authority: counting people, estimating wealth
    • Reformers use data to assess the performance of the state
    • Science informs public policy continually
    • In modern democracy: the public needs a direct source of information
    Source: “Propaganda” http://www.media-studies.ca/articles/images/berlin_wall.jpg
  • 7. Open Data Enables New Forms of Science and Education
    • Data Intensive Science
      • Increased opportunities for interdisciplinarity
      • Science modeling reality across multiple scales
      • Continuous, complete, fine-grained information on physical processes, systems, human behavior
    • Open Data Democratizes Science
      • Citizen-scientist
      • Developing countries
      • Institutions outside of the inner circle of research
    • Education
      • Open data eases transition from education to research
    • In addition, sharing data increases citation rates [Gleditsch 2003; Wilson 2008; Piwowar 2007]
    Figure: Visualization from multiple experiments using the Community Climate System Model, through the Earth System Grid. Source: “Beyond Being There”, National Science Foundation, 2008.
  • 8. Science Model
    • “Unpublished data and personal communications: Citations to unpublished data and personal communications cannot be used to support claims in a published paper.”
    • “Data and materials availability: All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
  • 9. Some Formal Requirements
    • The Final NIH Statement on Sharing Research Data, published in the NIH Guide on February 26, 2003:
      • “Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.”
      • Data are expected to be released no later than the acceptance for publication of the main findings from the final data set
    • NSF: as of 1/1/2011, all proposals must include a data management plan.
      • Specific requirements are mostly vague: they “will be determined by the community of interest through the process of peer review and program management.”
    • Wellcome Trust:
      • “will review data management and sharing plans, and any costs involved in delivering them, as an integral part of the funding decision”
  • 10. Data Management Plans
    • Safeguarding data for internal use in project
      • Documentation
      • Backup and recovery
      • Review
    • Treatment of confidential and rights-encumbered information
      • Consent to disclosure
      • Overview: http://www.icpsr.org/DATAPASS/pdf/confidentiality.pdf
      • Separation of identifying and sensitive information
      • Obtain certificate of confidentiality, other legal safeguards
      • De-identification and public use files
      • Licensing
    • Dissemination
      • Archiving commitment (include letter of support)
      • Archiving timeline
      • Access procedures
      • Documentation
      • User vetting, tracking, and support
      • Licenses and restrictions
  • 11. Data Management Elements
  • 12. Access and Sharing
  • 13. Organization and Documentation
  • 14. DMP Operational Issues
  • 15. Rights and Responsibilities
  • 16. Why is Infrastructure for Data Sharing Necessary?
    • Accessibility:
      • Many large data sets: in public archives
      • Most data in published articles: not accessible, results not replicable without the original author
      • Most data sets from federal grants: not publicly available
    • Problems with discovery and linking even with professional archives:
      • Data in different archives have different identifiers
      • Archives change identifiers, links
      • Changes to data are made; identifiers are reused or removed; old data are lost
      • Locating/browsing/extracting requires specialized tools & approaches
    • Sharing data requires exposing tacit knowledge
      • Explicit documentation of data structure, collection process, interpretation
      • Harmonizing/linking to known ontologies, metadata schemas, vocabularies
    • Data sets are not preserved like books
      • Static data files (even if on the web): unreadable after a few years
      • When storage methods change: some data sets are lost; others have altered content!
    • Why not a single centralized infrastructure?
      • Single point of failure
      • Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, legal access rules, etc.
      • Data producers want credit, control, and visibility
  • 17. Core Requirements for Data Sharing Infrastructure
    • Stakeholder incentives
      • recognition; citation; payment; compliance; services
    • Dissemination
      • access to metadata; documentation; data
    • Access control
      • authentication; authorization; rights management
    • Provenance
      • chain of control; verification of metadata, bits, semantic content
    • Persistence
      • bits; semantic content; use
    • Legal protection
      • rights management; consent; record keeping; auditing
    • Usability
      • discovery; deposit; curation; administration; collaboration
    • Business model
    Sources: King 2007; ICSU 2004; NSB 2005
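The "verification of ... bits" item under Provenance can be illustrated with a simple fixity check. This is a minimal sketch, not a description of any particular archive's method: it assumes a SHA-256 digest is recorded at deposit time, and the function names and sample data are hypothetical.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Check that the bytes still match the digest recorded at deposit time."""
    return sha256_digest(data) == recorded_digest


# Hypothetical deposited file contents and the digest stored alongside them.
original = b"id,income\n1,52000\n2,61000\n"
recorded = sha256_digest(original)

# A later copy in which a single value has silently changed.
altered = b"id,income\n1,52000\n2,61001\n"

print(verify_fixity(original, recorded))
print(verify_fixity(altered, recorded))
```

A byte-level digest like this covers only the "bits" item. It also changes when a file is merely re-saved in another format, so it does not address verification of semantic content.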
  • 18. Data Citation as a Leverage Point
    • Services
      • Identifiers for specific fixed versions of data are needed to establish unambiguous chains of provenance
      • Identifiers that can be globally resolved to machine-understandable metadata and to the identified object are needed to build generalized access and analysis services
      • Persistent identifiers are needed to maintain long-term access
    • Incentives
      • Scholarly credit (intellectual attribution) is a large motivator for many researchers – citation creates incentive for researchers to publish data
      • Scholars also comply with enforceable journal policies -- requiring data citation is a light-weight method to make data access policies auditable
      • Impact/usage is a motivator for public research funders – data citation provides foundation for measures of usage and impact
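As an illustration of the Services points above, a data citation can be modeled as a record that ties a persistent identifier to one fixed version. This is a minimal sketch under stated assumptions: the field names, the citation layout, and the identifier `hdl:0000.0/EXAMPLE` are hypothetical and do not reproduce any published citation standard.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class DataCitation:
    """A citation to one fixed version of a data set (hypothetical layout)."""
    authors: Tuple[str, ...]
    year: int
    title: str
    identifier: str  # persistent, globally resolvable identifier (made up here)
    version: str     # the specific fixed version being cited

    def format(self) -> str:
        """Render the citation as a single reference-list entry."""
        names = "; ".join(self.authors)
        return (f'{names}, {self.year}, "{self.title}", '
                f'{self.identifier}, version {self.version}')


v1 = DataCitation(
    authors=("A. Author", "B. Author"),
    year=2011,
    title="Example Survey Data",
    identifier="hdl:0000.0/EXAMPLE",
    version="1",
)
# A corrected release keeps the identifier but is cited as a distinct version.
v2 = replace(v1, version="2")

print(v1.format())
print(v2.format())
```

Because the record is immutable and carries its version, two papers that cite different releases of the same data set produce different, unambiguous citations.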
  • 19. Emerging Practices for Data Citation
    • Publishers
    • Data archives
    • Standards bodies
    • Librarians
    • Discipline-specific standards
  • 20. ORCID Participant Meeting: Data Citations and The DataVerse Network (R) Common Principles
  • 21. Thanks to 37 Participants
  • 22. Seven Ways of Looking at Data
    AKA ^ Supplementary
  • 23. Theory
  • 24. Theory +
    • Data citations should be first-class objects for publication -- they should appear with other citations and be as easy to cite as other works
    • At minimum, all data necessary to understand, assess, and extend the conclusions of a scholarly work should be cited
    • Citations should persist and enable access to a fixed version of the data for at least as long as the citing work
    • Data citation should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem
  • 25. Theory + Practice
  • 26. Use Cases
  • 27. Use Cases
  • 28. Dataverse For Organizations For Scholars
    • Brand it like your own website.
    • Upload any type of data.
    • Establish a persistent data citation
    • Facilitate data discovery
    • Provide live analysis
    • Receive permanent storage space
    • Used by archives, libraries, journals, schools
    • Enable contributors to upload data
    • Organize studies by collections
    • Search across a universe of data
    • Control access and terms of use
    • Federate with catalogs and partners: OAI-PMH, LOCKSS, Z39.50, DDI
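Federation through OAI-PMH amounts to harvesting metadata with simple HTTP requests. The sketch below builds a `ListRecords` request and reads the paging token from a response. It is only an illustration: the endpoint `https://archive.example.org/oai`, the set name, and the sample XML are hypothetical, and no network call is made.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token from a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    element = root.find(".//oai:resumptionToken", OAI_NS)
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


# Hypothetical, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://archive.example.org/oai", set_spec="surveys")
token = next_token(SAMPLE)
print(first_url)
print(list_records_url("https://archive.example.org/oai", resumption_token=token))
```

A harvester would repeat the request with each returned token until `next_token` yields `None`.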
  • 29. Federated Archive: National Research Portal
    • Preserve data
    • Provide access to local data
    • Organize universe of data
  • 30. Virtual Archive: Library Catalog & Repository
    • Pathfinder
    • Virtual collection
    • Manages licensed works
    • Institutional repository
  • 31. Federated + Virtual: Data-PASS Union Catalog
    • Data-PASS uses Dataverse:
      • Creates federated catalog
      • Manages content for some partners
      • Provides simple way for organizations to participate in partnership
    • Data-PASS uses SafeArchive:
      • Collaboration through mutual replication of partner content
      • Supports legal transfer agreements
    • SafeArchive + LOCKSS + Dataverse = Policy-based replicated data archives
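One way to picture "policy-based" replication is as an audit that compares what partners actually hold against a stated minimum number of copies. The sketch below is purely illustrative and assumes a hypothetical policy of "at least N distinct partners per collection". It does not reflect the SafeArchive or LOCKSS interfaces, and all names in it are made up.

```python
from typing import Dict, List, Set


def under_replicated(holdings: Dict[str, Set[str]], min_copies: int) -> List[str]:
    """Return the collections held by fewer than min_copies distinct partners."""
    return sorted(
        name for name, partners in holdings.items()
        if len(partners) < min_copies
    )


# Hypothetical audit snapshot: collection name -> partners holding a copy.
holdings = {
    "collection-a": {"partner-1", "partner-2", "partner-3"},
    "collection-b": {"partner-1"},
    "collection-c": {"partner-2", "partner-3"},
}

# Under an assumed policy of three copies, two collections need more replicas.
flagged = under_replicated(holdings, min_copies=3)
print(flagged)
```

Expressing the policy as data (here, a single `min_copies` value) is what makes compliance something that can be checked automatically rather than asserted.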
  • 32. Journal Replication Archives
    • Support publication workflows
    • Permanent, branded supplementary materials repository
    • Treats data as first class objects – provides identifiers and services
  • 33. Virtual Archive: Scholar Site
    • Scholar retains control over branding and dissemination
    • Preservation and long-term access are guaranteed
    • Dissemination and compliance with Data Management Plans are verifiable
    • Integrates with OpenScholar
  • 34. Dataverse Network – Designed for Research Data
  • 35. Summary
    • Data Management Includes
      • Safeguarding data for use
      • Protecting rights and confidentiality
      • Short term and long term dissemination
      • Many technical issues
      • Many institutional models
    • Citation provides leverage for incentives and services
      • Data citations should be first-class objects for publication -- they should appear with other citations and be as easy to cite as other works
      • At minimum, all data necessary to understand, assess, and extend the conclusions of a scholarly work should be cited
      • Citations should persist and enable access to a fixed version of the data for at least as long as the citing work
      • Data citation should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem
    • Virtual archiving is one successful model for publication-related data archiving and primary data archiving
  • 36. Contact Us
    • Micah Altman
    • maltman.hmdc.harvard.edu
    • The Dataverse Network ™
    • thedata.org