Altman RDAP11 Policy-based Data Management


Micah Altman, Harvard; Policy-based Data Management

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information

  • This work, “Trustworthy Repositories, Organizations & Infrastructure”, by Micah Altman, is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To obtain a copy of this license, send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

    1. Policy Based Digital Preservation: SafeArchive & The Dataverse Network®
       Micah Altman, Institute for Quantitative Social Science, Harvard University
       Prepared for the Research Data Access and Preservation Summit, ASIS&T, March 2011
    2. Collaborators*
       • Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrara, Akio Sone, Bob Treacy
       • Research Support
           • Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-09-0041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
       * And co-conspirators
    3. Related Work
       • Reprints available from:
       • Altman, M., and Crabtree, J. 2011. "Using the SafeArchive System: TRAC-Based Auditing of LOCKSS." Proceedings of Archiving 2011. (Forthcoming)
       • Altman, M., Beecher, B., and Crabtree, J.; with L. Andreev, E. Bachman, A. Buchbinder, S. Burling, P. King, M. Maynard. 2009. "A Prototype Platform for Policy-Based Archival Replication." Against the Grain. 21(2): 44-47.
       • Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., and Young, C. 2009. "Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist. 72(1): 169-182.
       • Crosas, M. 2011. "The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data." D-Lib Magazine. 17(1/2).
       • King, G. 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing." Sociological Methods and Research. 36(2): 173-199.
       • Gutmann, M., Abrahamson, M., Adams, M.O., Altman, M., Arms, C., Bollen, K., Carlson, M., Crabtree, J., Donakowski, D., King, G., Lyle, J., Maynard, M., Pienta, A., Rockwell, R., Timms-Ferrara, L., and Young, C. 2009. "From Preserving the Past to Preserving the Future: The Data-PASS Project and the challenges of preserving digital social science data." Library Trends. 57(3): 315-33.
    4. SafeArchive: TRAC-Based Management of LOCKSS
       • Facilitating collaborative replication and preservation with technology…
       • Collaborators declare explicit non-uniform resource commitments
       • Policy records commitments, storage network properties
       • Storage layer provides replication, integrity, freshness, versioning
       • SafeArchive software provides monitoring, auditing, and provisioning
       • Content is harvested through HTTP (LOCKSS) or OAI-PMH
       • Integration of LOCKSS, The Dataverse Network, TRAC
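Slide 4 mentions OAI-PMH as one harvesting route. As a rough illustration of what such a harvest request looks like, the sketch below builds a `ListIdentifiers` request URL. The endpoint `https://repository.example.org/oai` and the function name are hypothetical; the slides do not say which requests SafeArchive or the Dataverse Network actually issue.

```python
from urllib.parse import urlencode


def list_identifiers_url(base_url, metadata_prefix="oai_dc",
                         set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListIdentifiers request URL (illustrative helper)."""
    if resumption_token is not None:
        # OAI-PMH treats resumptionToken as an exclusive argument:
        # it is sent alone with the verb when continuing a partial list.
        params = {"verb": "ListIdentifiers",
                  "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers",
                  "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of the request.
url = list_identifiers_url("https://repository.example.org/oai",
                           set_spec="demo")
```

A harvester would fetch this URL, parse the XML response, and repeat with the returned resumption token until the list is complete.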
    5. Adding Policy to LOCKSS
       • LOCKSS (Lots of Copies Keep Stuff Safe)
           • Widely used in library community
           • Self-contained OSS replication system, low maintenance, inexpensive
           • Harvests resources via web-crawling, OAI-PMH, database queries,…
           • Maintains copies through secure p2p protocol
           • Zero trust & self repairing
       • What does SafeArchive add?
           • Auditing – easily monitor number of copies of content in network
           • Provisioning – ensure sufficient copies and distribution
           • Collaboration – coordinate across partners, monitor resource commitments
           • Provide restoration guarantees
           • Integrate with Dataverse Network digital repository
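The provisioning idea on slides 4 and 5 (non-uniform commitments, sufficient copies) can be sketched as a small greedy allocator. This is an assumption-laden toy, not SafeArchive's algorithm: the function name, the gigabyte units, and the "largest collection first, emptiest host first" heuristic are all invented for illustration.

```python
def provision(collections, commitments, copies=3):
    """Assign each collection to `copies` distinct hosts within capacity.

    collections: {collection_name: size_gb}
    commitments: {host_name: committed_gb}  (non-uniform per partner)
    Returns (plan, shortfalls); shortfalls maps a collection to the
    number of copies that could not be placed.
    """
    remaining = dict(commitments)
    plan, shortfalls = {}, {}
    # Place the largest collections first so they are not crowded out.
    for name, size in sorted(collections.items(), key=lambda kv: -kv[1]):
        # Hosts that can still hold this collection, emptiest first.
        candidates = sorted(
            (h for h, free in remaining.items() if free >= size),
            key=lambda h: (-remaining[h], h),
        )
        chosen = candidates[:copies]
        for host in chosen:
            remaining[host] -= size
        plan[name] = chosen  # may be a partial placement
        if len(chosen) < copies:
            shortfalls[name] = copies - len(chosen)
    return plan, shortfalls


plan, shortfalls = provision(
    {"surveys": 40, "polls": 10},
    {"a": 100, "b": 50, "c": 45},
    copies=2,
)
```

A non-empty `shortfalls` result is the signal that partners would need to commit more storage before the policy can be met.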
    6. Why this tool?
       • To facilitate institutions in making commitments aligned with their policies and incentives, and
       • Automatically execute and monitor those commitments and policies
       • (Self-interest… Support Data-PASS partnership agreements and transfer protocols)
       • This tool provides a targeted vertical slice of functionality through the policy stack…
    7. Another Why…
       R.I.P.
    8. SafeArchive Components
       [Diagram labels: Current, Planned]
    9. SafeArchive Auditing & Reports
       [Figure: Example Fragments]
    10. SafeArchive: TRAC Alignment
        • SafeArchive audits provide evidence for compliance with policies on:
            • archival storage & preservation (B4)
            • independent audit mechanisms (B2)
            • appropriate system infrastructure (C1)
            • and disaster planning and recovery (C3)
        • SafeArchive supports embedded policy documentation:
            • Organizational infrastructure (A1-4)
            • Collection policies (B2.5, 2.7, 5.2)
            • System configuration (C1.7-1.10)
    11. SafeArchive: Schematizing Policy and Behavior
        “The repository system must be able to identify the number of copies of all stored digital objects, and the location of each object and their copies.”
        [Diagram labels: Policy, Schematization, Behavior (Operationalization)]
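One way to picture the policy, schematization, behavior chain on slide 11 is to turn the quoted requirement into a small record and an audit function that checks a replica inventory against it. The sketch below is only that: the `ReplicationPolicy` fields, the inventory shape, and the region labels are assumptions made for illustration, not SafeArchive's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical schematization of a replication requirement."""
    min_copies: int    # distinct hosts that must hold each object
    min_regions: int   # distinct regions those copies must span


def audit(inventory, policy):
    """Return {object_id: [problem, ...]} for objects violating the policy.

    inventory: {object_id: [(host, region), ...]} of verified replicas.
    """
    findings = {}
    for object_id, replicas in inventory.items():
        hosts = {host for host, _ in replicas}
        regions = {region for _, region in replicas}
        problems = []
        if len(hosts) < policy.min_copies:
            problems.append(f"copies: {len(hosts)} < {policy.min_copies}")
        if len(regions) < policy.min_regions:
            problems.append(f"regions: {len(regions)} < {policy.min_regions}")
        if problems:
            findings[object_id] = problems
    return findings


inventory = {
    "study-1": [("h1", "us-east"), ("h2", "us-west"), ("h3", "eu")],
    "study-2": [("h1", "us-east"), ("h2", "us-east")],
}
findings = audit(inventory, ReplicationPolicy(min_copies=3, min_regions=2))
```

Here `study-1` passes and `study-2` is reported for both too few copies and too few regions, which is the kind of evidence an audit report could surface.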
    12. The Dataverse Network®
        [Two columns: For Organizations, For Scholars]
        • Brand it like your own website.
        • Upload any type of data.
        • Establish a persistent data citation
        • Facilitate data discovery
        • Provide live analysis
        • Receive permanent storage space
        • Used by archives, libraries, journals, schools
        • Enable contributors to upload data
        • Organize studies by collections
        • Search across a universe of data
        • Control access and terms of use
        • Federate with catalogs and partners: OAI-PMH, LOCKSS, Z39.50, DDI
    13. Dataverse Network – Designed for Research Data
    14. Policy Support in the Dataverse Network
        • Access Control
            • Roles: access, curation, administration
            • Authenticate by: user, group, network, proxy
        • Workflow Policies
            • Built-in Versioning and Deaccessioning
            • Curatorial Review
                • Review of changes prior to release of new version
                • Review of new virtual archives
        • Legal Policies
            • Terms of use: accounts, uploads, downloads
            • Hierarchical terms: network, archive, study
            • Access request workflow
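The "hierarchical terms: network, archive, study" item on slide 14 can be illustrated with a short sketch. It assumes that terms accumulate, so a user must accept every level that defines terms before downloading; the slide does not spell out the resolution rule, and the function names here are hypothetical.

```python
def effective_terms(network, archive, study):
    """Return the terms a user must accept, broadest level first.

    Each argument is the terms text for that level, or None if the
    level defines no terms. Accumulation is an assumed rule.
    """
    levels = (("network", network), ("archive", archive), ("study", study))
    return [(level, text) for level, text in levels if text]


def may_download(required, accepted_levels):
    """True only if every level with terms has been accepted."""
    return all(level in accepted_levels for level, _ in required)


terms = effective_terms("Cite the data.", None, "No redistribution.")
blocked = may_download(terms, {"network"})
allowed = may_download(terms, {"network", "study"})
```

An access request workflow could build on the same check by routing a blocked download to the curator of the level whose terms remain unaccepted.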
    15. Archival Collaboration through shared infrastructure: Data-PASS
        • Data-PASS is a broad-based partnership of social science data archives.
        • Data-PASS partners collaborate to:
            • identify and promote good archival practices
            • seek out at-risk research data
            • mutually safeguard collections
            • build preservation infrastructure
        • Data-PASS uses Dataverse:
            • Creates federated catalog
            • Manages content for some partners
            • Provides simple way for organizations to participate in partnership
        • Data-PASS uses SafeArchive:
            • Collaboration through mutual replication of partner content
            • Supports legal transfer agreements
    16. Where Do Policies Fit in Organizational Decisions?
        [Diagram labels: NSDA, LOCKSS, META-ARCHIVE, DATA-PASS, SAFE, DVN, IRODS]
    17. Ideal integration of policy and technology?
        Policy: A set of rules and objectives expressed at a high-level domain that controls actions at a lower level
        • Expressed in domain/business language
        • Translated to a formal schematization
        • Automatically measured by technology
        • Directly controls procedures & actions to achieve compliance
        • Verifiable translation from business domain policy
        • Where do we go from here?
            • Combine flexibility of IRODS and semantic level of TRAC
            • Self-documenting infrastructure
            • Formal verifiable translation of policy to schema, and schema to action
            • Make good policy easy to implement!
    18. Contact Us
        • Micah Altman
        • SafeArchive
        • The Dataverse Network™