Policy Based Digital Preservation: SafeArchive & The Dataverse Network ®
  Micah Altman, Institute for Quantitative Social Science, Harvard University
  Prepared for the Research Data Access and Preservation Summit, ASIS&T, March 2011
Collaborators*
  Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrara, Akio Sone, Bob Treacy
  * And co-conspirators
Research Support
  Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-09-0041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
Related Work
  Reprints available from: http://maltman.hmdc.harvard.edu
  • Altman, M., and J. Crabtree. 2011. "Using the SafeArchive System: TRAC-Based Auditing of LOCKSS." Proceedings of Archiving 2011. (Forthcoming)
  • Altman, M., Beecher, B., and Crabtree, J.; with L. Andreev, E. Bachman, A. Buchbinder, S. Burling, P. King, M. Maynard. 2009. "A Prototype Platform for Policy-Based Archival Replication." Against the Grain 21(2): 44-47.
  • Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. "Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist 72(1): 169-182.
  • Crosas, M. 2011. "The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data." D-Lib Magazine 17(1/2).
  • King, G. 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing." Sociological Methods and Research 36(2): 173-199.
  • Gutmann, M., Abrahamson, M., Adams, M.O., Altman, M., Arms, C., Bollen, K., Carlson, M., Crabtree, J., Donakowski, D., King, G., Lyle, J., Maynard, M., Pienta, A., Rockwell, R., Timms-Ferrara, L., Young, C. 2009. "From Preserving the Past to Preserving the Future: The Data-PASS Project and the challenges of preserving digital social science data." Library Trends 57(3): 315-33.
SafeArchive: TRAC-Based Management of LOCKSS
  Facilitating collaborative replication and preservation with technology…
  • Collaborators declare explicit non-uniform resource commitments
  • Policy records commitments, storage network properties
  • Storage layer provides replication, integrity, freshness, versioning
  • SafeArchive software provides monitoring, auditing, and provisioning
  • Content is harvested through HTTP (LOCKSS) or OAI-PMH
  • Integration of LOCKSS, The Dataverse Network, TRAC
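The slide above notes that content can be harvested through OAI-PMH. As a rough illustration only, the sketch below builds a standard OAI-PMH `ListRecords` request URL. The base URL and set name are hypothetical placeholders, and nothing here reflects how SafeArchive or LOCKSS actually issue their harvest requests; only the query parameters (`verb`, `metadataPrefix`, `set`, `from`) come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint (hypothetical here).
    set_spec and from_date are optional selective-harvesting arguments.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        # OAI-PMH datestamps are UTC, e.g. "2011-01-01".
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
url = list_records_url("https://example.org/oai", set_spec="data-pass", from_date="2011-01-01")
print(url)
```

A harvester would fetch this URL over HTTP and follow any `resumptionToken` in the response to page through the remaining records; that step is omitted here.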
Adding Policy to LOCKSS
  LOCKSS: Lots of Copies Keep Stuff Safe
  • Widely used in library community
  • Self-contained OSS replication system, low maintenance, inexpensive
  • Harvests resources via web-crawling, OAI-PMH, database queries, …
  • Maintains copies through secure p2p protocol
  • Zero trust & self repairing
  What does SafeArchive add?
  • Auditing: easily monitor number of copies of content in network
  • Provisioning: ensure sufficient copies and distribution
  • Collaboration: coordinate across partners, monitor resource commitments
  • Provide restoration guarantees
  • Integrate with Dataverse Network digital repository
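To make the provisioning idea concrete, here is a minimal sketch that assumes each partner declares a storage commitment in gigabytes and that policy demands a minimum number of copies per collection. The `Host` class, the `provision` function, and the greedy "most free space first" strategy are all assumptions made for illustration; the slides do not describe SafeArchive's actual provisioning algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A partner storage node with a declared (hypothetical) commitment in GB."""
    name: str
    committed_gb: float
    used_gb: float = 0.0
    holdings: set = field(default_factory=set)

    def free_gb(self) -> float:
        return self.committed_gb - self.used_gb


def provision(collections: dict, hosts: list, min_copies: int) -> dict:
    """Greedily place each collection on the hosts with the most free space.

    Returns a mapping of collection name -> list of host names. Raises
    ValueError when the declared commitments cannot satisfy min_copies.
    """
    placement = {}
    # Place the largest collections first so they are not starved of space.
    for name, size_gb in sorted(collections.items(), key=lambda kv: -kv[1]):
        candidates = sorted(
            (h for h in hosts if h.free_gb() >= size_gb),
            key=lambda h: -h.free_gb(),
        )
        if len(candidates) < min_copies:
            raise ValueError(f"cannot place {min_copies} copies of {name!r}")
        chosen = candidates[:min_copies]
        for h in chosen:
            h.used_gb += size_gb
            h.holdings.add(name)
        placement[name] = [h.name for h in chosen]
    return placement


# Non-uniform commitments: partners offer different amounts of storage.
hosts = [Host("A", 100), Host("B", 50), Host("C", 30)]
placement = provision({"survey": 40, "polls": 20}, hosts, min_copies=2)
print(placement)
```

A failure here is informative in itself: it signals that the partners' commitments are too small for the policy, which is the kind of mismatch the collaboration bullet is meant to surface.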
Why this tool?
  • To help institutions make commitments aligned with their policies and incentives, and
  • To automatically execute and monitor those commitments and policies
  (Self-interest… Support Data-PASS partnership agreements and transfer protocols)
  This tool provides a targeted vertical slice of functionality through the policy stack…
Another Why…
  R.I.P.
SafeArchive Components
  [Diagram; legend: Current, Planned]
SafeArchive Auditing & Reports
  Example Fragments
SafeArchive: TRAC Alignment
  SafeArchive audits provide evidence for compliance with policies on:
  • archival storage & preservation (B4)
  • independent audit mechanisms (B2)
  • appropriate system infrastructure (C1)
  • disaster planning and recovery (C3)
  SafeArchive supports embedded policy documentation:
  • Organizational infrastructure (A1-4)
  • Collection policies (B2.5, 2.7, 5.2)
  • System configuration (C1.7-1.10)
SafeArchive: Schematizing Policy and Behavior
  "The repository system must be able to identify the number of copies of all stored digital objects, and the location of each object and their copies."
  [Diagram labels: Policy; Schematization; Behavior (Operationalization)]
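The path from policy text to schematization to behavior can be sketched roughly as follows. The quoted requirement is rendered as a small machine-readable rule set (the schematization) and then checked against observed replicas (the behavior). The field names `min_copies` and `min_regions`, the thresholds, and the replica records are hypothetical and are not taken from SafeArchive's actual schema.

```python
from collections import defaultdict

# Hypothetical schematization of the requirement as machine-readable rules.
POLICY = {"min_copies": 3, "min_regions": 2}

# Hypothetical observations: (object id, host, region) for each stored copy.
REPLICAS = [
    ("study-1", "host-a", "us-east"),
    ("study-1", "host-b", "us-west"),
    ("study-1", "host-c", "us-east"),
    ("study-2", "host-a", "us-east"),
    ("study-2", "host-b", "us-west"),
]


def audit(policy, replicas):
    """Report the number and location of copies per object, plus compliance."""
    locations = defaultdict(list)
    for object_id, host, region in replicas:
        locations[object_id].append((host, region))

    report = {}
    for object_id, locs in locations.items():
        copies = len({host for host, _ in locs})
        regions = len({region for _, region in locs})
        report[object_id] = {
            "copies": copies,
            "locations": sorted(locs),
            "compliant": copies >= policy["min_copies"]
            and regions >= policy["min_regions"],
        }
    return report


report = audit(POLICY, REPLICAS)
for object_id, entry in report.items():
    print(object_id, entry["copies"], entry["compliant"])
```

Note that this sketch only reports on objects that have at least one observed copy; a real audit would also need the list of objects that are supposed to exist, so that an object with zero copies is flagged rather than silently omitted.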
The Dataverse Network ®
  For Organizations / For Scholars
  • Brand it like your own website
  • Upload any type of data
  • Establish a persistent data citation
  • Facilitate data discovery
  • Provide live analysis
  • Receive permanent storage space
  • Used by archives, libraries, journals, schools
  • Enable contributors to upload data
  • Organize studies by collections
  • Search across a universe of data
  • Control access and terms of use
  • Federate with catalogs and partners: OAI-PMH, LOCKSS, Z39.50, DDI
Dataverse Network: Designed for Research Data
Policy Support in the Dataverse Network
  Access Control
  • Roles: access, curation, administration
  • Authenticate by: user, group, network, proxy
  Workflow Policies
  • Built-in versioning and deaccessioning
  • Curatorial review: review of changes prior to release of new version; review of new virtual archives
  Legal Policies
  • Terms of use: accounts, uploads, downloads
  • Hierarchical terms: network, archive, study
  • Access request workflow
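One way to picture the hierarchical terms listed above is as a stack that accumulates from the network level down to the individual study. The sketch below assumes that terms at every level apply together, with the broadest level shown first; this is an illustrative assumption, not a description of the Dataverse Network's actual implementation, and the function name and sample texts are hypothetical.

```python
def applicable_terms(network_terms=None, archive_terms=None, study_terms=None):
    """Return the terms a user must accept, broadest level first.

    Levels with no terms defined (None or empty string) are skipped.
    """
    levels = [
        ("network", network_terms),
        ("archive", archive_terms),
        ("study", study_terms),
    ]
    return [(level, text) for level, text in levels if text]


# Hypothetical example: the archive defines no terms of its own.
terms = applicable_terms(
    network_terms="Cite the data in any publication.",
    study_terms="Do not attempt to re-identify respondents.",
)
for level, text in terms:
    print(f"[{level}] {text}")
```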
Archival Collaboration through Shared Infrastructure: Data-PASS
  Data-PASS is a broad-based partnership of social science data archives.
  Data-PASS partners collaborate to:
  • identify and promote good archival practices
  • seek out at-risk research data
  • mutually safeguard collections
  • build preservation infrastructure
  Data-PASS uses Dataverse:
  • Creates federated catalog
  • Manages content for some partners
  • Provides simple way for organizations to participate in partnership
  Data-PASS uses SafeArchive:
  • Collaboration through mutual replication of partner content
  • Supports legal transfer agreements
Where Do Policies Fit in Organizational Decisions?
  [Diagram labels: NSDA, LOCKSS, META-ARCHIVE, DATA-PASS, SAFE, DVN, IRODS]
Ideal integration of policy and technology?
  Policy: A set of rules and objectives expressed at a high-level domain that controls actions at a lower level
  • Expressed in domain/business language
  • Translated to a formal schematization
  • Automatically measured by technology
  • Directly controls procedures & actions to achieve compliance
  • Verifiable translation from business domain policy
  Where do we go from here?
  • Combine flexibility of iRODS and semantic level of TRAC
  • Self-documenting infrastructure
  • Formal verifiable translation of policy to schema, and schema to action
  • Make good policy easy to implement!
Contact Us
  Micah Altman: maltman.hmdc.harvard.edu
  SafeArchive: safearchive.org
  The Dataverse Network ™: thedata.org

Altman RDAP11 Policy-based Data Management


Editor's Notes

  • #2 This work “Trustworthy Repositories, Organizations & Infrastructure”, by Micah Altman (http://redistricting.info) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.