The Dark Side of
Digital Preservation:
Distributed Digital
Preservation
Sibyl Schaefer, Mary Molinaro, Katherine Skinner, Chip German, David Pcolar, Sam Meister
Archives Records 2016
Session 606
Saturday August 6, 2016
Atlanta, Georgia
Briefing for SAA, August 6, 2016
Chip German, Program Director, APTrust
and Senior Director, Content Stewardship, at
the University of Virginia Library
chip.german@aptrust.org
APTrust Web Site: aptrust.org
Sustaining Members:
Columbia University
Indiana University
Georgetown University
Johns Hopkins University
North Carolina State University
Pennsylvania State University
Syracuse University
University of Cincinnati
University of Connecticut
University of Maryland
University of Miami
University of Michigan
University of North Carolina
University of Notre Dame
University of Virginia
Virginia Tech
• Quick description of APTrust:
• Provides simple, collaboratively designed and built preservation
environment (bit-level with fixity checking every 90 days) for the
digital scholarly and cultural record (NOW)
• Will provide collaboratively developed services for that content
(FUTURE), which may include:
• choice of preservation assurance levels/cost plans
• access
• format migration, emulation, software preservation
• analysis tools for research
• Currently uses Amazon Web Services (as back-end platform only)
[Diagram: APTrust storage on AWS]
• [component name not recovered] for metadata and events
• Three content copies in East Coast data center (three separate availability zones)
• S3 service (preservation)
• Three content copies in West Coast data center (three separate availability zones) + Fedora copy
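The "bit-level with fixity checking every 90 days" description above can be illustrated with a minimal sketch. It assumes SHA-256 digests recorded at ingest and a plain date for the last check; the function names and the choice of algorithm are illustrative assumptions, not a description of how APTrust implements fixity.

```python
import hashlib
from datetime import date, timedelta
from pathlib import Path

FIXITY_INTERVAL = timedelta(days=90)  # interval stated on the slide


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large objects are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_due(last_checked: date, today: date) -> bool:
    """True when at least 90 days have passed since the last fixity check."""
    return today - last_checked >= FIXITY_INTERVAL


def check_fixity(path: Path, expected_digest: str) -> bool:
    """Recompute the digest and compare it with the value recorded at ingest."""
    return sha256_of(path) == expected_digest
```

A scheduler would call `is_due` for each object and run `check_fixity` only on those that are due, recording the outcome as an event.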
Money
● Sustaining Members pay $20,000 per year in dues
○ Engage in governance, strategic direction-setting, development, etc.
○ Get allocation of 10 TB of content (with replication, total of 60 TB)
● University of Virginia provides $380,000 per year toward staffing costs
● Institution wishes to deposit more than 10 TB of content?
○ Pass-through of incremental costs in 5 TB blocks of content
■ $2,750 per year
● Approximately $600,000 cash reserve
● Serves as DPN ingest/replicating node (full cost-recovery from DPN)
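The dues and pass-through figures above can be worked through in a short sketch. Two points are assumptions rather than statements from the slide: that the $2,750 figure applies per additional 5 TB block per year, and that a partial block is billed as a whole block. The six-copy multiplier follows from the three East Coast and three West Coast copies.

```python
import math

ANNUAL_DUES = 20_000   # sustaining member dues, USD per year (from the slide)
INCLUDED_TB = 10       # content allocation covered by dues (from the slide)
BLOCK_TB = 5           # size of each additional block (from the slide)
BLOCK_COST = 2_750     # ASSUMED: USD per additional block per year
COPIES = 6             # three copies in each of two data centers


def annual_cost(deposit_tb: float) -> int:
    """Yearly cost for a member depositing deposit_tb of content."""
    extra_tb = max(0.0, deposit_tb - INCLUDED_TB)
    extra_blocks = math.ceil(extra_tb / BLOCK_TB)  # ASSUMED: partial blocks round up
    return ANNUAL_DUES + extra_blocks * BLOCK_COST


def replicated_tb(deposit_tb: float) -> float:
    """Total storage consumed once every copy is counted."""
    return deposit_tb * COPIES
```

Under these assumptions a 12 TB deposit needs one extra block, so `annual_cost(12)` is 22750, and `replicated_tb(10)` reproduces the 60 TB total quoted above.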
Current metrics in our 2nd year of production
● As of August 2, we have 64,111 content objects from 8 institutions in
production environment
● 17.2 TB of content, 3 million (as of June) PREMIS events
● Additional institutions with content in current testing
Future
● Much more volume in deposits
● Develop collaborations for additional services
● Develop sophisticated reporting for depositors
● Services for smaller cultural heritage organizations, libraries, museums;
new categories of APTrust members
Challenges
● Drive costs downward to encourage preservation
■ Balance between preservation principles and cost (without the principles, it is
just storage)
■ Technical architecture enables interchangeable back-end platforms
● Balance between
■ importance of Trusted Digital Repository certification
■ cost and workload of TDR audit (potentially $10-$15K every three
years, although Sibyl is educating me about this)
APTrust as a DPN ingest and replication node
● APTrust members can designate a deposit for DPN (each member signs a
separate agreement with DPN)
● APTrust “unbags” a verified copy of the deposit from its APTrust bag, “rebags”
it to the DPN standard, and stores it in the AWS Glacier service in the AWS
Virginia data center (where three copies are kept, each in a separate availability zone)
● APTrust then collaboratively replicates the data to two other DPN
replication nodes
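The unbag-and-rebag step above can be sketched with the standard library alone. This is a simplified illustration that assumes a BagIt-style layout (a `data/` payload directory plus a `manifest-sha256.txt` of digest and path pairs). The DPN bag profile and its required tag files are not modelled, all function names are hypothetical, and a production workflow would normally rely on a maintained BagIt library.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_bag(bag: Path) -> bool:
    """Check every file listed in manifest-sha256.txt against its digest.

    Files present in data/ but absent from the manifest are not detected here.
    """
    manifest = bag / "manifest-sha256.txt"
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        digest, rel_path = line.split(maxsplit=1)
        if _sha256(bag / rel_path) != digest:
            return False
    return True


def write_bag(payload_dir: Path, bag: Path) -> None:
    """Copy payload_dir into bag/data and write a fresh manifest."""
    data = bag / "data"
    shutil.copytree(payload_dir, data)  # also creates the bag directory
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"
    )
    lines = [
        f"{_sha256(p)}  {p.relative_to(bag).as_posix()}"
        for p in sorted(data.rglob("*"))
        if p.is_file()
    ]
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")


def rebag(source_bag: Path, target_bag: Path) -> None:
    """Verify the source bag, then repackage its payload as a new bag."""
    if not verify_bag(source_bag):
        raise ValueError("source bag failed fixity verification")
    write_bag(source_bag / "data", target_bag)
```

Verifying before repackaging means a corrupted deposit is rejected before it is copied, rather than being given a fresh and misleadingly valid manifest.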
Sibyl Schaefer // @archivelle
University of California, San Diego
Digital preservation dark storage network spanning multiple
institutions and geographic regions.
Partners:
University of California, San Diego (UCSD)
National Center for Atmospheric Research (NCAR)
University of Maryland Institute for Advanced
Computer Studies
● Funded by a series of NDIIPP grants designed to
develop the technological infrastructure needed for
digital preservation.
● First ingest date: 2008
● Nodes are all non-profit research centers.
● Content-agnostic: depositors form SIPs for ingest;
Chronopolis runs minimal packaging processes on data
● Majority of data is UCSD digital library and research
data
CRL TRAC Certification: 2012
Focused on active preservation – constant checking of
items.
[Diagram: Chronopolis architecture]
NCAR: JBOD, replication service, ACE
UMIACS: JBOD, replication service, ACE
UCSD: ingest service, Postgres, Isilon, replication service, ACE
Intake service
DuraCloud as a pathway to Chronopolis
● Provides an existing hosted user interface, reduced need for new development
● Simplifies the process of moving content in and out of the systems
● Shared institutional values
Chronopolis as a storage provider for DuraCloud
● Extends the DuraCloud network to include a non-commercial, highly distributed,
dark archive option
DuraCloud Vault
● Chronopolis + DuraCloud + DPN
[Diagram: DuraCloud Vault data flow. Labels: Data Owner, Sync Tool, REST API, DuraCloud UI, Storage Mediation, S3, Glacier, Original Content Store, Manifest Store, Snapshot, Bridge Application, Bridge Storage, Chronopolis nodes (Maryland, UCSD, NCAR), DPN Network (DPN Node 2, DPN Node 3)]
Chronopolis and DPN
Same:
● dark storage
● 3 copies at geographically different nodes
Different:
● business model
● stewardship succession
● administration
● membership involvement
SAA August 2016
www.dpn.org
Mary Molinaro
Chief Operating Officer
mary@dpn.org
Dave Pcolar
Chief Technical Officer
dave@dpn.org
DPN is a 50+ member cooperative organization investing in
long-term, scalable digital preservation for the academy.
As a project of Internet2, DPN is able to leverage established Research and
Education Networks to facilitate large scale data movement into preservation
environments across the United States.
Why DPN?
● Despite institutions’ individual efforts, only a fraction of
what should be preserved is being preserved.
● How will we ensure that the digital assets created and
collected in our institutions are available to scholars in the
future?
● A solution is needed to save at-risk content in a way that
protects against technical failure, natural disaster, and
institutional failure.
What does a solution look like?
1. Geographically distributed
2. Content replicated
3. Audit and repair
4. Technological diversity
5. Meet best practices for repositories
6. Succession agreements
The business model
Currently:
● Members pay an annual fee of $20,000
● Members receive 5TB of preservation
storage per year
● Pay once / stored in perpetuity
● Members can purchase additional TB
Now working on solutions for smaller and
larger institutions - available next year
DPN is in production and
accepting deposits!
What is next?
❖ Address problems we know about
➢ getting started
➢ workflow at the local level
❖ Continuing to examine cost
❖ Increase technical diversity within DPN
➢ evaluating new nodes
❖ Open the door to new members
Meeting the DPN Mission
● To save the most valuable content from the academy in a
dark preservation system that is geographically dispersed
and is supported by replication and audit
● The content is protected by succession agreements made
at the time of deposit
● This content is being preserved for humanity
www.dpn.org @DPNdotOrg
Isabella Stewart Gardner Museum Orientation
Welcome to the Cooperative!
MetaArchive Cooperative:
Case Study in
Collaboration
Sam Meister
Educopia Institute
www.metaarchive.org
Why collaborate?
To preserve our
digital collections
Maintain integrity for
continued access
To preserve our
institutional missions
Outsourcing core mission is
dangerous
Collaboration Benefits
Co$t effective
Scalable
Transparent
Control
Responsive
Standards-based
MetaArchive History
● 2004 - Founded as part of NDIIPP funding
and activities
● 2006 - Educopia Institute founded to serve
as administrative home for MetaArchive
● 2007 - MetaArchive becomes initial Affiliated
Community
● 2014 - Celebrated 10 years of successful
preservation network
Community-building
Network management expertise
Administrative, legal, and financial
services
Hallmarks
Distributed digital preservation
Institutions maintain control over their own content
Preservation as a process, not a push-button exercise
Simplicity in ingest, management
Cooperative Preservation
MetaArchive is a cooperative, not a vendor:
All hardware and software assets are owned by members
Membership fees and storage fees go to a central pool of support for members’ co-op activities
Philosophy in Practice
◼ Compatible with any repository system
▪ E.g., DSpace, Fedora, Archivalware, ETDb, CONTENTdm, BePress, Digital Commons, etc.
◼ Member institutions determine their own curatorial practices
◼ MetaArchive is a community of support to help them make informed decisions
[Diagram: six Member Caches, each holding an AU]
[Diagram: multiple AU copies across the network]
MetaArchive Practices
Basic processes
“Producer” (OAIS) determines curation practices;
brings SIPs to MetaArchive
Multiple copies of AIPs dispersed across
geographical, political, and environmental lines
Checks and repairs automated across network
Deaccession cycle versus data deletion
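The idea of automated checks and repairs across the network can be sketched as a majority vote among caches. This is a deliberately simplified illustration that assumes each cache holds a full copy of one AU and that a strict majority of matching SHA-256 digests identifies the good copy. It is not the network's actual protocol, and the function and cache names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Repair copies that disagree with the strict-majority digest.

    `copies` maps a cache name to that cache's copy of one AU and is
    updated in place. Returns the names of the caches that were repaired.
    """
    if not copies:
        return []
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        raise RuntimeError("no strict majority; manual intervention needed")
    good = next(c for c in copies.values() if digest(c) == winner)
    repaired = [name for name, c in copies.items() if digest(c) != winner]
    for name in repaired:
        copies[name] = good  # overwrite the damaged copy with an agreed one
    return repaired
```

Refusing to repair without a strict majority is a design choice in this sketch: with an even split there is no basis for deciding which copy is damaged.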
Membership
● Auburn University
● Boston College
● Cal Poly San Luis Obispo
● Carnegie Mellon University
● Consorci de Biblioteques Universitaris de Catalunya
● Florida State University
● Isabella Stewart Gardner Museum
● Greene County Public Library
● HBCU Library Alliance
● Indiana State University
● Oregon State University
● Penn State University
● Pontificia Universidade Catolica do Rio de Janeiro
● Purdue University
● Rockefeller Archives Center
● University of Louisville
● University of North Texas
● University of South Carolina
● Virginia Tech University
Membership Levels
Collaborative members: $2.5K/year
Preservation members: $3K/year
Sustaining members: $5.5K/year
Server cost: <$5K/term
Storage cost: $585/TB/year
Membership Responsibilities
● Undertake a 3-year membership term
● Take responsibility for content preparation,
evaluation, staging, and ingest testing
● Monitor collections to ensure accurate
long-term preservation
● Host and maintain a MetaArchive cache
(server) or pay a technology support fee
● Join and contribute to Committees
Governance Model
Independent
Nonprofit
Member owned, operated, governed
Governance Model
Charter
Outlines mission, goals, and organizing
principles of cooperative
Governance Procedures
Outlines roles and responsibilities for officers
and member participants
Member agreements
Outlines terms and responsibilities of
membership
Leadership
Steering Committee
Primary governing body responsible for overall
management, coordination, communication,
and reporting efforts
One representative from each Sustaining
Member
Strategic direction-setting, policy setting,
including membership costs and storage fees
Community Building
Common, neutral center
Distribution of work
Community of engagement
Building knowledge
Accomplishing preservation
Concentrated effort toward unified goals
Community Building
Communicate!
Monthly community calls, each focused on a
discussion topic
Two Steering Committee meetings per year
Community Listserv
MetaArchive Wiki
Sustainability
Cooperative framework
Strong organizational center
Limited dependence on any one member
Collaborative model for long-term preservation
Geographic diversity/distribution
Expertise diffusion
Maintain cost-effective, in-house options
Thank you!
www.metaarchive.org
@metaarchive
sam@educopia.org
@samalanmeister
Preservation and Community Action
Questions?

 


Distributed Digital Preservation Networks: A Case Study in Collaboration

  • 1. The Dark Side of Digital Preservation: Distributed Digital Preservation. Sibyl Schaefer, Mary Molinaro, Katherine Skinner, Chip German, David Pcolar, Sam Meister. Archives Records 2016, Session 606. Saturday, August 6, 2016, Atlanta, Georgia
  • 2. Briefing for SAA, August 6, 2016 Chip German, Program Director, APTrust and Senior Director, Content Stewardship, at the University of Virginia Library chip.german@aptrust.org APTrust Web Site: aptrust.org
  • 3. Sustaining Members: Columbia University, Indiana University, Georgetown University, Johns Hopkins University, North Carolina State University, Pennsylvania State University, Syracuse University, University of Cincinnati, University of Connecticut, University of Maryland, University of Miami, University of Michigan, University of North Carolina, University of Notre Dame, University of Virginia, Virginia Tech. aptrust.org
  • 4. • Quick description of APTrust: • Provides simple, collaboratively designed and built preservation environment (bit-level with fixity checking every 90 days) for the digital scholarly and cultural record (NOW) • Will provide collaboratively developed services for that content (FUTURE), which may include: • choice of preservation assurance levels/cost plans • access • format migration, emulation, software preservation • analysis tools for research • Currently uses Amazon Web Services (as back-end platform only)
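The 90-day fixity cycle mentioned on slide 4 can be pictured as a simple scheduling rule. The following is a minimal sketch in Python, assuming a hypothetical mapping from object identifier to the date of its last fixity check; the name `due_for_fixity` and the data shape are illustrative and are not taken from APTrust's software.

```python
from datetime import date, timedelta

# Interval stated on the slide: fixity checking every 90 days.
FIXITY_INTERVAL = timedelta(days=90)


def due_for_fixity(last_checked: dict[str, date], today: date) -> list[str]:
    """Return the identifiers whose last check is at least 90 days old."""
    return sorted(
        obj_id
        for obj_id, checked in last_checked.items()
        if today - checked >= FIXITY_INTERVAL
    )


# Hypothetical object identifiers and check dates, for illustration only.
checks = {
    "example.edu/object-1": date(2016, 5, 8),
    "example.edu/object-2": date(2016, 7, 1),
}
due = due_for_fixity(checks, today=date(2016, 8, 6))
```

A real service would presumably queue the returned objects for checksum recomputation and record the outcome as a preservation event.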
  • 5. for metadata and events Three content copies in East Coast data center (three separate availability zones) S3 service (preservation)
  • 6. Three content copies in West Coast data center (three separate availability zones) + Fedora copy
  • 7. Money ● Sustaining Members pay $20,000 per year in dues ○ Engage in governance, strategic direction-setting, development, etc. ○ Get allocation of 10 TB of content (with replication, total of 60 TB) ● University of Virginia provides $380,000 per year toward staffing costs ● Institution wishes to deposit more than 10 TB of content? ○ Pass-through of incremental costs in 5 TB blocks of content ■ $2,750 per year ○ Approximately $600,000 cash reserve ○ Serves as DPN ingest/replicating node (full cost-recovery from DPN) Current metrics in our 2nd year of production ● As of August 2, we have 64,111 content objects from 8 institutions in production environment ● 17.2 TB of content, 3 million (as of June) PREMIS events ● Additional institutions with content in current testing
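The cost figures on slide 7 lend themselves to a short worked calculation. The sketch below assumes that the $2,750 per year applies to each 5 TB block beyond the included 10 TB and that partial blocks are rounded up; the slide does not state either point explicitly, and the function names are hypothetical.

```python
import math

DUES_USD = 20_000   # annual Sustaining Member dues (from the slide)
INCLUDED_TB = 10    # allocation covered by dues (from the slide)
BLOCK_TB = 5        # size of each additional block (from the slide)
BLOCK_USD = 2_750   # assumed to be the yearly price of one 5 TB block
COPIES = 6          # 10 TB becomes 60 TB with replication (from the slide)


def annual_cost_usd(deposit_tb: float) -> int:
    """Dues plus pass-through cost for content beyond the included 10 TB."""
    extra_tb = max(0.0, deposit_tb - INCLUDED_TB)
    blocks = math.ceil(extra_tb / BLOCK_TB)  # assumption: round up to whole blocks
    return DUES_USD + blocks * BLOCK_USD


def replicated_tb(deposit_tb: float) -> float:
    """Total stored volume once all copies are counted."""
    return deposit_tb * COPIES
```

Under these assumptions a 12 TB deposit needs one extra block, and a 20 TB deposit needs two.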
  • 8. Future ● Much more volume in deposits ● Develop collaborations for additional services ● Develop sophisticated reporting for depositors ● Services for smaller cultural heritage organizations, libraries, museums; new categories of APTrust members Challenges ● Drive costs downward to encourage preservation ■ Balance between preservation principles and cost (if not, just storage) ■ Technical architecture enables interchangeable back-end platforms ● Balance between ■ importance of Trusted Digital Repository certification ■ cost and workload of TDR audit (potentially $10-$15K every three years, although Sibyl is educating me about this)
  • 9. APTrust as a DPN ingest and replication node ● APTrust members can designate their deposit to DPN (individually sign a separate agreement with DPN) ● APTrust “unbags” a verified copy of deposit from APTrust bag and “rebags” it to DPN standard into AWS Glacier service in the AWS VA data center (where three copies are kept, each in a separate availability zone) ● APTrust then collaboratively replicates the data to two other DPN replication nodes
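Verifying a bag before "unbagging" it amounts to recomputing each payload checksum and comparing it with the bag's manifest. The following is a minimal sketch using only the Python standard library. It assumes a BagIt-style `manifest-sha256.txt` with one `checksum path` pair per line; the algorithm choice and the function names are assumptions, not a description of APTrust's or DPN's actual tooling.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large payload files need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_bag(bag_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Return the manifest paths that are missing or fail their checksum."""
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or file_digest(target, algorithm) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty result means every listed file matched; only then would a "rebag" step be safe to run.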
  • 10. Sibyl Schaefer // @archivelle University of California, San Diego
  • 11. Digital preservation dark storage network spanning multiple institutions and geographic regions. Partners: University of California, San Diego (UCSD) National Center for Atmospheric Research (NCAR) University of Maryland Institute for Advanced Computer Studies
  • 12. ● Funded by a series of NDIIPP grants designed to develop the technological infrastructure needed for digital preservation. ● First ingest date: 2008 ● Nodes are all non-profit research centers. ● Content-agnostic: depositors form SIPs for ingest; Chronopolis runs minimal packaging processes on data ● Majority of data is UCSD digital library and research data
  • 14. Focused on active preservation – constant checking of items.
  • 15. [Diagram: Chronopolis architecture. Labels: NCAR (JBOD, replication service, ACE); UMIACS (JBOD, replication service, ACE); UCSD (Isilon, replication service, ACE); ingest service; Postgres; intake service]
  • 16. DuraCloud as a pathway to Chronopolis ● Provides an existing hosted user interface, reduced need for new development ● Simplifies the process of moving content in and out of the systems ● Shared institutional values Chronopolis as a storage provider for DuraCloud ● Extends the DuraCloud network to include a non-commercial, highly distributed, dark archive option DuraCloud Vault ● Chronopolis + DuraCloud + DPN
  • 18. [Diagram: DuraCloud, Chronopolis, and DPN data flow. Labels: Data Owner; Sync Tool; REST API; DuraCloud UI; Storage Mediation; S3; Glacier; Original Content Store; Manifest Store; Snapshot; Bridge Application; Bridge Storage; Maryland; UCSD; NCAR; DPN Node 2; DPN Node 3; DPN Network]
  • 19. Chronopolis and DPN Same: ● dark storage ● 3 copies at geographically different nodes Different: ● business model ● stewardship succession ● administration ● membership involvement
  • 20. SAA August 2016 www.dpn.org Mary Molinaro Chief Operating Officer mary@dpn.org Dave Pcolar Chief Technical Officer dave@dpn.org
  • 21. DPN is a 50+ member cooperative organization investing in long-term, scalable digital preservation for the academy.
  • 22. As a project of Internet2, DPN is able to leverage established Research and Education Networks to facilitate large scale data movement into preservation environments across the United States.
  • 23. Why DPN? ● Despite institutions’ individual efforts only a fraction of what should be preserved is being preserved. ● How will we ensure that the digital assets created and collected in our institutions are available to scholars in the future? ● A solution is needed to save at risk content in a way that protects against technical failure, natural disaster, and institutional failure.
  • 24. What does a solution look like? 1. Geographically distributed 2. Content replicated 3. Audit and repair 4. Technological diversity 5. Meet best practices for repositories 6. Succession agreements
  • 25. The business model Currently: ● Members pay an annual fee of $20,000 ● Members receive 5TB of preservation storage per year ● Pay once / stored in perpetuity ● Members can purchase additional TB Now working on solutions for smaller and larger institutions - available next year
  • 26. DPN is in production and accepting deposits!
  • 27. What is next? ❖ Address problems we know about ➢ getting started ➢ workflow at the local level ❖ Continuing to examine cost ❖ Increase technical diversity within DPN ➢ evaluating new nodes ❖ Open the door to new members
  • 28. Meeting the DPN Mission ● To save the most valuable content from the academy in a dark preservation system that is geographically dispersed and is supported by replication and audit ● The content is protected by succession agreements made at the time of deposit ● This content is being preserved for humanity
  • 30. SAA August 2016 www.dpn.org @DPNdotOrg Mary Molinaro Chief Operating Officer mary@dpn.org Dave Pcolar Chief Technical Officer dave@dpn.org
  • 31. Isabella Stewart Gardner Museum Orientation Welcome to the Cooperative! MetaArchive Cooperative: Case Study in Collaboration Sam Meister Educopia Institute www.metaarchive.org
  • 33. To preserve our digital collections
  • 36. Outsourcing core mission is dangerous
  • 39. MetaArchive History ● 2004 - Founded as part of NDIIPP funding and activities ● 2006 - Educopia Institute founded to serve as administrative home for MetaArchive ● 2007 - MetaArchive becomes initial Affiliated Community ● 2014 - Celebrated 10 years of successful preservation network
  • 43. Hallmarks: Distributed digital preservation. Institutions maintain control over their own content. Preservation as a process, not a push-button exercise. Simplicity in ingest, management.
  • 44. Cooperative Preservation: MetaArchive is a cooperative, not a vendor. All hardware and software assets are owned by members. Membership fees and storage fees go to a central pool of support for members’ co-op activities.
  • 45. Philosophy in Practice: ◼ Compatible with any repository system ▪ E.g., DSpace, Fedora, Archivalware, ETDb, CONTENTdm, BePress, Digital Commons, etc. ◼ Member institutions determine their own curatorial practices ◼ MetaArchive is a community of support to help them make informed decisions
  • 46. MetaArchive Practices: Basic processes. “Producer” (OAIS) determines curation practices; brings SIPs to MetaArchive.
  • 47. MetaArchive Practices: Basic processes. “Producer” (OAIS) determines curation practices; brings SIPs to MetaArchive. Multiple copies of AIPs dispersed across geographical, political, and environmental lines.
  • 49. MetaArchive Practices: Basic processes. “Producer” (OAIS) determines curation practices; brings SIPs to MetaArchive. Multiple copies of AIPs dispersed across geographical, political, and environmental lines. Checks and repairs automated across network.
  • 52. MetaArchive Practices: Basic processes. “Producer” (OAIS) determines curation practices; brings SIPs to MetaArchive. Multiple copies of AIPs dispersed across geographical, political, and environmental lines. Checks and repairs automated across network. Deaccession cycle versus data deletion.
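The idea of automated checks and repairs across dispersed copies can be illustrated with a toy majority vote over checksums. This is a simplified sketch under stated assumptions, not the protocol any of the networks in this session actually runs: the node names, the in-memory `replicas` mapping, and the function `audit_and_repair` are all hypothetical.

```python
import hashlib
from collections import Counter


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Overwrite minority copies with the majority copy.

    Returns the names of the nodes that were repaired. Raises ValueError
    when no strict majority agrees, since automatic repair would be unsafe.
    """
    if not replicas:
        return []
    digests = {node: hashlib.sha256(data).hexdigest() for node, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority among replicas; manual review needed")
    good_node = next(node for node, d in digests.items() if d == winner)
    damaged = sorted(node for node, d in digests.items() if d != winner)
    for node in damaged:
        replicas[node] = replicas[good_node]  # copy the agreed content over
    return damaged


# Three hypothetical nodes, one of which holds a corrupted copy.
copies = {"node_a": b"payload", "node_b": b"payload", "node_c": b"paylo\x00d"}
repaired = audit_and_repair(copies)
```

Refusing to repair without a strict majority is a deliberate choice in this sketch: with only two disagreeing copies there is no way to tell which one is intact.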
  • 53. ● Auburn University ● Boston College ● Cal Poly San Luis Obispo ● Carnegie Mellon University ● Consorci de Biblioteques Universitaris de Catalunya ● Florida State University ● Isabella Stewart Gardner Museum ● Greene County Public Library ● HBCU Library Alliance ● Indiana State University ● Oregon State University ● Penn State University ● Pontificia Universidade Catolica do Rio de Janeiro ● Purdue University ● Rockefeller Archives Center ● University of Louisville ● University of North Texas ● University of South Carolina ● Virginia Tech University Membership
  • 55. Membership Levels: Collaborative members: $2.5K/year. Preservation members: $3K/year. Sustaining members: $5.5K/year. Server cost: <$5K/term. Storage cost: $585/TB/year.
  • 56. Membership Responsibilities ● Undertake a 3-year membership term ● Take responsibility for content preparation, evaluation, staging, and ingest testing ● Monitor collections to ensure accurate long-term preservation ● Host and maintain a MetaArchive cache (server) or pay in a technology support fee ● Join and contribute to Committees
  • 58. Governance Model Charter Outlines mission, goals, and organizing principles of cooperative Governance Procedures Outlines roles and responsibilities for officers and member participants Member agreements Outlines terms and responsibilities of membership
  • 59. Leadership Steering Committee Primary governing body responsible for overall management, coordination, communication, and reporting efforts One representative from each Sustaining Member Strategic direction-setting, policy setting, including membership costs and storage fees
  • 60. Community Building Common, neutral center Distribution of work Community of engagement Building knowledge Accomplishing preservation Concentrated effort toward unified goals
  • 61. Community Building Communicate! Monthly discussion-topic focused community calls Two Steering Committees per year Community Listserv MetaArchive Wiki
  • 62. Sustainability Cooperative framework Strong organizational center Limited dependence on any one member Collaborative model for long-term preservation Geographic diversity/distribution Expertise diffusion Maintain cost-effective, in-house options