Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
The Dark Side of
Digital Preservation:
Distributed Digital
Preservation
Sibyl Schaefer, Mary Molinaro, Katherine Skinner, ...
Briefing for SAA, August 6, 2016
Chip German, Program Director, APTrust
and Senior Director, Content Stewardship, at
the U...
Sustaining Members:
Columbia University
Indiana University
Georgetown University
Johns Hopkins University
North Carolina S...
• Quick description of APTrust:
• Provides simple, collaboratively designed and built preservation
environment (bit-level ...
for metadata
and events
Three content
copies in East Coast
data center (three
separate
availability zones)
S3 service
(pre...
Three content copies
in West Coast data
center (three
separate availability
zones) + Fedora copy
Money
● Sustaining Members pay $20,000 per year in dues
○ Engage in governance, strategic direction-setting, development, ...
Future
● Much more volume in deposits
● Develop collaborations for additional services
● Develop sophisticated reporting f...
APTrust as a DPN ingest and replication node
● APTrust members can designate their deposit to DPN (individually sign a
sep...
Sibyl Schaefer // @archivelle
University of California, San Diego
Digital preservation dark storage network spanning multiple
institutions and geographic regions.
Partners:
University of C...
● Funded by a series of NDIPP grants designed to
develop the technological infrastructure needed for
digital preservation....
CRL TRAC Certification
2012
Focused on active preservation – constant checking of
items.
Chronopolis
JBOD
Replication service
ACE
NCAR JBOD
Replication service
ACE
UMIACS
Ingest
Ingest service
Postgres
Isilon
Re...
DuraCloud as a pathway to Chronopolis
● Provides an existing hosted user interface, reduced need for new development
● Sim...
S3 Glacier
DuraCloud
UI
Storage
Mediation
Manifest Store
Original
Content Store
REST API
Sync
Tool
Data Owner
Snapshot
Bri...
Chronopolis and DPN
Same:
● dark storage
● 3 copies at geographically different nodes
Different:
● business model
● stewar...
SAA August 2016
www.dpn.org
Mary Molinaro
Chief Operating Officer
mary@dpn.org
Dave Pcolar
Chief Technical Officer
dave@dp...
DPN is a 50+ member cooperative organization investing in
long-term, scalable digital preservation for the academy.
As a project of Internet2, DPN is able to leverage established Research and
Education Networks to facilitate large scale d...
Why DPN?
● Despite institutions’ individual efforts only a fraction of
what should be preserved is being preserved.
● How ...
What does a solution look like?
1. Geographically distributed
2. Content replicated
3. Audit and repair
4. Technological d...
The business model
Currently:
● Members pay an annual fee of $20,000
● Members receive 5TB of preservation
storage per yea...
DPN is in production and
accepting deposits!
What is next?
❖ Address problems we know about
➢ getting started
➢ workflow at the local level
❖ Continuing to examine cos...
Meeting the DPN Mission
● To save the most valuable content from the academy in a
dark preservation system that is geograp...
SAA August 2016
www.dpn.org @DPNdotOrg
Mary Molinaro
Chief Operating Officer
mary@dpn.org
Dave Pcolar
Chief Technical Offi...
Isabella Stewart Gardner Museum Orientation
Welcome to the Cooperative!
MetaArchive Cooperative:
Case Study in
Collaborati...
Why collaborate?
To preserve our
digital collections
Maintain integrity for
continued access
To preserve our
institutional missions
Outsourcing core mission is
dangerous
Collaboration Benefits
Co$t effective
Scalable
Transparent
Control
Responsive
Standards-based
MetaArchive History
●2004 - Founded as part of NDIIPP funding
and activities
●2006 - Educopia Institute founded to serve
a...
Community-building
Network management expertise
Administrative, legal, and financial
services
Distributed digital preservation
Institutions maintain control over their own
content
Preservation as a process, not a
pus...
MetaArchive is a cooperative, not a vendor:
All hardware and software assets are owned
by members
Membership fees and stor...
◼ Compatible with any repository system
▪ E.g., Dspace, Fedora, Archivalware, ETDb,
CONTENTdm, BePress, Digital Commons, e...
MetaArchive Practices
Basic processes
“Producer” (OAIS) determines curation practices;
brings SIPs to MetaArchive
46
MetaArchive Practices
Basic processes
“Producer” (OAIS) determines curation practices;
brings SIPs to MetaArchive
Multiple...
AU AU
AU AU
AU
AU
Member
Cache
Member
Cache
Member
Cache
Member
Cache
Member
Cache
Member
Cache
MetaArchive Practices
Basic processes
“Producer” (OAIS) determines curation practices;
brings SIPs to MetaArchive
Multiple...
A
U
A
U A
U
A
U
A
U
A
U
A
U
A
U
A
U A
U
A
U
A
U
A
U
A
U
MetaArchive Practices
Basic processes
“Producer” (OAIS) determines curation practices;
brings SIPs to MetaArchive
Multiple...
● Auburn University
● Boston College
● Cal Poly San Luis Obispo
● Carnegie Mellon University
● Consorci de Biblioteques
Un...
54
Collaborative members: $2.5K/year
Preservation members: $3K/year
Sustaining members: $5.5K/year
Server cost: <$5K/term
Sto...
Membership Responsibilities
●Undertake a 3-year membership term
●Take responsibility for content preparation,
evaluation, ...
Governance Model
Independent
Nonprofit
Member owned, operated, governed
Governance Model
Charter
Outlines mission, goals, and organizing
principles of cooperative
Governance Procedures
Outlines ...
Leadership
Steering Committee
Primary governing body responsible for overall
management, coordination, communication,
and ...
Community Building
Common, neutral center
Distribution of work
Community of engagement
Building knowledge
Accomplishing pr...
Community Building
Communicate!
Monthly discussion-topic focused
community calls
Two Steering Committees per year
Communit...
Sustainability
Cooperative framework
Strong organizational center
Limited dependence on any one member
Collaborative model...
63
www.metaarchive.org
Thank you!
www.metaarchive.org
@metaarchive
sam@educopia.org
@samalanmeister
Preservation and Community Action
Questions?
Questions?
Questions?
Questions?
Questions?
The Dark Side of Digital Preservation: Distributed Digital Preservation
The Dark Side of Digital Preservation: Distributed Digital Preservation
The Dark Side of Digital Preservation: Distributed Digital Preservation
The Dark Side of Digital Preservation: Distributed Digital Preservation
The Dark Side of Digital Preservation: Distributed Digital Preservation
You’ve finished this document.
Download and read it offline.
Upcoming SlideShare
What to Upload to SlideShare
Next
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

Share

The Dark Side of Digital Preservation: Distributed Digital Preservation

Download to read offline

Presentation on distributed digital preservation networks by Sibyl Schaefer, Mary Molinaro, Katherine Skinner, Chip German, David Pcolar, Sam Meister

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all
  • Be the first to like this

The Dark Side of Digital Preservation: Distributed Digital Preservation

  1. 1. The Dark Side of Digital Preservation: Distributed Digital Preservation Sibyl Schaefer, Mary Molinaro, Katherine Skinner, Chip German, David Pcolar Sam Meister Archives Records 2016 Session 606 Saturday August 6, 2016 Atlanta, Georgia
  2. 2. Briefing for SAA, August 6, 2016 Chip German, Program Director, APTrust and Senior Director, Content Stewardship, at the University of Virginia Library chip.german@aptrust.org APTrust Web Site: aptrust.org
  3. 3. Sustaining Members: Columbia University Indiana University Georgetown University Johns Hopkins University North Carolina State University Pennsylvania State University Syracuse University University of Cincinnati University of Connecticut University of Maryland University of Miami University of Michigan University of North Carolina University of Notre Dame University of Virginia Virginia Tech aptrust.org
  4. 4. • Quick description of APTrust: • Provides simple, collaboratively designed and built preservation environment (bit-level with fixity checking every 90 days) for the digital scholarly and cultural record (NOW) • Will provide collaboratively developed services for that content (FUTURE), which may include: • choice of preservation assurance levels/cost plans • access • format migration, emulation, software preservation • analysis tools for research • Currently uses Amazon Web Services (as back-end platform only)
  5. 5. for metadata and events Three content copies in East Coast data center (three separate availability zones) S3 service (preservation)
  6. 6. Three content copies in West Coast data center (three separate availability zones) + Fedora copy
  7. 7. Money ● Sustaining Members pay $20,000 per year in dues ○ Engage in governance, strategic direction-setting, development, etc. ○ Get allocation of 10 TB of content (with replication, total of 60 TB) ● University of Virginia provides $380,000 per year toward staffing costs ● Institution wishes to deposit more than 10 TB of content? ○ Pass-through of incremental costs in 5 TB blocks of content ■ $2,750 per year ○ Approximately $600,000 cash reserve ○ Serves as DPN ingest/replicating node (full cost-recovery from DPN) Current metrics in our 2nd year of production ● As of August 2, we have 64,111 content objects from 8 institutions in production environment ● 17.2 TB of content, 3 million (as of June) PREMIS events ● Additional institutions with content in current testing
  8. 8. Future ● Much more volume in deposits ● Develop collaborations for additional services ● Develop sophisticated reporting for depositors ● Services for smaller cultural heritage organizations, libraries, museums; new categories of APTrust members Challenges ● Drive costs downward to encourage preservation ■ Balance between preservation principles and cost (if not, just storage) ■ Technical architecture enables interchangeable back-end platforms ● Balance between ■ importance of Trusted Digital Repository certification ■ cost and workload of TDR audit (potentially $10-$15K every three years, although Sibyl is educating me about this)
  9. 9. APTrust as a DPN ingest and replication node ● APTrust members can designate their deposit to DPN (individually sign a separate agreement with DPN) ● APTrust “unbags” a verified copy of deposit from APTrust bag and “rebags” it to DPN standard into AWS Glacier service in the AWS VA data center (where three copies are kept, each in a separate availability zone) ● APTrust then collaboratively replicates the data to two other DPN replication nodes
  10. 10. Sibyl Schaefer // @archivelle University of California, San Diego
  11. 11. Digital preservation dark storage network spanning multiple institutions and geographic regions. Partners: University of California, San Diego (UCSD) National Center for Atmospheric Research (NCAR) University of Maryland Institute for Advanced Computer Studies
  12. 12. ● Funded by a series of NDIPP grants designed to develop the technological infrastructure needed for digital preservation. ● First ingest date: 2008 ● Nodes are all non-profit research centers. ● Content-agnostic - depositors form SIPS for ingest, Chronopolis runs minimal packaging processes on data ● Majority of data is UCSD digital library and research data
  13. 13. CRL TRAC Certification 2012
  14. 14. Focused on active preservation – constant checking of items.
  15. 15. Chronopolis JBOD Replication service ACE NCAR JBOD Replication service ACE UMIACS Ingest Ingest service Postgres Isilon Replication service ACE UCSD Intake service
  16. 16. DuraCloud as a pathway to Chronopolis ● Provides an existing hosted user interface, reduced need for new development ● Simplifies the process of moving content in and out of the systems ● Shared institutional values Chronopolis as a storage provider for DuraCloud ● Extends the DuraCloud network to include a non-commercial, highly distributed, dark archive option DuraCloud Vault ● Chronopolis + DuraCloud + DPN
  17. 17. S3 Glacier DuraCloud UI Storage Mediation Manifest Store Original Content Store REST API Sync Tool Data Owner Snapshot Bridge Bridge Storage Bridge Application Maryland UCSD NCAR DPN Node 2 DPN Node 3 DPN Network
  18. 18. Chronopolis and DPN Same: ● dark storage ● 3 copies at geographically different nodes Different: ● business model ● stewardship succession ● administration ● membership involvement
  19. 19. SAA August 2016 www.dpn.org Mary Molinaro Chief Operating Officer mary@dpn.org Dave Pcolar Chief Technical Officer dave@dpn.org
  20. 20. DPN is a 50+ member cooperative organization investing in long-term, scalable digital preservation for the academy.
  21. 21. As a project of Internet2, DPN is able to leverage established Research and Education Networks to facilitate large scale data movement into preservation environments across the United States.
  22. 22. Why DPN? ● Despite institutions’ individual efforts only a fraction of what should be preserved is being preserved. ● How will we ensure that the digital assets created and collected in our institutions are available to scholars in the future? ● A solution is needed to save at risk content in a way that protects against technical failure, natural disaster, and institutional failure.
  23. 23. What does a solution look like? 1. Geographically distributed 2. Content replicated 3. Audit and repair 4. Technological diversity 5. Meet best practices for repositories 6. Succession agreements
  24. 24. The business model Currently: ● Members pay an annual fee of $20,000 ● Members receive 5TB of preservation storage per year ● Pay once / stored in perpetuity ● Members can purchase additional TB Now working on solutions for smaller and larger institutions - available next year
  25. 25. DPN is in production and accepting deposits!
  26. 26. What is next? ❖ Address problems we know about ➢ getting started ➢ workflow at the local level ❖ Continuing to examine cost ❖ Increase technical diversity within DPN ➢ evaluating new nodes ❖ Open the door to new members
  27. 27. Meeting the DPN Mission ● To save the most valuable content from the academy in a dark preservation system that is geographically dispersed and is supported by replication and audit ● The content is protected by succession agreements made at the time of deposit ● This content is being preserved for humanity
  28. 28. SAA August 2016 www.dpn.org @DPNdotOrg Mary Molinaro Chief Operating Officer mary@dpn.org Dave Pcolar Chief Technical Officer dave@dpn.org
  29. 29. Isabella Stewart Gardner Museum Orientation Welcome to the Cooperative! MetaArchive Cooperative: Case Study in Collaboration Sam Meister Educopia Institute www.metaarchive.org
  30. 30. Why collaborate?
  31. 31. To preserve our digital collections
  32. 32. Maintain integrity for continued access
  33. 33. To preserve our institutional missions
  34. 34. Outsourcing core mission is dangerous
  35. 35. Collaboration Benefits Co$t effective Scalable Transparent Control Responsive Standards-based
  36. 36. MetaArchive History ●2004 - Founded as part of NDIIPP funding and activities ●2006 - Educopia Institute founded to serve as administrative home for MetaArchive ●2007 - MetaArchive becomes initial Affiliated Community ●2014 - Celebrated 10 years of successful preservation network 39
  37. 37. Community-building Network management expertise Administrative, legal, and financial services
  38. 38. Distributed digital preservation Institutions maintain control over their own content Preservation as a process, not a push-button exercise Simplicity in ingest, management 43 Hallmarks
  39. 39. MetaArchive is a cooperative, not a vendor: All hardware and software assets are owned by members Membership fees and storage fees go to a central pool of support for members’ co-op activities 44 Cooperative Preservation
  40. 40. ◼ Compatible with any repository system ▪ E.g., Dspace, Fedora, Archivalware, ETDb, CONTENTdm, BePress, Digital Commons, etc ◼ Member institutions determine their own curatorial practices ◼ MetaArchive is a community of support to help them make informed decisions 45 Philosophy in Practice
  41. 41. MetaArchive Practices Basic processes “Producer” (OAIS) determines curation practices; brings SIPs to MetaArchive 46
  42. 42. MetaArchive Practices Basic processes “Producer” (OAIS) determines curation practices; brings SIPs to MetaArchive Multiple copies of AIPs dispersed across geographical, political, and environmental lines 47
  43. 43. AU AU AU AU AU AU Member Cache Member Cache Member Cache Member Cache Member Cache Member Cache
  44. 44. MetaArchive Practices Basic processes “Producer” (OAIS) determines curation practices; brings SIPs to MetaArchive Multiple copies of AIPs dispersed across geographical, political, and environmental lines Checks and repairs automated across network 49
  45. 45. A U A U A U A U A U A U A U
  46. 46. A U A U A U A U A U A U A U
  47. 47. MetaArchive Practices Basic processes “Producer” (OAIS) determines curation practices; brings SIPs to MetaArchive Multiple copies of AIPs dispersed across geographical, political, and environmental lines Checks and repairs automated across network Deaccession cycle versus data deletion 52
  48. 48. ● Auburn University ● Boston College ● Cal Poly San Luis Obispo ● Carnegie Mellon University ● Consorci de Biblioteques Universitaris de Catalunya ● Florida State University ● Isabella Stewart Gardner Museum ● Greene County Public Library ● HBCU Library Alliance ● Indiana State University ● Oregon State University ● Penn State University ● Pontificia Universidade Catolica do Rio de Janeiro ● Purdue University ● Rockefeller Archives Center ● University of Louisville ● University of North Texas ● University of South Carolina ● Virginia Tech University Membership
  49. 49. 54
  50. 50. Collaborative members: $2.5K/year Preservation members: $3K/year Sustaining members: $5.5K/year Server cost: <$5K/term Storage cost: $585/TB/year 55 Membership Levels
  51. 51. Membership Responsibilities ●Undertake a 3-year membership term ●Take responsibility for content preparation, evaluation, staging, and ingest testing ●Monitor collections to ensure accurate long-term preservation ●Host and maintain a MetaArchive cache (server) or pay in a technology support fee ●Join and contribute to Committees 56
  52. 52. Governance Model Independent Nonprofit Member owned, operated, governed
  53. 53. Governance Model Charter Outlines mission, goals, and organizing principles of cooperative Governance Procedures Outlines roles and responsibilities for officers and member participants Member agreements Outlines terms and responsibilities of membership
  54. 54. Leadership Steering Committee Primary governing body responsible for overall management, coordination, communication, and reporting efforts One representative from each Sustaining Member Strategic direction-setting, policy setting, including membership costs and storage fees
  55. 55. Community Building Common, neutral center Distribution of work Community of engagement Building knowledge Accomplishing preservation Concentrated effort toward unified goals
  56. 56. Community Building Communicate! Monthly discussion-topic focused community calls Two Steering Committees per year Community Listserv MetaArchive Wiki
  57. 57. Sustainability Cooperative framework Strong organizational center Limited dependence on any one member Collaborative model for long-term preservation Geographic diversity/distribution Expertise diffusion Maintain cost-effective, in-house options
  58. 58. 63 www.metaarchive.org Thank you! www.metaarchive.org @metaarchive sam@educopia.org @samalanmeister
  59. 59. Preservation and Community Action
  60. 60. Questions? Questions? Questions? Questions? Questions?

Presentation on distributed digital preservation networks by Sibyl Schaefer, Mary Molinaro, Katherine Skinner, Chip German, David Pcolar, Sam Meister

Views

Total views

1,331

On Slideshare

0

From embeds

0

Number of embeds

2

Actions

Downloads

8

Shares

0

Comments

0

Likes

0

×