SlideShare a Scribd company logo
Update at CNI, April 14, 2015
Chip German, APTrust
Andrew Diamond, APTrust
Nathan Tallman, University of Cincinnati
Linda Newman, University of Cincinnati
Jamie Little, University of Miami
APTrust Web Site: aptrust.org
Sustaining Members:
Columbia University
Indiana University
Johns Hopkins University
North Carolina State University
Penn State
Syracuse University
University of Chicago
University of Cincinnati
University of Connecticut
University of Maryland
University of Miami
University of Michigan
University of North Carolina
University of Notre Dame
University of Virginia
Virginia Tech
• Quick description of APTrust: The Academic Preservation Trust (APTrust)
is a consortium of higher education institutions committed to providing
both a preservation repository for digital content and collaboratively
developed services related to that content. The APTrust repository accepts
digital materials in all formats from member institutions, and provides
redundant storage in the cloud. It is managed and operated by the
University of Virginia and is both a deposit location and a replicating node
for the Digital Preservation Network (DPN).
• 39 of the registrants for CNI in Seattle are from APTrust sustaining-member
institutions
APTrust
How it works
Andrew Diamond
Senior Preservation & Systems Engineer
Behind the Scenes,
Step by Step
What You See,
Step by Step
Validation Pending
Validation Succeeded (Detail)
Validation Failed (Detail)
Storage Pending
Record Pending
Record Succeeded
List of Ingested Objects
Object Detail
List of Files
File Detail
Pending Tasks: Restore & Delete
Completed Tasks: Restore & Delete
Detail of Completed Restore
Detail of Completed Delete
A Note on Bag Storage & Restoration
Bag Storage
Bag Restoration
Where’s my data?
On multiple disks in
Northern Virginia.
S3
Glacier
On hard disk, tape or
optical disk in Oregon.
S3 = Simple Storage Service
S3 = Simple Storage Service
99.999999999% Durability (9 nines)
S3 = Simple Storage Service
99.999999999% Durability (9 nines)
Each file stored on multiple devices in
multiple data centers
S3 = Simple Storage Service
99.999999999% Durability (9 nines)
Each file stored on multiple devices in
multiple data centers
Fixity check on transfer
S3 = Simple Storage Service
99.999999999% Durability (9 nines)
Each file stored on multiple devices in
multiple data centers
Fixity check on transfer
Ongoing data integrity checks & self-
healing
Glacier
Long-term storage with 99.999999999%
durability (11 nines)
Not intended for frequent access
Takes five hours or so to restore a file
Chances of losing an
archived object?
Chances of losing an archived object?
1 in 100,000,000,000
S3
Chances of losing an archived object?
1 in 100,000,000,000
S3
Glacier
1 in 10,000,000,000,000
Chances of losing an archived object?
1 in 100,000,000,000
S3
Glacier
1 in 10,000,000,000,000
1 in 1,000,000,000,000,000,000,000,000
chance of losing both to separate events
Chances of losing an archived object?
1 in 100,000,000,000
S3
Glacier
1 in 10,000,000,000,000
1 in 1,000,000,000,000,000,000,000,000
chance of losing both to separate events
Very Slim
Protecting Against Failure
During Ingest
Protecting Against Failure
During Ingest
Microservices with state
Resume intelligently after failure
Protecting Against Failure
After Ingest
Protecting Against Failure
After Ingest
Objects & Files
Metadata & Events
What if...
What if ingest services stop?
They restart themselves and
pick up where they left off.
What if ingest server dies?
We spin up a new one in a
few hours.
What if Fedora service stops?
We restart it manually.
What if Fedora server dies?
We can build a new one from
the data on the hot backup in
Oregon.
What if we lose all of Glacier in
Oregon?
We have all files in S3 in
Virginia. Ingest services and
Fedora are not affected.
What if we lose the whole
Virginia data center?
Fedora Ingest
Services
Metadata and
Events
Files and
Objects
S3
What if we lose the whole
Virginia data center?
Ingest
Services
We can
rebuild
services from
code and
config files in
hours.
What if we lose the whole
Virginia data center?
We can restore all files
from Glacier in Oregon.
Files and
Objects
S3
What if we lose the whole
Virginia data center?
We can restore all
metadata and
events from the
Fedora backup in
Oregon.Fedora
Metadata and
Events
Metadata stored with files
Attached metadata for last-ditch recovery
To lose everything...
To lose everything...
A major catastrophic event in
Virginia and Oregon, or...
To lose everything...
Don’t pay the AWS bill
A major catastrophic event in
Virginia and Oregon, or...
Partner Tools
Partner Tools
Bag Management
Upload
Download
List
Delete
Partner Tools
Bag Management
Upload
Download
List
Delete
...or use open-source AWS libraries for Java, Python, Ruby,
etc.
Partner Tools
Bag Management
Partner Tools
Bag Validation
Validate a bag before you upload it.
Get specific details about what to fix.
Partner Tools
Bag Validation
Partner Tools
Bag Management & Validation
Download from the APTrust wiki
https://sites.google.com/a/aptrust.org/aptrust-wiki/home/partner-tools
APTrust
Repository
https://repository.aptrust.org
Test Site
http://test.aptrust.org
Wiki
https://sites.google.com/a/aptrust.org/aptrust-wiki/
Help
help@aptrust.org
Learning to Run
University of Cincinnati Libraries path to
contributing content to APTrust
Digital Collection Environment
● 3 digital repositories
o DSpace -- UC Digital Resource Commons
o LUNA -- UC Libraries Image and Media Collections
o Hydra -- Scholar@UC Institutional Repository
● 1 website
● 1 Storage Area Network file system
Workflow Considerations
● Preservation vs. Recovery
o platform-specific bags or platform-agnostic bags?
o derivative files
● Bagging level
o record level or collection level?
UC Libraries Workflow Choices
● Recovery is as important as preservation
o Include platform specific files and derivatives
● Track platform provenance
o use bag naming convention or additional fields in
bag-info.text
● Track collection membership
o use bag naming convention or additional fields in
bag-info.text
Bagging (crawling)
1st iteration
● Start with DSpace, Cincinnati Subway and
Street Improvements collection
● Replication Task Suite to produce Bags
o one bag per record
● Bash scripts to transform to APTrust Bags
● Bash scripts to upload to test repository
● Worked well, but…
Scripts available at https://github.com/ntallman/APTrust_tools.
Bagging (walking)
2nd iteration
● Same starting place
● Export collection as Simple Archive
Format
o ~ 8,800 records, ~ 375 GB
● Create bags using bagit-python
o 250 GB is bag-size upper limit
Bagging (walking)
2nd iteration (continued)
● Collection had three series
o two series under 250 GB
▪ each series was put into a single bag
o one series over 250 GB
▪ used a multi-part bag to send to APTrust,
APTrust merges split bags into one
● Bash script to upload bags to production
repository
Bagging (running)
3rd iteration
● Testing of tools created by Jamie Little of
University of Miami
● Jamie can tell you more but…
Long-Term Strategy (marathons)
● Bag for preservation and recovery
● For most content
o one bag per collection
▪ except for Scholar@UC
o manual curation and bagging
▪ except for Scholar@UC
● Use bag-info.txt to capture platform
provenance and collection membership
at
UM Digital Collections Environment
● File system based long-term storage
● CONTENTdm for public access
Creating a workflow
● Began at the command line working with
LOC’s BagIt Java Library
● Bash scripts
● Our initial testing used technical criteria
(size, structural complexity, file types)
What to put in the bag?
● CONTENTdm metadata
● FITS technical metadata
● We maintained our file system structure in
the bags
Beginning APTrust Testing
● Began uploading whole collections as bags
● Early attempts at a user interface using
Zenity
UM’s APTrust Automation
● Sinatra based server application
● HTML & JavaScript interface
Starting Screen
Choosing a Repository ID
Choosing Files
Viewing Download Results
Final Upload
Bagging Strategies
Moved from bag per collection strategy to bag
per file
Changed interface to allow selection of
individual files
APTrust Benefits for Developers
● Wide range of open source tools available
● Uploading to APTrust is no different than
uploading to S3
Where to find us
github.com/UMiamiLibraries
http://merrick.library.miami.edu/
Why are we doing this again?
● Libraries’ and Archives’ historic commitment to long-
term preservation is the raison d'être, a core
justification, for our support of repositories. It’s why our
stakeholders trust us and will place their content in our
repositories.
● We expect APTrust to be our long-term preservation
repository to meet our responsibilities to our
stakeholders.
Why are we doing this again?
● We know from some experience, and reports from our
peers, that backup procedures, while still essential, are
fraught with holes.
● We expect APTrust to be our failover if any near-term
restore from backup proves incomplete.
Linda Newman
Head, Digital Collections
and Repositories
linda.newman@uc.edu
UC Libraries, Digital Collections and Repositories
Nathan Tallman
Digital Content Strategist
nathan.tallman@uc.edu
http://digital.libraries.uc.edu https://scholar.uc.edu
https://github.com/uclibs

More Related Content

Viewers also liked

No More Excuses ~ One District’s Journey Into
No More Excuses ~ One District’s Journey IntoNo More Excuses ~ One District’s Journey Into
No More Excuses ~ One District’s Journey IntoDana (Imlay) Simmers
 
Ozone Contactor Design Improvements using CFD Modeling
Ozone Contactor Design Improvements using CFD ModelingOzone Contactor Design Improvements using CFD Modeling
Ozone Contactor Design Improvements using CFD ModelingThomas Bell-Games
 
Morse_Abigail_FinalLitReview
Morse_Abigail_FinalLitReviewMorse_Abigail_FinalLitReview
Morse_Abigail_FinalLitReviewAbigail Morse
 
Presentaciã³n1
Presentaciã³n1Presentaciã³n1
Presentaciã³n1
ValentinaPD
 
Abhinandan K -Highway Engineer with 3 + yr Expr 2014
Abhinandan K -Highway Engineer with 3 + yr Expr 2014 Abhinandan K -Highway Engineer with 3 + yr Expr 2014
Abhinandan K -Highway Engineer with 3 + yr Expr 2014 Abhinandan K
 
Gravitational waves
Gravitational wavesGravitational waves
Gravitational waves
naveen yadav
 

Viewers also liked (8)

No More Excuses ~ One District’s Journey Into
No More Excuses ~ One District’s Journey IntoNo More Excuses ~ One District’s Journey Into
No More Excuses ~ One District’s Journey Into
 
Rulx Jean Baptiste
Rulx Jean BaptisteRulx Jean Baptiste
Rulx Jean Baptiste
 
Ozone Contactor Design Improvements using CFD Modeling
Ozone Contactor Design Improvements using CFD ModelingOzone Contactor Design Improvements using CFD Modeling
Ozone Contactor Design Improvements using CFD Modeling
 
Ozone - Ohio AWWA 2011
Ozone - Ohio AWWA 2011Ozone - Ohio AWWA 2011
Ozone - Ohio AWWA 2011
 
Morse_Abigail_FinalLitReview
Morse_Abigail_FinalLitReviewMorse_Abigail_FinalLitReview
Morse_Abigail_FinalLitReview
 
Presentaciã³n1
Presentaciã³n1Presentaciã³n1
Presentaciã³n1
 
Abhinandan K -Highway Engineer with 3 + yr Expr 2014
Abhinandan K -Highway Engineer with 3 + yr Expr 2014 Abhinandan K -Highway Engineer with 3 + yr Expr 2014
Abhinandan K -Highway Engineer with 3 + yr Expr 2014
 
Gravitational waves
Gravitational wavesGravitational waves
Gravitational waves
 

Similar to APTrust CNI Seattle April 2015

Burrito digital archive system
Burrito digital archive systemBurrito digital archive system
Burrito digital archive systemEdward Iglesias
 
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Amazon Web Services
 
EPrints and the Cloud
EPrints and the CloudEPrints and the Cloud
EPrints and the Cloud
Leslie Carr
 
Archivematica and the digital archival chain of custody
Archivematica and the digital archival chain of custodyArchivematica and the digital archival chain of custody
Archivematica and the digital archival chain of custody
Artefactual Systems - Archivematica
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29
Sheila MacNeill
 
Moving Data into the Cloud with AWS Transfer Services - May 2017 AWS Online ...
Moving Data into the Cloud with AWS Transfer Services  - May 2017 AWS Online ...Moving Data into the Cloud with AWS Transfer Services  - May 2017 AWS Online ...
Moving Data into the Cloud with AWS Transfer Services - May 2017 AWS Online ...
Amazon Web Services
 
Using AWS for Backup and Restore (backup in the cloud, backup to the cloud, a...
Using AWS for Backup and Restore (backup in the cloud, backup to the cloud, a...Using AWS for Backup and Restore (backup in the cloud, backup to the cloud, a...
Using AWS for Backup and Restore (backup in the cloud, backup to the cloud, a...
Amazon Web Services
 
Sword 2007 06 22
Sword 2007 06 22Sword 2007 06 22
Sword 2007 06 22
Julie Allinson
 
Deep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech TalksDeep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech Talks
Amazon Web Services
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and Archive
Amazon Web Services
 
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech TalksDeep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Amazon Web Services
 
Initial presentation of swift (for montreal user group)
Initial presentation of swift (for montreal user group)Initial presentation of swift (for montreal user group)
Initial presentation of swift (for montreal user group)Marcos García
 
Oxford Common File Layout (OCFL)
Oxford Common File Layout (OCFL)Oxford Common File Layout (OCFL)
Oxford Common File Layout (OCFL)
Simeon Warner
 
Ingest and storage options
Ingest and storage optionsIngest and storage options
Ingest and storage options
Amazon Web Services
 
ENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the Cloud
Amazon Web Services
 
DepositMOre: Applying tools to increase full-text content in institutional re...
DepositMOre: Applying tools to increase full-text content in institutional re...DepositMOre: Applying tools to increase full-text content in institutional re...
DepositMOre: Applying tools to increase full-text content in institutional re...
depositMO
 
AWS Storage and Data Migration
AWS Storage and Data MigrationAWS Storage and Data Migration
AWS Storage and Data Migration
Amazon Web Services
 
How a Major Australian University Brought Backup to the Cloud
 How a Major Australian University Brought Backup to the Cloud How a Major Australian University Brought Backup to the Cloud
How a Major Australian University Brought Backup to the Cloud
Amazon Web Services
 
How a Major Australian University Brought Backup to the Cloud
 How a Major Australian University Brought Backup to the Cloud How a Major Australian University Brought Backup to the Cloud
How a Major Australian University Brought Backup to the Cloud
Amazon Web Services
 
Digitization Basics for Archives and Special Collections – Part 2: Store and ...
Digitization Basics for Archives and Special Collections – Part 2: Store and ...Digitization Basics for Archives and Special Collections – Part 2: Store and ...
Digitization Basics for Archives and Special Collections – Part 2: Store and ...
WiLS
 

Similar to APTrust CNI Seattle April 2015 (20)

Burrito digital archive system
Burrito digital archive systemBurrito digital archive system
Burrito digital archive system
 
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
 
EPrints and the Cloud
EPrints and the CloudEPrints and the Cloud
EPrints and the Cloud
 
Archivematica and the digital archival chain of custody
Archivematica and the digital archival chain of custodyArchivematica and the digital archival chain of custody
Archivematica and the digital archival chain of custody
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29
 
Moving Data into the Cloud with AWS Transfer Services - May 2017 AWS Online ...
Moving Data into the Cloud with AWS Transfer Services  - May 2017 AWS Online ...Moving Data into the Cloud with AWS Transfer Services  - May 2017 AWS Online ...
Moving Data into the Cloud with AWS Transfer Services - May 2017 AWS Online ...
 
Using AWS for Backup and Restore (backup in the cloud, backup to the cloud, a...
Using AWS for Backup and Restore (backup in the cloud, backup to the cloud, a...Using AWS for Backup and Restore (backup in the cloud, backup to the cloud, a...
Using AWS for Backup and Restore (backup in the cloud, backup to the cloud, a...
 
Sword 2007 06 22
Sword 2007 06 22Sword 2007 06 22
Sword 2007 06 22
 
Deep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech TalksDeep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech Talks
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and Archive
 
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech TalksDeep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
 
Initial presentation of swift (for montreal user group)
Initial presentation of swift (for montreal user group)Initial presentation of swift (for montreal user group)
Initial presentation of swift (for montreal user group)
 
Oxford Common File Layout (OCFL)
Oxford Common File Layout (OCFL)Oxford Common File Layout (OCFL)
Oxford Common File Layout (OCFL)
 
Ingest and storage options
Ingest and storage optionsIngest and storage options
Ingest and storage options
 
ENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the Cloud
 
DepositMOre: Applying tools to increase full-text content in institutional re...
DepositMOre: Applying tools to increase full-text content in institutional re...DepositMOre: Applying tools to increase full-text content in institutional re...
DepositMOre: Applying tools to increase full-text content in institutional re...
 
AWS Storage and Data Migration
AWS Storage and Data MigrationAWS Storage and Data Migration
AWS Storage and Data Migration
 
How a Major Australian University Brought Backup to the Cloud
 How a Major Australian University Brought Backup to the Cloud How a Major Australian University Brought Backup to the Cloud
How a Major Australian University Brought Backup to the Cloud
 
How a Major Australian University Brought Backup to the Cloud
 How a Major Australian University Brought Backup to the Cloud How a Major Australian University Brought Backup to the Cloud
How a Major Australian University Brought Backup to the Cloud
 
Digitization Basics for Archives and Special Collections – Part 2: Store and ...
Digitization Basics for Archives and Special Collections – Part 2: Store and ...Digitization Basics for Archives and Special Collections – Part 2: Store and ...
Digitization Basics for Archives and Special Collections – Part 2: Store and ...
 

Recently uploaded

June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Po-Chuan Chen
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 

Recently uploaded (20)

June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 

APTrust CNI Seattle April 2015

Editor's Notes

  1. You have files to archive.
  2. You send files to a receiving bucket at AWS US-East in Northern Virginia.
  3. Files go to S3 preservation bucket. Metadata and PREMIS events go to Fedora. We delete the file from the receiving bucket.
  4. APTrust copies them to Glacier in Oregon.
  5. Ingest and fixity generation.
  6. First ID is ID identifier in Fedora. Use this to look up your file in the web UI. Second ID is URL in S3 preservation bucket.
  7. The white entries are pending restore and delete operations.
  8. Use the apt_download tool or one of your own tools to retrieve the restored file. You should delete the file after restoration because we have to pay for that storage.
  9. AWS modular Oregon data centers are in Umatilla, Boardman, and likely other areas as well. Umatilla: http://www.pcmag.com/article2/0,2817,2396153,00.asp Boardman: http://www.stackdriver.com/public-cloud-data-centers/ Glacier on BDXL optical disk: http://storagemojo.com/2014/04/25/amazons-glacier-secret-bdxl/ Glacier on LDO6 tape: https://storageservers.wordpress.com/2012/10/22/amazon-glacier-storage-media-is-a-tape-driven-solution/ Facebooks uses Blu-Ray: http://arstechnica.com/information-technology/2014/01/facebook-uses-10000-blu-ray-discs-to-create-petabytes-of-cold-storage/ and http://money.cnn.com/2014/08/21/technology/facebook-blu-ray/
  10. Store and retrieve files through HTTPS Restrict access to authorized users Virtually unlimited storage space Pay only for what you use
  11. 99.99% percent availability over 1 year. That means the service is guaranteed by SLA to be up and running 99.99% of the time in any given year. Source: http://aws.amazon.com/s3/details/#durability
  12. S3 does ongoing fixity checks, replacing bad files with good ones. Amazon does not specify the schedule for these ongoing checks, but they use MD5. We do our own SHA-256 checks approximately every 30 days.
  13. Source for Glacier eleven nines: http://media.amazonwebservices.com/AWS_Storage_Options.pdf
  14. Nine 9’s = 1 in one hundred million
  15. Eleven nines = one in ten trillion
  16. This makes it sound like your data will still be there, even when the rest of the universe is gone.
  17. I don’t believe anything is this certain.
  18. If we can’t check the receiving buckets for new items, we try again in a few minutes.
  19. If download fails, we try again in a few minutes. If validate fails, we stop trying to process this bag.
  20. Storing files to S3 may fail on item #5000 of 10,000. System is smart enough to resume where it left off.
  21. A large bag may have tens of thousands of metadata records and events to record in Fedora. If Fedora cannot receive all metadata & events, system will try again later, resuming where it left off.
  22. If copy to Glacier fails after #5000 of 10,000 items, system is smart enough to resume where it left off.
  23. AWS has four zones in their Virginia data center, each with its own separate lines for power, backup power, and internet. If one zone goes down, others are not affected.
  24. S3 files are on separate devices in at least three zones.
  25. Our Fedora server is in one zone. We mirror its data to a second zone, so if the server or disk dies, we can bring up a new server with all the data. Ingest services don’t keep any data other than the bags they are working on. If those services die, we just restart them and they pick up where they left off. If the server dies, it can reconstruct its workload by checking with Fedora and the receiving buckets. It may duplicate work on the files it was processing when the server died, but it won’t lose work.
  26. Hoping to change this.
  27. That may take a couple of days.
  28. This may take a couple of days!
  29. These are scriptable, command-line tools. They are pre-compiled and require only a 5-line configuration file.
  30. These are scriptable, command-line tools. They are pre-compiled and require only a 5-line configuration file.
  31. These are scriptable, command-line tools. They are pre-compiled and require only a 5-line configuration file.
  32. First bag is not valid because it’s missing the md5 manifest. Second bag is valid.
  33. These are scriptable, command-line tools. They are pre-compiled and require only a 5-line configuration file.
  34. AVPreserve info about Glacier: http://www.avpreserve.com/wp-content/uploads/2014/04/AVPreserve_Glacier_profile.pdf
  35. A handful of digital collections are only available through our website Our SAN has archival masters for nearly (if not all) all of your digital collections, sometimes the only archival masters
  36. Starting with DSpace, need to work out strategy for other content sources 1st iteration RTS (outputs every community, collection, and record as an individual item) one bag per record Developed scripts to process RTS produced bags into APTrust compliant bags and upload to APTrust receiving bucket (using AWS CLI) <https://github.com/ntallman/APTrust_tools> abandoned because realized late in game that RTS wiped out dates in the file-system metadata
  37. 2nd iteration Tested bagit-Java and bagit-python, decided to use bagit-python Exported single collection from DSpace in Simple Archive Format 8,822 items, ~377 GB 250 GB bag-size limitation Collection had three series two series under 250 GB each series was put into a single bag one series over 250 GB used a multi-part bag to send to APTrust, APTrust merges split bags into one Should we have done one big multi-part bag? I.e. one bag per collection? Did not worry about tracking platform or collection in this bag Expected to one day wipe this content out and replace with something that matches our final strategy
  38. 2nd iteration Tested bagit-Java and bagit-python, decided to use bagit-python Exported single collection from DSpace in Simple Archive Format 8,822 items, ~377 GB 250 GB bag-size limitation Collection had three series two series under 250 GB each series was put into a single bag one series over 250 GB used a multi-part bag to send to APTrust, APTrust merges split bags into one Should we have done one big multi-part bag? I.e. one bag per collection? Did not worry about tracking platform or collection in this bag Expected to one day wipe this content out and replace with something that matches our final strategy
  39. 3rd iteration (really just testing) I'll let Jamie tell you more about this and I'll just say that I wish I had discovered this before creating my artisanal, hand-crafted bags. Basically a GUI wrapper for my same process except, added file characterization
  40. To Do Develop local preservation database or tool to track which content has been sent to APTrust Scholar@UC is a self-submit institutional repository. No hierarchy of collections (as we traditionally think about them) one record (work), many files Public commitment to long-term preservation of all Scholar content using APTrust for all content The flexible architecture of APTrust Preservation Repository (Are we allowed to say Fluctis?) allows us to make local decisions on how we package our content for preservation