A presentation by the Academic Preservation Trust consortium on its progress, given at the Coalition for Networked Information (CNI) Spring meeting on April 14, 2015, at the Seattle Westin.
1. Update at CNI, April 14, 2015
Chip German, APTrust
Andrew Diamond, APTrust
Nathan Tallman, University of Cincinnati
Linda Newman, University of Cincinnati
Jamie Little, University of Miami
APTrust Web Site: aptrust.org
2. Sustaining Members:
Columbia University
Indiana University
Johns Hopkins University
North Carolina State University
Penn State
Syracuse University
University of Chicago
University of Cincinnati
University of Connecticut
University of Maryland
University of Miami
University of Michigan
University of North Carolina
University of Notre Dame
University of Virginia
Virginia Tech
3. • Quick description of APTrust: The Academic Preservation Trust (APTrust) is a consortium of higher education institutions committed to providing both a preservation repository for digital content and collaboratively developed services related to that content. The APTrust repository accepts digital materials in all formats from member institutions, and provides redundant storage in the cloud. It is managed and operated by the University of Virginia and is both a deposit location and a replicating node for the Digital Preservation Network (DPN).
• 39 of the registrants for CNI in Seattle are from APTrust sustaining-member institutions
40. S3 = Simple Storage Service
99.999999999% Durability (11 nines)
41. S3 = Simple Storage Service
99.999999999% Durability (11 nines)
Each file stored on multiple devices in
multiple data centers
42. S3 = Simple Storage Service
99.999999999% Durability (11 nines)
Each file stored on multiple devices in
multiple data centers
Fixity check on transfer
43. S3 = Simple Storage Service
99.999999999% Durability (11 nines)
Each file stored on multiple devices in
multiple data centers
Fixity check on transfer
Ongoing data integrity checks & self-healing
44. Glacier
Long-term storage with 99.999999999%
durability (11 nines)
Not intended for frequent access
Takes five hours or so to restore a file
47. Chances of losing an archived object?
S3: 1 in 100,000,000,000
Glacier: 1 in 10,000,000,000,000
48. Chances of losing an archived object?
S3: 1 in 100,000,000,000
Glacier: 1 in 10,000,000,000,000
1 in 1,000,000,000,000,000,000,000,000 chance of losing both to separate events
49. Chances of losing an archived object?
S3: 1 in 100,000,000,000
Glacier: 1 in 10,000,000,000,000
1 in 1,000,000,000,000,000,000,000,000 chance of losing both to separate events
Very Slim
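The joint figure above is just the product of the two per-copy probabilities quoted on the slide, which assumes the S3 and Glacier copies fail independently. A quick check of that arithmetic, using the slide's own numbers:

```python
# Per-copy annual loss probabilities as quoted on the slide.
p_s3 = 1 / 100_000_000_000          # 1 in 10^11
p_glacier = 1 / 10_000_000_000_000  # 1 in 10^13

# Assuming the two copies fail independently, the chance of losing both:
p_both = p_s3 * p_glacier
print(p_both)  # 1e-24, i.e. 1 in 1,000,000,000,000,000,000,000,000
```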
87. Digital Collection Environment
● 3 digital repositories
o DSpace -- UC Digital Resource Commons
o LUNA -- UC Libraries Image and Media Collections
o Hydra -- Scholar@UC Institutional Repository
● 1 website
● 1 Storage Area Network file system
88. Workflow Considerations
● Preservation vs. Recovery
o platform-specific bags or platform-agnostic bags?
o derivative files
● Bagging level
o record level or collection level?
89. UC Libraries Workflow Choices
● Recovery is as important as preservation
o Include platform specific files and derivatives
● Track platform provenance
o use bag naming convention or additional fields in bag-info.txt
● Track collection membership
o use bag naming convention or additional fields in bag-info.txt
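A rough sketch of what such a bag-info.txt could look like. Source-Organization, Bagging-Date, Internal-Sender-Identifier, and Bag-Group-Identifier are standard BagIt elements; the X-Platform and X-Collection names are illustrative custom elements, not an APTrust or UC Libraries convention, and the values are placeholders.

```
Source-Organization: University of Cincinnati Libraries
Bagging-Date: 2015-04-14
Internal-Sender-Identifier: dspace/cincinnati-subway-series-1
Bag-Group-Identifier: cincinnati-subway-and-street-improvements
X-Platform: DSpace
X-Collection: Cincinnati Subway and Street Improvements
```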
90. Bagging (crawling)
1st iteration
● Start with DSpace, Cincinnati Subway and
Street Improvements collection
● Replication Task Suite to produce Bags
o one bag per record
● Bash scripts to transform to APTrust Bags
● Bash scripts to upload to test repository
● Worked well, but…
Scripts available at https://github.com/ntallman/APTrust_tools.
91. Bagging (walking)
2nd iteration
● Same starting place
● Export collection as Simple Archive
Format
o ~ 8,800 records, ~ 375 GB
● Create bags using bagit-python
o 250 GB is bag-size upper limit
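A minimal sketch of that bagging step with bagit-python; the export directory path is a placeholder, and the options shown (md5 plus sha256 checksums, multiple processes) are illustrative rather than the exact settings used.

```python
import bagit

# Turn an exported Simple Archive Format directory into a bag in place.
# "exports/subway-series-1" is a placeholder path.
bag = bagit.make_bag(
    "exports/subway-series-1",
    bag_info={"Source-Organization": "University of Cincinnati Libraries"},
    checksums=["md5", "sha256"],
    processes=4,
)
print(bag.is_valid())  # quick sanity check after bagging
```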
92. Bagging (walking)
2nd iteration (continued)
● Collection had three series
o two series under 250 GB
▪ each series was put into a single bag
o one series over 250 GB
▪ used a multi-part bag to send to APTrust,
APTrust merges split bags into one
● Bash script to upload bags to production
repository
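The slide describes handling the oversized series with an APTrust multi-part bag. As a generic illustration of working within a bag-size cap (not the workflow above), here is a sketch that groups files into staging directories of at most 250 GB and bags each group with bagit-python; paths are placeholders, and the grouping flattens directory structure for brevity.

```python
import os
import shutil
import bagit

SIZE_LIMIT = 250 * 1024**3  # the 250 GB upper limit mentioned on the slide

def group_files_by_size(src_dir, limit=SIZE_LIMIT):
    """Yield lists of file paths whose combined size stays under `limit`."""
    group, total = [], 0
    for root, _dirs, files in os.walk(src_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if group and total + size > limit:
                yield group
                group, total = [], 0
            group.append(path)
            total += size
    if group:
        yield group

# Copy each group into its own staging directory and bag it.
for i, paths in enumerate(group_files_by_size("exports/large-series"), start=1):
    part_dir = f"staging/large-series-part{i:03d}"
    os.makedirs(part_dir, exist_ok=True)
    for p in paths:
        shutil.copy2(p, part_dir)  # flattens paths; a real script would preserve them
    bagit.make_bag(part_dir, checksums=["md5"])
```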
94. Long-Term Strategy (marathons)
● Bag for preservation and recovery
● For most content
o one bag per collection
▪ except for Scholar@UC
o manual curation and bagging
▪ except for Scholar@UC
● Use bag-info.txt to capture platform
provenance and collection membership
96. UM Digital Collections Environment
● File system based long-term storage
● CONTENTdm for public access
97. Creating a workflow
● Began at the command line working with
LOC’s BagIt Java Library
● Bash scripts
● Our initial testing used technical criteria
(size, structural complexity, file types)
98. What to put in the bag?
● CONTENTdm metadata
● FITS technical metadata
● We maintained our file system structure in
the bags
99. Beginning APTrust Testing
● Began uploading whole collections as bags
● Early attempts at a user interface using
Zenity
106. Bagging Strategies
Moved from bag per collection strategy to bag
per file
Changed interface to allow selection of
individual files
107. APTrust Benefits for Developers
● Wide range of open source tools available
● Uploading to APTrust is no different than
uploading to S3
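The ingest flow described in this deck starts with sending files to an APTrust receiving bucket, so uploading is a plain S3 PUT. A minimal sketch with boto3; the bucket name, key, and tarred-bag file name are placeholders (the workflows above used bash and the AWS CLI rather than boto3).

```python
import boto3

# Placeholder names: substitute your institution's APTrust receiving bucket
# and a bag produced by the bagging step.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="staging/miami.edu.example-bag.tar",
    Bucket="aptrust.receiving.example.edu",
    Key="miami.edu.example-bag.tar",
)
```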
108. Where to find us
github.com/UMiamiLibraries
http://merrick.library.miami.edu/
109. Why are we doing this again?
● Libraries’ and Archives’ historic commitment to long-term preservation is the raison d'être, a core justification, for our support of repositories. It’s why our stakeholders trust us and will place their content in our repositories.
● We expect APTrust to be our long-term preservation repository to meet our responsibilities to our stakeholders.
110. Why are we doing this again?
● We know from some experience, and reports from our peers, that backup procedures, while still essential, are fraught with holes.
● We expect APTrust to be our failover if any near-term restore from backup proves incomplete.
111. Linda Newman
Head, Digital Collections
and Repositories
linda.newman@uc.edu
UC Libraries, Digital Collections and Repositories
Nathan Tallman
Digital Content Strategist
nathan.tallman@uc.edu
http://digital.libraries.uc.edu https://scholar.uc.edu
https://github.com/uclibs
Editor's Notes
You have files to archive.
You send files to a receiving bucket at AWS US-East in Northern Virginia.
Files go to S3 preservation bucket. Metadata and PREMIS events go to Fedora. We delete the file from the receiving bucket.
APTrust copies them to Glacier in Oregon.
Ingest and fixity generation.
The first ID is the object's identifier in Fedora; use it to look up your file in the web UI. The second ID is the file's URL in the S3 preservation bucket.
The white entries are pending restore and delete operations.
Use the apt_download tool or one of your own tools to retrieve the restored file. You should delete the file after restoration because we have to pay for that storage.
AWS modular Oregon data centers are in Umatilla, Boardman, and likely other areas as well.
Umatilla: http://www.pcmag.com/article2/0,2817,2396153,00.asp
Boardman: http://www.stackdriver.com/public-cloud-data-centers/
Glacier on BDXL optical disk: http://storagemojo.com/2014/04/25/amazons-glacier-secret-bdxl/
Glacier on LTO6 tape: https://storageservers.wordpress.com/2012/10/22/amazon-glacier-storage-media-is-a-tape-driven-solution/
Facebook uses Blu-ray: http://arstechnica.com/information-technology/2014/01/facebook-uses-10000-blu-ray-discs-to-create-petabytes-of-cold-storage/ and http://money.cnn.com/2014/08/21/technology/facebook-blu-ray/
Store and retrieve files through HTTPS
Restrict access to authorized users
Virtually unlimited storage space
Pay only for what you use
99.99% availability over 1 year. That means the service is guaranteed by SLA to be up and running 99.99% of the time in any given year.
Source: http://aws.amazon.com/s3/details/#durability
S3 does ongoing fixity checks, replacing bad files with good ones. Amazon does not specify the schedule for these ongoing checks, but they use MD5. We do our own SHA-256 checks approximately every 30 days.
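A minimal sketch of that kind of scheduled fixity check, assuming the expected SHA-256 digest was recorded at ingest; the file path and stored digest below are placeholders.

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Stream the file and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

stored = "0" * 64  # placeholder for the digest recorded at ingest
if sha256_of("data/example-file.tif") != stored:
    print("Fixity failure: file no longer matches its recorded digest")
```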
Source for Glacier eleven nines: http://media.amazonwebservices.com/AWS_Storage_Options.pdf
Eleven nines of durability works out to roughly a 1 in 100,000,000,000 (one hundred billion) chance of losing a given object in a year.
This makes it sound like your data will still be there, even when the rest of the universe is gone.
I don’t believe anything is this certain.
If we can’t check the receiving buckets for new items, we try again in a few minutes.
If download fails, we try again in a few minutes.
If validate fails, we stop trying to process this bag.
Storing files to S3 may fail on item #5000 of 10,000. System is smart enough to resume where it left off.
A large bag may have tens of thousands of metadata records and events to record in Fedora. If Fedora cannot receive all metadata & events, system will try again later, resuming where it left off.
If copy to Glacier fails after #5000 of 10,000 items, system is smart enough to resume where it left off.
AWS has four zones in their Virginia data center, each with its own separate lines for power, backup power, and internet. If one zone goes down, others are not affected.
S3 files are on separate devices in at least three zones.
Our Fedora server is in one zone. We mirror its data to a second zone, so if the server or disk dies, we can bring up a new server with all the data.
Ingest services don’t keep any data other than the bags they are working on. If those services die, we just restart them and they pick up where they left off. If the server dies, it can reconstruct its workload by checking with Fedora and the receiving buckets. It may duplicate work on the files it was processing when the server died, but it won’t lose work.
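These notes describe a resume-where-it-left-off pattern rather than a specific API. A generic sketch of that idea, with the state file and helper names purely illustrative:

```python
import json
import os

STATE_FILE = "ingest_state.json"  # illustrative; not an APTrust artifact

def load_completed():
    """Return the set of item identifiers already processed."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return set(json.load(f))
    return set()

def mark_completed(done):
    with open(STATE_FILE, "w") as f:
        json.dump(sorted(done), f)

def ingest_all(items, copy_one):
    """Copy every item, skipping those already recorded as done."""
    done = load_completed()
    for item in items:
        if item in done:
            continue        # resume: skip work finished before a failure
        copy_one(item)      # may raise; recorded state survives for the next run
        done.add(item)
        mark_completed(done)
```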
Hoping to change this.
That may take a couple of days.
This may take a couple of days!
These are scriptable, command-line tools. They are pre-compiled and require only a 5-line configuration file.
These are scriptable, command-line tools. They are pre-compiled and require only a 5-line configuration file.
These are scriptable, command-line tools. They are pre-compiled and require only a 5-line configuration file.
First bag is not valid because it’s missing the md5 manifest.
Second bag is valid.
These are scriptable, command-line tools. They are pre-compiled and require only a 5-line configuration file.
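A rough check along those lines with bagit-python; the bag path is a placeholder, and the explicit test for manifest-md5.txt reflects the md5-manifest requirement described in the note rather than the BagIt spec itself.

```python
import os
import bagit

bag_path = "bags/example-bag"  # placeholder
has_md5_manifest = os.path.exists(os.path.join(bag_path, "manifest-md5.txt"))

try:
    bagit.Bag(bag_path).validate()
    valid = True
except bagit.BagError as err:
    valid = False
    print(f"Bag is not valid: {err}")

print(f"BagIt-valid: {valid}, has md5 manifest: {has_md5_manifest}")
```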
AVPreserve info about Glacier: http://www.avpreserve.com/wp-content/uploads/2014/04/AVPreserve_Glacier_profile.pdf
A handful of digital collections are only available through our website
Our SAN has archival masters for nearly all (if not all) of our digital collections, sometimes the only archival masters
Starting with DSpace, need to work out strategy for other content sources
1st iteration
RTS (outputs every community, collection, and record as an individual item)
one bag per record
Developed scripts to process RTS produced bags into APTrust compliant bags and upload to APTrust receiving bucket (using AWS CLI) <https://github.com/ntallman/APTrust_tools>
abandoned because we realized late in the game that RTS wiped out dates in the file-system metadata
2nd iteration
Tested bagit-Java and bagit-python, decided to use bagit-python
Exported single collection from DSpace in Simple Archive Format
8,822 items, ~377 GB
250 GB bag-size limitation
Collection had three series
two series under 250 GB
each series was put into a single bag
one series over 250 GB
used a multi-part bag to send to APTrust, APTrust merges split bags into one
Should we have done one big multi-part bag? I.e. one bag per collection?
Did not worry about tracking platform or collection in this bag
Expected to one day wipe this content out and replace with something that matches our final strategy
3rd iteration (really just testing)
I'll let Jamie tell you more about this and I'll just say that I wish I had discovered this before creating my artisanal, hand-crafted bags.
Basically a GUI wrapper for my same process except,
added file characterization
To Do
Develop local preservation database or tool to track which content has been sent to APTrust
Scholar@UC is a self-submit institutional repository.
No hierarchy of collections (as we traditionally think about them)
one record (work), many files
Public commitment to long-term preservation of all Scholar content using APTrust
for all content
The flexible architecture of the APTrust Preservation Repository (Are we allowed to say Fluctus?) allows us to make local decisions on how we package our content for preservation