This document contains the notes from a presentation on digital preservation challenges for arts and humanities materials. It discusses threats like physical medium failure, file format obsolescence, and organizational commitment. The presenter emphasizes approaching digital preservation the same way as print by identifying your threat model and priorities. Migration, normalization, and describing content are presented as strategies alongside ensuring sustainable policies and organizational support for long-term preservation.
5. Threat model
• “Preservation,” with no modifier, means nothing.
• This is why it becomes such a bogeyman!
• Two things you need to know first:
• why you’re preserving what you’re preserving, and
• what you’re preserving it against.
• Your collection-development policy should
inform the first question.
• Your coll-dev policy doesn’t include local born-digital or
digitized materials? This is a problem. Fix it.
• The second question is your “threat model.”
14. Why did I just make you
do that?
• I’m weird.
• I’m trying to destroy the myth that any given
medium “preserves itself.”
• Media do not preserve themselves. People preserve media
—or media get bizarrely lucky.
• We need not panic over digital preservation
any more than we panic about print.
• Approach digital preservation the same way you approach
print preservation.
• Strategically: this approach helps your colleagues get a
grip, too. Your colleagues may well be the biggest barrier
to digital preservation in your library!
24. Ignorance
?
• “It’s in Google, so it’s preserved.” (Not even
“Google Books!”)
• “I make backups, so I’m fine.”
• “I have a graduate student who takes care of
these things.”
• “Metadata? What’s that? I have to have it?”
• “Digital preservation is an unsolvable problem,
so why even try?” (I’ve heard this one from
librarians. I bet you have too.)
27. Salo’s needs pyramid
[Figure: pyramid, listed from top (less immediate, less tractable) to
base (more immediate, more tractable)]
• Fidelity to original
• Usability
• Format viability
• Bitrot
• Physical medium issues
• Acquisition issues
29. But first, a word about
failure
• “We can’t save everything digital!”
• Well, no, we can’t.
• We can’t save everything printed either.
• That’s no excuse, in either medium. Why do we
let it be one for digital materials?
• Yes, we will lose some stuff. That’s life in the
big city. Dive in anyway.
30. And a word about scale
• Many of those currently panicking about digital
preservation are thinking about huge scales.
• At some repository size, bitrot happens faster than you can
detect and fix it.
• Last I heard, this was somewhere in the exabyte range.
• We’re not at that scale. So let’s relax about some of this
stuff. At our scale, many problems are solvable.
• Unless your problem is digital video. Good luck with that.
• Our scale problems happen on the front end, as
we’ve been learning this week.
31. Physical medium failure
• Gold CDs are not the panacea we thought.
• They’re not bad; they’re just hard to audit, so they fail
(when they fail) silently. Silent failure is DEADLY.
• How long will hardware be able to read them?
• ALL such physical media are risky, for the same reasons!
• Current state of the art: get it on spinning disk.
• Back up often. Distribute your backups
geographically. Test them now and then.
• Consider a LOCKSS cooperative agreement. Others have.
• Any physical medium WILL FAIL. Have a plan
for when it does.
32. Bitrot
• Sometimes used for “file format obsolescence.”
• I use it for “the bits flipped unexpectedly.”
• Checking a file bit-by-bit against a backup copy
is computationally impractical to do every day.
• Though on ingest it’s a good idea to verify bit-by-bit!
• Checksums
• A file is, fundamentally, a great big number.
• Do math on the number. Store the result as metadata.
• To check for bitrot, redo the math and check the answer
against the stored result. If they’re different, scream.
• Several checksum algorithms; for our purposes, which one
you use doesn’t matter much.
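The checksum routine above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular repository's implementation: the function names `checksum` and `has_bitrot` are made up for the example, and SHA-256 is an arbitrary pick from the standard library.

```python
import hashlib


def checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Do the math on the file: return its digest as a hex string."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_bitrot(path, stored_checksum, algorithm="sha256"):
    """Redo the math and compare against the value stored as metadata."""
    return checksum(path, algorithm) != stored_checksum
```

In use, `checksum(path)` would run once at ingest and its result would be stored with the item's metadata; a scheduled job would later call `has_bitrot(path, stored_value)` and raise an alarm whenever it returns `True`.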
33. File format obsolescence
• When possible, prefer file formats that are:
• Open/non-proprietary. (If a software vendor goes out of
business, does their format?)
• Documented
• Standardized, non-patent-encumbered
• In widespread use. (If the format dies, lots of people have
incentive to solve the problem.)
• For text, non-binary
• For everything else, lossless rather than lossy
• For compound objects, compound documents rather than
embedded
• Realistically? We often have to take what
we’re given.
34. Lossless? Lossy? What?
• Essential tradeoff: quality and fidelity vs. file size
• Clipping information out makes the file size
smaller! But once it’s gone, it’s gone.
• Tremendous problem with video. Lossless video
formats are HUGE.
• Lossy image formats: JPEG, JPEG2000 (much
less so)
• (more or less) Lossless: TIFF, PNG, GIF
• Compression may be lossless or lossy. Find out!
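One way to "find out" is a round-trip test: compress, decompress, and compare with the original. The sketch below is a toy, under stated assumptions: `zlib` stands in for any lossless scheme, and the `roundtrip_lossy` function (which simply throws away the low four bits of every byte) is an invented stand-in for real lossy codecs, which are far more sophisticated.

```python
import zlib


def roundtrip_lossless(data: bytes) -> bytes:
    """Compress and decompress with zlib; every byte should come back."""
    return zlib.decompress(zlib.compress(data))


def roundtrip_lossy(data: bytes) -> bytes:
    """Toy lossy step: discard the low 4 bits of each byte.

    The output has the same length, but the discarded detail is gone
    and cannot be reconstructed from the result.
    """
    return bytes(b & 0xF0 for b in data)


sample = bytes(range(256))
print(roundtrip_lossless(sample) == sample)  # lossless: identical
print(roundtrip_lossy(sample) == sample)     # lossy: detail was clipped
```

The same comparison works on real files: if a format conversion followed by a conversion back does not reproduce the original bits (or at least the original decoded pixels or samples), something was clipped out along the way.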
36. Audio formats
• I am NOT going to talk about codecs vs.
container formats. Consider it homework.
• No ideal choice here; lossless formats are
patent-encumbered and/or proprietary
• WAV and AIFF are okay. Ogg Vorbis is ideal, but
nobody supports it.
• mp3: only if you must; it’s lossy.
37. Migration vs. emulation
• Migration: move the file to a new format
• Don’t throw away your original! You may have made the
wrong migration decision.
• Not necessarily a lossless process. (Fonts!)
• Emulation: create a modern hardware/software
environment that can deal with the old format
• For some cultural artifacts such as games, this is the only
reasonable option.
• Emulation advocates make big claims that I’m not sure
they can back up. Proceed with caution.
38. Normalization
• Migration of a dataset toward a well-defined
target.
• “Treat the same thing the same way.”
• E.g. census data... define a set of data tables, move all
data into them.
• Great for interoperability and preservation!
• Pitfall: “the same thing”?
• Humanities: TEI is a de facto normalizer for
humanities textual data.
• (Other XML formats in other fields: e.g. ChemML, NLM
DTD.)
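As a rough sketch of “treat the same thing the same way,” the Python below maps records with inconsistent field names onto one target schema. Everything in it is hypothetical: the target fields, the alias table, and the sample record are invented for illustration. Deciding which source fields really are “the same thing” is the hard, human part, and the code does not solve it.

```python
# Hypothetical target schema and alias table, made up for this example.
TARGET_FIELDS = ("name", "birth_year")
FIELD_ALIASES = {
    "name": "name",
    "full_name": "name",
    "birth_year": "birth_year",
    "born": "birth_year",
    "yob": "birth_year",
}


def normalize(record):
    """Map one source record onto the target schema.

    Returns (normalized, leftovers). Unrecognized fields go into
    leftovers rather than being silently dropped, so nothing is
    lost before a person has reviewed it.
    """
    normalized = {field: None for field in TARGET_FIELDS}
    leftovers = {}
    for key, value in record.items():
        target = FIELD_ALIASES.get(key)
        if target is None:
            leftovers[key] = value
        else:
            normalized[target] = value
    return normalized, leftovers


row, extra = normalize({"full_name": "Ada", "yob": 1815, "parish": "St. James"})
print(row)
print(extra)
```

Keeping the leftovers, and the original file, follows the same advice given for migration: the normalization decision may turn out to be wrong.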
39. Problem: BEHAVIOR.
• Migration can preserve information content
and (often but not always) appearance.
• Preserving interaction patterns is much
harder!
• E.g. a web page containing JavaScript
• Or a database with a query engine
• Or an applet or Flash object
• Or a collection whose interactions are based on an
obsolete software system. (DynaText anyone?)
• Hard problem. No obvious solutions; certainly
no easy ones.
40. When is a PDF not a PDF?
• When it’s a .doc with the wrong file extension
• When there’s no file extension on it at all
• When it’s so old it doesn’t follow the
standardized PDF conventions
• When it’s otherwise malformed, made by a
bad piece of software.
• How do you know whether you have a good
PDF? (Or .doc, or .jpg, or .xml, or anything else.)
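A first, crude answer is to ignore the file extension and look at the opening bytes of the file. The sketch below checks a handful of well-known signatures; the `SIGNATURES` table and the `sniff_format` function are illustrative names, the table is deliberately tiny, and insisting that the signature sit at byte zero is a simplification. It only addresses “what format does this claim to be?” and says nothing about whether the file is well formed.

```python
# A few widely documented file signatures ("magic numbers").
SIGNATURES = {
    "pdf": b"%PDF-",
    "ole2": bytes.fromhex("D0CF11E0A1B11AE1"),  # container used by legacy .doc
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def sniff_format(path):
    """Return the name of the first matching signature, or None."""
    with open(path, "rb") as f:
        head = f.read(16)
    for name, magic in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None
```

A file named `report.pdf` for which `sniff_format` returns `"ole2"` is very likely a word-processor document with the wrong extension. A return value of `"pdf"` still does not prove the file is a valid PDF; that takes a real validator.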
41. File format registries and
testing tools
• JHOVE: JSTOR/Harvard Object Validation
Environment
• Java software intended to be pluggable into other
software environments
• Answers “What format is this thing?” and “Is this thing a
good example of the format?”
• Limited repertoire of formats
• PRONOM/DROID + GDFR = Unified Digital
Formats Registry
42. Forgetting what you have
• Absolutely pernicious problem. We don’t know
what we have to begin with!
• Do you know how much Faculty Stuff is scattered
throughout your institution’s .edu domain? Me neither.
But I know it’s a lot. How much of that is irreplaceable?
• We’re also bad at labelling and tracking what
we have.
• No easy answer to this one; the solution lies in
a complete praxis reinvention.
• Yeah. Good luck with that.
43. ... but I thought you meant
in libraries, Dorothea!
• Come on, we’ve solved that one: Metadata!
• Once it’s in the library, it’s probably fine. The
real problem is all that Other Stuff Out There.
• This is a collection-development problem and
should be treated as one.
• Don’t dump it on some poor “digital preservation
librarian!” That flat out doesn’t scale.
• Don’t make the mistake of drawing thick lines around
“our stuff” and “their stuff.” Like it or not, our coll-dev
universe has moved beyond what’s published and what’s
canonically “library.”
44. What the stuff you have
means
• Collect whatever it takes to answer this
question:
• If the owner of this material were hit by a bus tomorrow,
what would be needed for others to use it?
• Nasty discipline-specific problem.
• This is what the NARA/RLG Trusted Digital Repository
checklist is aiming at with “designated community.”
• Where NARA/RLG goes off the rails is assuming you have
to go through this exercise with EVERYTHING YOU HAVE.
• Data dictionaries, algorithms, specifications, tech
metadata, whatever it takes. Use common sense!
45. Rights and DRM
• Not having IP rights to something may mean
you can’t preserve it.
• Brian Lavoie writes well about this problem.
• Copyright law and its exceptions haven’t caught up to the
digital age!
• Third-party services (e.g. blogs, iTunes U, Slideshare) are a
headache here.
• DRM means that no matter the rights
situation, you’re stuck.
• PDFs: Users turn on “security” features. This is DRM. Tell
them not to do that!
• Huge headache with third-party services, again.
46. ... and other hassles
• Privacy, confidentiality, and human-subject
research issues
• Think “we’re the humanities; IRBs don’t happen to us”?
Think again. One word: FOLKLORE.
• Third-party copyright
• Campus musical or dramatic performances
• Issues of cultural sensitivity, heritage,
repatriation
• You need a dark (or at least dim) archive if
you’re serious about digital preservation.
There is no way around this. Sorry.
47. Organizational commitment
• There is only one answer: POLICY.
• Unfortunately, it’s not a quick, easy, or
uncomplicated answer.
• Digital preservation costs money.
• People in high places are scared of it.
• It requires process and staff change.
• You have to make the case. And then make it
again. And again. Until they get it!
• Where I am, Somebody Else’s Problem fields are
everywhere around this issue.
48. You are probably the
preservation option
of last resort.
Be prepared for anything
excluded from your policy
to disappear.
49. When organizations fail
• Remember Geocities? We’re worse.
• Mellon: Can’t make a list of its funded on-the-web
projects, because most of them are GONE. G-O-N-E.
• We do not, as a profession, have a safety net
for each other’s projects and materials.
• This is, frankly, unconscionable.
• I don’t know how to fix it; I am just warning
you that project rescues are and will continue
to be necessary.
• Institutional boundaries are a barrier here.
50. Great policy guidance
• Policy-making for research data in repositories:
a guide
• http://www.disc-uk.org/docs/guide.pdf
• Practical data management: a legal and policy
guide
• http://eprints.qut.edu.au/archive/00014923/01/Microsoft_Word_-_Practical_Data_Management_-_A_Legal_and_Policy_Guide_doc.pdf
• Australian, so take “legal” with a grain of salt
• Guide to social science data preparation and
archiving
• http://www.icpsr.umich.edu/ICPSR/access/dataprep.pdf
51. Summary: the OAIS model
• “Reference model” for archival systems
• All theory, no praxis, by design. (Because praxis changes!)
• Four parts
• Vocabulary
• Data (and interaction) model
• Required responsibilities of an archive
• Recommended functions (in the computer-programming
sense) for carrying out those responsibilities
• My favorite distillation: Ockerbloom
• http://everybodyslibraries.com/2008/10/13/what-repositories-do-the-oais-model/
53. For our purposes...
• We’re talking about the software.
• I’m not going to rant (much) about what IRs
are for or how they’re run.
• If you want that, read Roach Motel. Better yet, read
Palmer et al. 2009.
• We’re interested in the application (or lack
thereof) of IRs to data curation in the arts and
humanities. Right? Right.
• I’m not afraid of the technical, and neither
should you be.
60. The IR content use-case
• A research paper
• In a single file; possibly more than one format
available
• Is not related to any other item in the history
of ever
• The user can download it, and... um... just
download it, really.
61. How much of our stuff
does that work for?
• Image collections
• Page-scanned books (with or without OCR)
• Marked-up books
• Theses and dissertations
• Website preservation
• Audio and video
• Complex multimedia
• Databases (linguistic, geographic...)
• Software
63. One metadata standard
does not fit all
• EAD
• METS
• VRA Core
• MODS
• TEI Header
• Dublin Core
• MARC
• ... the beat goes on.
• The simple fact is that EPrints and DSpace do
Dublin Core, METS, and nothing else natively.
This is purely inadequate for humanities data
curation.
64. One file format does not
fit all
• Yes, we have to take what we get.
• With discrete files, most IR software is fine.
• Forget about streaming audio/video.
• DSpace is good with static websites.
• For other composite objects, you’re in trouble.
• For anything like a database, you’re in trouble.
65. The DSpace/EPrints view
of the universe
• Communities and collections
• “EPeople”
• must be given explicit permission to add or edit materials
• Metadata entry forms
• DSpace: fields configurable by collection
• EPrints: auto-configures fields based on content type
• Files/bitstreams
• Many permitted per item; must upload one by one in DSpace!
• Get friendly with the DSpace batch importer. You’ll need it.
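To make the batch-importer advice concrete, here is a minimal sketch that lays out one item the way I understand DSpace's Simple Archive Format: a directory per item holding a `dublin_core.xml` file, a `contents` file listing bitstream filenames one per line, and the files themselves. The function name `write_saf_item` and its arguments are made up for illustration, and the layout should be checked against the documentation for your DSpace version before importing anything real.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def write_saf_item(archive_dir, index, metadata, file_paths):
    """Write one item directory (item_000, item_001, ...) and return its path.

    metadata maps (element, qualifier) to a list of values; a qualifier of
    None is written as "none".
    """
    item_dir = os.path.join(archive_dir, f"item_{index:03d}")
    os.makedirs(item_dir, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for (element, qualifier), values in metadata.items():
        for value in values:
            dcvalue = ET.SubElement(
                root, "dcvalue", element=element, qualifier=qualifier or "none"
            )
            dcvalue.text = value
    ET.ElementTree(root).write(
        os.path.join(item_dir, "dublin_core.xml"),
        encoding="utf-8",
        xml_declaration=True,
    )

    # Copy each file into the item directory and list it in "contents".
    names = []
    for path in file_paths:
        name = os.path.basename(path)
        shutil.copy(path, os.path.join(item_dir, name))
        names.append(name)
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as f:
        f.write("".join(name + "\n" for name in names))

    return item_dir
```

Called in a loop over a spreadsheet of metadata and a folder of files, something like this replaces a great deal of one-by-one uploading. The import step itself is run with DSpace's own command-line tools, which are not shown here.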
66. The Fedora view of the
universe
• You can do anything at all with anything at all
as long as you’re willing to tell Fedora how to
do it. Infinite flexibility! But also infinite
responsibility.
• “Content model”: what’s in this thing?
• “Service”: what should the user-interface do
with what’s in this thing?
• Metadata, relationships, stuff
67. Can you use Fedora for
an IR?
• Yes, but not alone; you need all the Content
Models and Services bolted on top.
• Try Islandora or Muradora. Fez is a last resort; it
acts like DSpace, and this is not a good thing.
• Even if you can’t build a real Fedora digital
library now, bear in mind that sticking with
DSpace may keep you from building one in future...
• ... but the Fedora/DSpace merger may change
things.
68. What is this FOXML
stuff anyway?
• Think of it as the Fedora batch-import format.
• It’s complex! But it can absorb any amount or
type of XML metadata or data, which is really
quite nice.
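A rough sketch of the "absorb any XML" point, assuming FOXML 1.1 and inline XML datastreams. The element and attribute names reflect my understanding of FOXML 1.1 and should be validated against the official schema before any ingest; the helper names `q` and `wrap_in_foxml` and the PID `demo:1` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace as I understand it for FOXML; verify against the schema.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
MODEL_NS = "info:fedora/fedora-system:def/model#"
ET.register_namespace("foxml", FOXML_NS)


def q(tag):
    """Qualify a tag name with the FOXML namespace."""
    return f"{{{FOXML_NS}}}{tag}"


def wrap_in_foxml(pid, label, datastreams):
    """Wrap arbitrary XML elements as inline datastreams of one digital object.

    datastreams maps a datastream ID (e.g. "TEI", "MODS") to an Element.
    """
    obj = ET.Element(q("digitalObject"), {"VERSION": "1.1", "PID": pid})

    props = ET.SubElement(obj, q("objectProperties"))
    ET.SubElement(props, q("property"),
                  {"NAME": MODEL_NS + "state", "VALUE": "Active"})
    ET.SubElement(props, q("property"),
                  {"NAME": MODEL_NS + "label", "VALUE": label})

    for ds_id, payload in datastreams.items():
        # CONTROL_GROUP "X" marks XML stored inline in the object.
        ds = ET.SubElement(obj, q("datastream"), {
            "ID": ds_id, "STATE": "A",
            "CONTROL_GROUP": "X", "VERSIONABLE": "true",
        })
        version = ET.SubElement(ds, q("datastreamVersion"), {
            "ID": f"{ds_id}.0", "MIMETYPE": "text/xml", "LABEL": ds_id,
        })
        ET.SubElement(version, q("xmlContent")).append(payload)
    return obj


# Usage: a (nearly empty) TEI document carried inside a Fedora object.
tei = ET.Element("TEI")
ET.SubElement(tei, "teiHeader")
obj = wrap_in_foxml("demo:1", "A marked-up book", {"TEI": tei})
print(ET.tostring(obj, encoding="unicode"))
```

The payload could just as well be MODS, EAD, or a home-grown schema; FOXML does not care, which is exactly the nice part. What Fedora does with the payload afterwards is the job of content models and services.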
69. Summing up
• Out-of-the-box IR software will handle some
A&H data-curation jobs adequately...
• ... but by no means all of them.
• If you need sophisticated UI, bite the bullet
and go with Fedora. Islandora and Muradora
make Fedora simpler for simple things than it
once was.
• If you don’t need sophisticated user-facing UI,
go with EPrints.
• DSpace is a loser choice.
71. Thank you!
• This presentation is available under a Creative
Commons Attribution 3.0 United States
license.
• Please remember to credit images if you reuse
individual slides. Thank you!