Risk management and
auditing
Dorothea Salo
Threat model
•“Preservation,” unmodified, means nothing.
• This is why it becomes such a bogeyman!
•Two things you need to know first:
• why you’re preserving what you’re preserving, and
• what you’re preserving it against.
•Libraries: your collection-development policy
should inform the first question.
• Your coll-dev policy doesn’t include local born-digital or
digitized materials? This is a problem. Fix it.
•The second question is your “threat model.”
What is your threat
model for print?
Homelessness
Water
Flora and fauna
Physical damage
Loss or destruction
Why did I just make you
do that?
•I’m weird.
•I’m trying to destroy the myth that any given
medium “preserves itself.”
•Media do not preserve themselves. People
preserve media—or media get bizarrely lucky.
•We need not panic over digital preservation
any more than we panic about print.
•Approach digital preservation the same way
you approach print preservation.
Now...
List important threats
to digital data.
Physical medium failure
“Bitrot”
File format obsolescence
Forgetting what you have
Forgetting what the
stuff you have means
Rights and DRM
Lack (or disappearance)
of organizational commitment
One word: Geocities.
?
Ignorance
•“It’s in Google, so it’s preserved.” (Not even
“Google Books!”)
•“I make backups, so I’m fine.”
•“I have a graduate student who takes care of
these things.”
•“Metadata? What’s that? I have to have it?”
•“Digital preservation is an unsolvable problem,
so why even try?” (I’ve heard this one from
librarians. I bet you have too.)
Apathy
Mitigating the risks:
planning and auditing
tools
Audit frameworks
• Trusted Repository Audit Checklist
• (If you see “NARA/RLG” somewhere? This is the framework that
evolved into TRAC. Long story.)
• You can get an actual formal TRAC audit from CRL! Who has? Portico,
Hathi, “Chronicle of Life,” two or three others. This audit is HARSH. (So
don’t write off a repo because it hasn’t had a TRAC audit.)
• If you hear the phrase “trusted digital repository,” it should mean
that the repo has had (or is pursuing) a TRAC audit.
• DRAMBORA
• More flexible, less finger-shaking than TRAC.
• Less of this “designated community” nonsense.
• Less dependent on OAIS model (which I consider a strength).
• Encourages archives to consider and document their individual
situations and think hard about risk mitigation.
Newer: SPOT model
•Even less clunky than DRAMBORA.
•I quite like this one.
•Identifying Threats to Successful Digital
Preservation: the SPOT Model for Risk
Assessment
• http://www.dlib.org/dlib/september12/vermaaten/09vermaaten.html
So what do they audit?
•Mission (and adherence to it)
•Plans and policies
• including contingency plans
•Staff infrastructure
•Operations documentation
• including tech infrastructure, service infrastructure
•Sustainable funding
•“Doing the right things with the stuff.”
• identifiers, ingest, file format management, migration, etc.
•NOTICE WHAT’S FIRST ON THE LIST.
• remember, the tech part is the easy part!
TRAC, DRAMBORA, and DH
•TRAC, DRAMBORA, and SPOT are designed to
audit repositories, not individual datasets, data
files, or research projects.
• They assume a lot of infrastructure and (in TRAC’s case) a
long-term time horizon that you probably don’t have.
•So if you’re trying to think through a project,
where do you go?
• TRAC and DRAMBORA are probably overkill!
• (Though parts of DRAMBORA won’t hurt you.)
Data Curation Profiles
•Research project out of Purdue’s Digital Data
Curation Center (“D2C2”)
•“Toolkit:” interview instrument, user guide for
interview instrument, worksheet.
•Small library of completed profiles
•Ignore the user guide. Grab the worksheet, and
use the interview instrument for reference.
•http://datacurationprofiles.org
• You have to make a login to download the toolkit pieces.
Mitigating specific
risks
Physical medium failure
•Gold CDs are not the panacea we thought.
• They’re not bad; they’re just hard to audit, so they fail
(when they fail) silently. Silent failure is DEADLY.
•Current state of the art: get it on spinning disk.
•Back up often. Distribute your backups
geographically. Test them now and then.
• Consider a LOCKSS cooperative agreement. Others have.
•Bitrot-detection techniques may help here too.
•Any physical medium WILL FAIL. Have a plan
for when it does.
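One way to "test them now and then" is to compare a backup against the live copy byte for byte, so that silent failures get loud. The following is a minimal sketch, not a backup tool: the function name `verify_backup` and the two-directory layout are assumptions made for illustration, and it uses only Python's standard `filecmp` and `pathlib` modules.

```python
import filecmp
from pathlib import Path


def verify_backup(original_dir, backup_dir):
    """Compare every file under original_dir with its copy in backup_dir.

    Returns two lists of relative paths: files missing from the backup,
    and files whose contents differ from the original.
    """
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    missing, different = [], []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue  # skip directories
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            missing.append(relative.as_posix())
        # shallow=False forces a comparison of file contents,
        # not just size and modification time
        elif not filecmp.cmp(source, copy, shallow=False):
            different.append(relative.as_posix())
    return missing, different
```

Calling `verify_backup("/data/collection", "/mnt/backup/collection")` (both paths hypothetical) returns the two lists; anything in either list is a reason to go look. It does not tell you which copy is the damaged one, which is where stored checksums come in.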
“Digital forensics”
•The art and science of investigating digital file
formats and media.
• Reading obsolete ones.
• Reverse-engineering and/or documenting existing ones so
they don’t go obsolete.
• Ensuring secure deletion, when necessary.
• Reconstructing what used to be on a physical storage
medium. (Surprising how often this is possible!)
• Audit trails for legal and records-management purposes.
• AMAZING report (highly highly recommended!): “Digital
Forensics and Born-Digital Content in Cultural Heritage
Institutions.” http://www.clir.org/pubs/abstract/pub149abst.html.
Both computer-nerdy and humanities-nerdy in the best possible way.
Avoiding “bitrot”
•Sometimes used for “file format obsolescence.”
•I use it for “the bits flipped unexpectedly.”
•Checking a file bit-by-bit against a backup copy
is computationally impractical for every day.
• Though on ingest it’s a good idea to verify bit-by-bit!
•Checksums
• A file is, fundamentally, a great big number.
• Do math on the number. Store the result as metadata.
• To check for bitrot, redo the math and check the answer
against the stored result. If they’re different, scream.
• Several checksum algorithms; for our purposes, which one
you use doesn’t matter much.
• “Hash collision:” it’s possible, but unlikely, for different files
to have the same checksum. Potential hack vector!
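The checksum routine above can be sketched in a few lines. This is an illustration only: the function names `file_checksum` and `check_fixity` are made up for this example, SHA-256 is an arbitrary choice among the algorithms Python's standard `hashlib` offers, and where you store the result (a database, a sidecar file) is left to you.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Do the math: return the hex digest of the file at path.

    The file is read in chunks so large files do not have to fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, stored_checksum, algorithm="sha256"):
    """Redo the math and compare the answer against the stored result."""
    return file_checksum(path, algorithm) == stored_checksum
```

At ingest, call `file_checksum` once and store the result as metadata. Later, `check_fixity(path, stored)` returning `False` is the cue to scream and reach for a backup copy.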
Migration vs. emulation:
dealing with obsolescence
•Migration
• change the file to be usable in new software/hardware
configurations
• risks: information loss (FONTS!), imperfect transfer,
choosing the wrong migration path
• smart systems don’t throw away the old files!
•Emulation
• keep the file, train new software/hardware to behave like
the old
• risks: imperfect emulation, impractical emulation
• makes more sense for software (games!), less for files
•Pragmatically: redigitization.
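"Don't throw away the old files" can be built into the migration step itself. The sketch below is one possible shape, under loud assumptions: `migrate_keeping_original` and the sidecar `.migration.json` log are invented for this example, and `convert` stands in for whatever real converter you use (it is a hypothetical callable that reads `source` and writes `target`). The sketch reads whole files into memory, so it suits small files only.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Checksum of a whole file (fine for a sketch; chunk it for big files)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def migrate_keeping_original(source, target, convert):
    """Run convert(source, target), keep source, and log what happened."""
    source, target = Path(source), Path(target)
    before = sha256_of(source)
    convert(source, target)  # hypothetical converter supplied by the caller
    if not source.is_file() or sha256_of(source) != before:
        raise RuntimeError(f"migration altered or removed the original: {source}")
    record = {
        "original": source.name,
        "original_sha256": before,
        "migrated": target.name,
        "migrated_sha256": sha256_of(target),
        "migrated_at": datetime.now(timezone.utc).isoformat(),
    }
    # Sidecar log next to the migrated file, so the path taken is documented
    log_path = target.with_name(target.name + ".migration.json")
    log_path.write_text(json.dumps(record, indent=2))
    return record
```

Keeping both checksums in the log means that if the migration path turns out to be the wrong one, you can find the untouched original and try again.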
Finding tools
•Migration
• Current versions of the original software may be able to
open old files.
• Open-source software in the same genre may be able to
translate proprietary file formats (often imperfectly). Open-source
projects tend to maintain translators longer than you’d think.
• Look on the web!
• MIGRATE FAST. Once it’s damaged or obsolete, it’s
probably too late.
•Emulation
• look for the gamers! it’s WILD what they’ll emulate!
• Look to the open-source community for operating-system,
hardware-driver emulators.
• Frankly, there’s a lot of hype and vaporware here.
When is a PDF not a PDF?
•When it’s a .doc with the wrong file extension
•When there’s no file extension on it at all
•When it’s so old it doesn’t follow the
standardized PDF conventions
•When it’s otherwise malformed, made by a
bad piece of software.
•How do you know whether you have a good
PDF? (Or .doc, or .jpg, or .xml, or anything else.)
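A first, crude answer to "is this really a PDF?" is to ignore the file extension and look at the first few bytes, since many formats start with a fixed signature. The sketch below is an assumption-laden illustration: `sniff_format` and the short `MAGIC_NUMBERS` table are made up for this example, the table covers only four well-known signatures, and it checks only the very start of the file. It says nothing about whether the file is well-formed.

```python
# A few well-known leading byte signatures ("magic numbers").
MAGIC_NUMBERS = [
    (b"%PDF-", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (legacy .doc/.xls/.ppt)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container (includes .docx and .epub)"),
]


def sniff_format(path):
    """Guess a file's format from its first bytes, ignoring its name."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, name in MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return "unknown"
```

A file named `report.pdf` that sniffs as an OLE2 container is the ".doc with the wrong file extension" case. A file that sniffs as PDF may still be malformed; catching that takes a real validator.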
File format registries and
testing tools
•JHOVE: JSTOR/Harvard Object Validation
Environment
• Java software intended to be pluggable into other
software environments
• Answers “What format is this thing?” and “Is this thing a
good example of the format?”
• Limited repertoire of formats
•PRONOM/DROID + GDFR = Unified Digital
Formats Registry
•Wrapper tool: FITS, File Information Tool Set
• JHOVE + DROID + various other testers. State of the art.
Thanks!
•Copyright 2011 by Dorothea Salo.
•This lecture and slide deck are licensed under a
Creative Commons Attribution 3.0 United
States License.

PART A. Introduction to Costumer Service
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 

Risk management and auditing

  • 2. Threat model
      • “Preservation” means nothing unmodified.
          • This is why it becomes such a bogeyman!
      • Two things you need to know first:
          • why you’re preserving what you’re preserving, and
          • what you’re preserving it against.
      • Libraries: your collection-development policy should inform the first question.
          • Your coll-dev policy doesn’t include local born-digital or digitized materials? This is a problem. Fix it.
      • The second question is your “threat model.”
  • 3. What is your threat model for print?
  • 9. Why did I just make you do that?
      • I’m weird.
      • I’m trying to destroy the myth that any given medium “preserves itself.”
      • Media do not preserve themselves. People preserve media, or media get bizarrely lucky.
      • We need not panic over digital preservation any more than we panic about print.
      • Approach digital preservation the same way you approach print preservation.
  • 15. Forgetting what the stuff you have means
  • 17. Lack (or disappearance) of organizational commitment
  • 19. ? Ignorance
      • “It’s in Google, so it’s preserved.” (Not even “Google Books!”)
      • “I make backups, so I’m fine.”
      • “I have a graduate student who takes care of these things.”
      • “Metadata? What’s that? I have to have it?”
      • “Digital preservation is an unsolvable problem, so why even try?” (I’ve heard this one from librarians. I bet you have too.)
  • 21. Mitigating the risks: planning and auditing tools
  • 22. Audit frameworks
      • Trusted Repository Audit Checklist (TRAC)
          • (If you see “NARA/RLG” somewhere, that is the framework that evolved into TRAC. Long story.)
          • You can get an actual formal TRAC audit from CRL! Who has? Portico, Hathi, “Chronicle of Life,” two or three others. This audit is HARSH. (So don’t write off a repo because it hasn’t had a TRAC audit.)
          • If you hear the phrase “trusted digital repository,” it should mean that the repo has had (or is pursuing) a TRAC audit.
      • DRAMBORA
          • More flexible, less finger-shaking than TRAC.
          • Less of this “designated community” nonsense.
          • Less dependent on OAIS model (which I consider a strength).
          • Encourages archives to consider and document their individual situations and think hard about risk mitigation.
  • 23. Newer: SPOT model
      • Even less clunky than DRAMBORA.
      • I quite like this one.
      • Identifying Threats to Successful Digital Preservation: the SPOT Model for Risk Assessment
          • http://www.dlib.org/dlib/september12/vermaaten/09vermaaten.html
  • 24. So what do they audit?
      • Mission (and adherence to it)
      • Plans and policies
          • including contingency plans
      • Staff infrastructure
      • Operations documentation
          • including tech infrastructure, service infrastructure
      • Sustainable funding
      • “Doing the right things with the stuff.”
          • identifiers, ingest file format management, migration, etc.
      • NOTICE WHAT’S FIRST ON THE LIST.
          • remember, the tech part is the easy part!
  • 25. TRAC, DRAMBORA, and DH
      • TRAC, DRAMBORA, and SPOT are designed to audit repositories, not individual datasets, data files, or research projects.
          • They assume a lot of infrastructure and (in TRAC’s case) a long-term time horizon that you probably don’t have.
      • So if you’re trying to think through a project, where do you go?
          • TRAC and DRAMBORA are probably overkill!
          • (Though parts of DRAMBORA won’t hurt you.)
  • 26. Data Curation Profiles
      • Research project out of Purdue’s Digital Data Curation Center (“D2C2”)
      • “Toolkit:” interview instrument, user guide for interview instrument, worksheet.
      • Small library of completed profiles
      • Ignore the user guide. Grab the worksheet, and use the interview instrument for reference.
      • http://datacurationprofiles.org
          • You have to make a login to download the toolkit pieces.
  • 28. Physical medium failure
      • Gold CDs are not the panacea we thought.
          • They’re not bad; they’re just hard to audit, so they fail (when they fail) silently. Silent failure is DEADLY.
      • Current state of the art: get it on spinning disk.
      • Back up often. Distribute your backups geographically. Test them now and then.
          • Consider a LOCKSS cooperative agreement. Others have.
      • Bitrot-detection techniques may help here too.
      • Any physical medium WILL FAIL. Have a plan for when it does.
  • 29. “Digital forensics”
      • The art and science of investigating digital file formats and media.
          • Reading obsolete ones.
          • Reverse-engineering and/or documenting existing ones so they don’t go obsolete.
          • Ensuring secure deletion, when necessary.
          • Reconstructing what used to be on a physical storage medium. (Surprising how often this is possible!)
          • Audit trails for legal and records-management purposes.
          • AMAZING report (highly highly recommended!): “Digital Forensics and Born-Digital Content in Cultural Heritage Institutions.” http://www.clir.org/pubs/abstract/pub149abst.html. Both computer-nerdy and humanities-nerdy in the best possible way.
  • 30. Avoiding “bitrot”
      • Sometimes used for “file format obsolescence.”
      • I use it for “the bits flipped unexpectedly.”
      • Checking a file bit-by-bit against a backup copy is computationally impractical for every day.
          • Though on ingest it’s a good idea to verify bit-by-bit!
      • Checksums
          • A file is, fundamentally, a great big number.
          • Do math on that number. Store the result as metadata.
          • To check for bitrot, redo the math and check the answer against the stored result. If they’re different, scream.
          • Several checksum algorithms; for our purposes, which one you use doesn’t matter much.
          • “Hash collision:” it’s possible, but unlikely, for different files to have the same checksum. Potential hack vector!
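The checksum routine described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function names (`checksum`, `has_bitrot`, `copies_identical`) and the choice of SHA-256 are assumptions made for the example, and a real repository would also need somewhere safe to store the checksums.

```python
import filecmp
import hashlib


def checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Do the math on the file: return its digest as a hex string."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_bitrot(path, stored_checksum, algorithm="sha256"):
    """Redo the math and compare against the result stored as metadata."""
    return checksum(path, algorithm) != stored_checksum


def copies_identical(original, copy):
    """Bit-by-bit comparison of two files, the slower check suited to ingest."""
    return filecmp.cmp(original, copy, shallow=False)
```

One way to use it: call `checksum` once at ingest and save the result alongside the file's other metadata, then run `has_bitrot` on a schedule and treat any `True` as the cue to restore from a backup copy.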
  • 31. Migration vs. emulation: dealing with obsolescence
      • Migration
          • change the file to be usable in new software/hardware configurations
          • risks: information loss (FONTS!), imperfect transfer, choosing the wrong migration path
          • smart systems don’t throw away the old files!
      • Emulation
          • keep the file, train new software/hardware to behave like the old
          • risks: imperfect emulation, impractical emulation
          • makes more sense for software (games!), less for files
      • Pragmatically: redigitization.
  • 32. Finding tools
      • Migration
          • Current versions of the original software may be able to open old files.
          • Open-source software in the same genre may be able to translate proprietary file formats (often imperfectly). These projects tend to maintain translators longer than you’d think.
          • Look on the web!
          • MIGRATE FAST. Once it’s damaged or obsolete, it’s probably too late.
      • Emulation
          • Look for the gamers! It’s WILD what they’ll emulate!
          • Look to the open-source community for operating-system, hardware-driver emulators.
          • Frankly, there’s a lot of hype and vaporware here.
  • 33. When is a PDF not a PDF?
      • When it’s a .doc with the wrong file extension
      • When there’s no file extension on it at all
      • When it’s so old it doesn’t follow the standardized PDF conventions
      • When it’s otherwise malformed, made by a bad piece of software.
      • How do you know whether you have a good PDF? (Or .doc, or .jpg, or .xml, or anything else.)
  • 34. File format registries and testing tools
      • JHOVE: JSTOR/Harvard Object Validation Environment
          • Java software intended to be pluggable into other software environments
          • Answers “What format is this thing?” and “Is this thing a good example of the format?”
          • Limited repertoire of formats
      • PRONOM/DROID + GDFR = Unified Digital Formats Registry
      • Wrapper tool: FITS, File Information Tool Set
          • JHOVE + DROID + various other testers. State of the art.
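To get a feel for the “What format is this thing?” question, here is a toy Python sketch that ignores the file extension and looks at the first few bytes instead. It is an illustration only and does not claim to show how JHOVE, DROID, or FITS work internally; the names `SIGNATURES` and `sniff_format` are made up for the example, and the table covers just four well-known signatures. It also says nothing about whether the file is a *good* example of its format, which is the harder question the tools above try to answer.

```python
# Well-known leading byte signatures ("magic numbers") for a few formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # container used by legacy .doc files
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff_format(first_bytes):
    """Guess a format label from a file's leading bytes, or None if unknown."""
    for signature, label in SIGNATURES:
        if first_bytes.startswith(signature):
            return label
    return None


def sniff_file(path):
    """Read just enough of the file at `path` to compare against SIGNATURES."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

A file named `report.pdf` for which `sniff_file` returns `"ole2"` is a likely case of “a .doc with the wrong file extension.”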
  • 35. Thanks!
      • Copyright 2011 by Dorothea Salo.
      • This lecture and slide deck are licensed under a Creative Commons Attribution 3.0 United States License.