Prepared for
MIT Libraries Informatics Program Brown Bag Talk
June 2013
Approaches to Preservation Storage
Technologies
Dr. Micah Altman
<escience@mit.edu>
Director of Research, MIT Libraries
DISCLAIMER
These opinions are my own; they are not the opinions
of MIT, Brookings, or any of the project funders, nor (with
the exception of co-authored, previously published
work) of my collaborators.
Secondary disclaimer:
“It’s tough to make predictions, especially about the
future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill,
Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi,
Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle,
George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White,
etc.
Approaches to Preservation Storage Technologies 2
Collaborators & Co-Conspirators
• Jefferson Bailey, Karen Cariani, Jonathan
Crabtree, Michelle Gallinger, Jane
Mandelbaum, Nancy McGovern, Trevor
Owens
• NDSA Coordination Committee & Working
Group Chairs
• Research Support
Thanks to the Library of Congress and the
Massachusetts Institute of Technology.
Related Work
• Altman, M., et al. (2013). “NDSA Storage Report: Reflections on
National Digital Stewardship Alliance Member Approaches to
Preservation Storage Technologies.” D-Lib Magazine 19 (5/6).
• National Digital Stewardship Alliance (2013, forthcoming). 2014
National Agenda for Digital Stewardship.
• Altman, M., & Crabtree, J. (2011). “Using the SafeArchive
System: TRAC-Based Auditing of LOCKSS.” In Archiving
2011, 165-170.
Most reprints available from:
informatics.mit.edu
Simple question?
• If you have 1,000 files (bitstreams) and you’d
like a 99.99% chance of being able to access them
in 20 years, how do you store them?
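To make the question concrete, here is a hedged worked example. It assumes (our interpretation, not stated on the slide) that "access them" means all 1,000 files survive, that file losses are independent, and that each file's loss hazard is constant from year to year. The function names are hypothetical.

```python
# Worked example (hypothetical interpretation): convert the collection-level
# target "99.99% chance that all 1,000 files are accessible after 20 years"
# into per-file requirements. ASSUMES file losses are independent and the
# per-file loss hazard is constant from year to year.

def required_per_file_survival(collection_target: float, n_files: int) -> float:
    """Per-file survival p such that p ** n_files == collection_target."""
    return collection_target ** (1.0 / n_files)


def required_annual_survival(total_survival: float, years: int) -> float:
    """Annual survival q such that q ** years == total_survival."""
    return total_survival ** (1.0 / years)


if __name__ == "__main__":
    target, n_files, years = 0.9999, 1000, 20
    per_file = required_per_file_survival(target, n_files)
    per_file_per_year = required_annual_survival(per_file, years)
    print(f"per-file loss probability over {years} years: {1 - per_file:.2e}")
    print(f"per-file loss probability per year: {1 - per_file_per_year:.2e}")
```

Under these assumptions each file needs an annual loss probability of roughly 5 × 10^-9 (about eight nines of annual per-file survival). Relaxing the independence assumption makes the requirement harder, not easier, to meet.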
Approaches to Preservation Storage Technologies 5
Simplistic Answer: Put it in AWS
• Amazon Glacier claims a design reliability of
99.999999999%
• Neat-o !!!!!!!!!!
– Longer odds than winning Powerball OR
– Getting struck by lightning three times OR
– (Possibly) eventually finding an alien civilization
• But …
Clarifying Requirements
• What are the units of reliability: collection,
object, or bit?
• What is the natural unit of risk?
• Is the value of information uniform across units?
• How many of these do you have?
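The choice of unit matters because one per-bit error rate implies very different object- and collection-level reliability. The sketch below assumes independent bit errors, no repair, and that any single corrupted bit loses the whole object (a worst case); the error rate and object sizes are hypothetical and for illustration only.

```python
import math

# Sketch: the same per-bit error rate implies very different reliability at the
# object and collection level. ASSUMES independent bit errors, no repair, and
# that any single corrupted bit counts as losing the whole object (worst case).

def object_survival(bit_error_rate: float, size_bytes: int) -> float:
    """P(no bit of the object is corrupted)."""
    n_bits = size_bytes * 8
    # exp(n * log1p(-b)) is a numerically stable form of (1 - b) ** n
    return math.exp(n_bits * math.log1p(-bit_error_rate))


def collection_survival(object_probs: list[float]) -> float:
    """P(every object in the collection survives), assuming independent objects."""
    return math.prod(object_probs)


if __name__ == "__main__":
    ber = 1e-15  # hypothetical per-bit error rate, for illustration only
    sizes = {"1 KB record": 1_000, "1 MB image": 1_000_000, "1 GB video": 1_000_000_000}
    for label, size in sizes.items():
        print(f"{label}: P(loss) = {1 - object_survival(ber, size):.1e}")
    objects = [object_survival(ber, s) for s in sizes.values()]
    print(f"all three survive: {collection_survival(objects):.10f}")
```

A bit-level figure says little about risk to a 1 GB object, and even less about a collection, unless the unit and the number of units are stated.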
Hidden Assumptions
• Reliability estimates appear entirely theoretical
– (MTBF + independence) × enough replicas -> as many 9s as you like…
– No details provided on how the estimate was derived
– No historical reliability statistics provided
– No service reliability auditing provided
• Empirical Issues
– Manufacturer-rated hardware MTBF (mean time between failures) does not
match failure rates observed in real environments…
– Failures across hardware replicas are observed to be correlated
• Unmodeled failure modes
– software failure
(e.g., a bug in the AWS control-backplane software might result in
permanent loss that would go undetected for a substantial time);
– legal threats (leading to account lock-out, deletion, or content
removal);
– institutional threats (such as a change in Amazon’s business model);
– process threats (someone hits the delete button by mistake, forgets to pay
the bill, or AWS rejects the payment)
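A hedged sketch of why "(MTBF + independence) × enough replicas" overstates durability. The common-mode model here is a hypothetical illustration, not AWS's model: with probability p_common a single event (software bug, account lock-out, unpaid bill) destroys every replica at once; otherwise replicas fail independently. All probabilities are made up.

```python
# Sketch of why "(MTBF + independence) x enough replicas" overstates durability.
# The common-mode model is a hypothetical illustration, not AWS's model:
# with probability p_common a single event (software bug, account lock-out,
# unpaid bill) destroys every replica at once; otherwise replicas fail
# independently with probability p_fail each.

def loss_prob_independent(p_fail: float, replicas: int) -> float:
    """P(all replicas lost) when replicas fail independently."""
    return p_fail ** replicas


def loss_prob_common_mode(p_fail: float, replicas: int, p_common: float) -> float:
    """P(all replicas lost) with an added common-mode failure."""
    return p_common + (1 - p_common) * p_fail ** replicas


if __name__ == "__main__":
    p_fail, p_common = 0.01, 1e-4  # hypothetical annual probabilities
    for n in range(1, 7):
        print(f"{n} replicas: independent {loss_prob_independent(p_fail, n):.0e}, "
              f"with common mode {loss_prob_common_mode(p_fail, n, p_common):.1e}")
```

However many replicas are added, the loss probability in this model never falls below p_common. The achievable number of nines is therefore bounded by the unmodeled failure modes rather than by hardware MTBF.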
Business Risks?
• Amazon SLAs do not incorporate or reflect
“design” reliability claims:
– No reliability commitment in the SLAs
– Sole remedy for breach is limited to a refund of fees for
periods when the service was unavailable
– No right to audit logs or other evidence of reliability
What practices are
leading stewardship
organizations using?
Results from the NDSA Bi-Annual
Preservation Storage Survey
• 74 institutions surveyed;
58 met selection criteria.
– Follow-up with non-responders: 100% response rate
– Low roll-off on individual questions
– Next round will be more than twice as large
• Survey Methods
– Closed-ended, with open-ended extensions
– Selected qualitative follow-up
• Survey Data
– Instrument and data available as open data
About the NDSA
Key Findings: What are Current
Institutional Practices?
• 90% of respondents are distributing copies of at least part of their content
geographically
• 88% of respondents are responsible for their content for an indefinite
period of time
• 80% of respondents use some form of fixity checking for their content
• 75% of respondents report a strong preference to host and control their
own technical infrastructure for preservation storage
• 69% of respondents are considering or currently participating in a
distributed storage cooperative or system (e.g., LOCKSS Alliance, MetaArchive,
Data-PASS)
• 64% of respondents are planning to make significant changes in the
technologies in their preservation storage architecture in the next three
years
• 51% of respondents are considering or already using a cloud storage
provider to keep one or more copies of their content
• 48% of respondents are considering or currently contracting out storage
services to be managed by another organization or company
How Many Copies
How Many Copies? …
by Role
Preservation Storage -- New
Approaches
What do organizations want from their
preservation systems?
What are most memory organizations
not doing yet?
• Formal cost and valuation models
• Auditing and evaluation
• Certification
• Comprehensive content review
Plans for future
Emerging State of the
Practice
Methods for Mitigating Bit-Level Risk
• Physical: media, hardware, environment
• Number of copies
• Diversification of copies
• Formats
• File transforms: compression, encoding, encryption
• Fixity
• Repair
• Local storage
• File systems: transforms, deduplication, redundancy
• Replication
• Verification
• Audit
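Fixity checking, one of the methods listed above, can be sketched with SHA-256 from Python's standard `hashlib`. The manifest format (relative path mapped to hex digest) and the helper names are hypothetical choices for illustration, not any particular repository's design.

```python
import hashlib
import tempfile
from pathlib import Path

# Minimal fixity-checking sketch using SHA-256 from Python's standard library.
# The manifest format (relative path -> hex digest) is a hypothetical choice.

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root at ingest time."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose digest no longer matches."""
    return [rel for rel, expected in manifest.items()
            if not (root / rel).is_file() or sha256_of(root / rel) != expected]


def demo() -> list[str]:
    """Ingest two files, silently corrupt one, then run a fixity check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_bytes(b"original content A")
        (root / "b.txt").write_bytes(b"original content B")
        manifest = make_manifest(root)
        (root / "b.txt").write_bytes(b"original content b")  # simulated bit rot
        return verify_manifest(root, manifest)


if __name__ == "__main__":
    print(demo())
```

Fixity only detects damage. Repair requires at least one other copy whose digest still matches the manifest.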
Emerging State of Practice
• Organizational: multi-institutional stewardship
– Institutions hold digital assets they wish to preserve,
many of them unique
– Many of these assets are not replicated at all
– Even when institutions keep multiple backups offsite,
many single points of failure remain,
because the replicas are managed by a single institution
– Approaches: LOCKSS, Digital Preservation Network,
MetaArchive, Data-PASS, Datanet Federation
Consortium, DataONE
• Technical: fixity, verification, and auditing
• Legal: succession planning, confidentiality, …
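Verification and auditing across institutions can be sketched as a digest comparison between replicas held at different sites, with minority copies repaired from the majority. This is a plain majority vote for illustration only. It is not the LOCKSS polling protocol or any other named system's algorithm, and the site names are hypothetical.

```python
import hashlib
from collections import Counter

# Simplified multi-site audit sketch: compare each replica's digest and repair
# minority replicas from a majority copy. This is a plain majority vote for
# illustration only; it is NOT the LOCKSS polling protocol or any other
# specific system's algorithm. Site names are hypothetical.

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Return (repaired replicas, names of sites that were repaired).

    Raises ValueError when no digest has a strict majority, since automatic
    repair would then risk overwriting the good copy."""
    digests = {site: digest(data) for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if 2 * votes <= len(replicas):
        raise ValueError("no strict majority; manual review required")
    good_copy = next(replicas[s] for s, d in digests.items() if d == winner)
    repaired_sites = [s for s, d in digests.items() if d != winner]
    repaired = {s: (good_copy if s in repaired_sites else data)
                for s, data in replicas.items()}
    return repaired, repaired_sites


if __name__ == "__main__":
    sites = {"site_a": b"object v1", "site_b": b"object v1", "site_c": b"object v?"}
    fixed, repaired_sites = audit_and_repair(sites)
    print(repaired_sites)
```

Spreading replicas across institutions is what lets such an audit catch failures that a single institution's own backups would share.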
Future research: What
do we need to know?
Modeling Bit Corruption
[Diagram: model linking Corruption, Detection, and Repair]
• Media characteristics
• Threat characteristics
• Correlations
• Logical scope of corruption
• File characteristics
• File system characteristics
• Probability of successful repair
• Auditing frequency
• Number of copies
• Repair frequency
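A toy Monte Carlo model can tie several of these factors together (number of copies, auditing frequency, probability of successful repair) for a single object. Everything here is a hypothetical simplification: yearly time steps, independent corruption of each copy, and audits that detect all damaged copies. In particular, it ignores the correlations and file-system effects listed above.

```python
import random

# Toy Monte Carlo model of corruption, detection, and repair for one object.
# ASSUMPTIONS (all hypothetical): yearly time steps; each intact copy is
# corrupted independently with probability p_corrupt per year; an audit every
# audit_every years detects all corrupted copies; each repair from a surviving
# copy succeeds with probability p_repair; the object is lost once no intact
# copy remains.

def simulate_loss_rate(copies: int, p_corrupt: float, audit_every: int,
                       years: int, p_repair: float = 1.0,
                       trials: int = 10_000, seed: int = 0) -> float:
    rng = random.Random(seed)  # seeded so experiments are replicable
    lost = 0
    for _ in range(trials):
        intact = copies
        for year in range(1, years + 1):
            # corruption: each intact copy may fail this year
            intact -= sum(rng.random() < p_corrupt for _ in range(intact))
            if intact == 0:
                lost += 1
                break
            # detection and repair happen only in audit years
            if year % audit_every == 0:
                damaged = copies - intact
                intact += sum(rng.random() < p_repair for _ in range(damaged))
    return lost / trials


if __name__ == "__main__":
    for audit_every in (1, 5, 20):
        rate = simulate_loss_rate(copies=2, p_corrupt=0.05,
                                  audit_every=audit_every, years=20)
        print(f"audit every {audit_every:>2} years: estimated P(loss) = {rate:.4f}")
```

Even this crude model shows that auditing and repair frequency can matter as much as the raw number of copies.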
The Risk Problem Restated
• Keeping the risk of object loss fixed, what choices minimize $?
• “Dual problem”: keeping $ fixed, what choices minimize risk?
• Extension: given a specific cost function for the loss of each object,
Loss(object_i), summed over all lost objects, what choices minimize
Total cost = preservation cost + Σ_i E[Loss(object_i)]?
[Figure: risk plotted against cost]
Are we there yet?
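Both the fixed-risk problem and the total-cost extension can be sketched by treating the number of copies as the only choice variable. The numbers, the flat per-copy cost, and the function names are hypothetical. The sketch also assumes independent copy losses, which is exactly the independence assumption the talk questions earlier, so treat it as a framing device rather than a decision tool.

```python
from typing import Optional

# Sketch of the cost-minimization framing with hypothetical numbers. Choice
# variable: number of copies k. ASSUMES a fixed cost per copy per object and
# independent copy losses (so P(object loss) = p_copy_loss ** k).

def total_cost(copies: int, cost_per_copy: float, p_copy_loss: float,
               loss_values: list[float]) -> float:
    """Preservation cost + sum over objects of expected loss."""
    p_object_loss = p_copy_loss ** copies
    preservation = copies * cost_per_copy * len(loss_values)
    expected_loss = sum(p_object_loss * value for value in loss_values)
    return preservation + expected_loss


def best_copies(max_copies: int, cost_per_copy: float, p_copy_loss: float,
                loss_values: list[float]) -> int:
    """Extension problem: copy count that minimizes total cost."""
    return min(range(1, max_copies + 1),
               key=lambda k: total_cost(k, cost_per_copy, p_copy_loss, loss_values))


def cheapest_meeting_risk(max_copies: int, p_copy_loss: float,
                          risk_target: float) -> Optional[int]:
    """Fixed-risk problem: fewest copies (hence lowest cost) with P(loss) <= target."""
    for k in range(1, max_copies + 1):
        if p_copy_loss ** k <= risk_target:
            return k
    return None


if __name__ == "__main__":
    values = [100.0] * 1000  # hypothetical value of each of 1,000 objects
    print("copies minimizing total cost:", best_copies(10, 1.0, 0.5, values))
    print("copies meeting 1e-3 risk:", cheapest_meeting_risk(10, 0.5, 1e-3))
```

With non-uniform object values, the same framing suggests keeping more copies of high-value objects rather than applying one replication level to the whole collection.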
Research Directions
• Growing the evidence base
– Descriptive inference – patterns of use
– Descriptive inference – outcomes
– Predictive inference – trend analysis
– Causal inference – effectiveness of interventions
• Modes of inquiry
– probability-based surveys
(e.g., of information management practice and outcomes);
– replicable simulation experiments tied to theoretically grounded
models of information management and risk;
– creation of testbeds and test corpora that can be used to
systematically compare new practices, tools, and methods;
– field experiments, in which randomized interventions are applied and
evaluated in real operational environments.
Bibliography (Selected)
• Rosenthal, D.S.H., Robertson, T.S., Lipkis, T., Reich, V., & Morabito, S. (2005).
“Requirements for Digital Preservation Systems: A Bottom-Up Approach.”
D-Lib Magazine 11 (11).
• Pinheiro, E., Weber, W.-D., & Barroso, L.A. (2007). “Failure Trends in a Large
Disk Drive Population.” In Proceedings of the 5th USENIX Conference on File and
Storage Technologies.
• Rosenthal, D.S.H. (2010). “Bit Preservation: A Solved Problem?” International
Journal of Digital Curation 5 (1), 134-148.
Questions?
E-mail: escience@mit.edu
Web: micahaltman.com
Twitter: @drmaltman
Human Factors of XR: Using Human Factors to Design XR Systems
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Approaches to Preservation Storage Technologies

  • 1. Prepared for the MIT Libraries Informatics Program Brown Bag Talk, June 2013. Approaches to Preservation Storage Technologies. Dr. Micah Altman <escience@mit.edu>, Director of Research, MIT Libraries
  • 2. DISCLAIMER: These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, or (with the exception of co-authored, previously published work) my collaborators. Secondary disclaimer: “It’s tough to make predictions, especially about the future!” -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
  • 3. Collaborators & Co-Conspirators • Jefferson Bailey, Karen Cariani, Jonathan Crabtree, Michelle Gallinger, Jane Mandelbaum, Nancy McGovern, Trevor Owens • NDSA Coordination Committee & Working Group Chairs • Research Support: Thanks to the Library of Congress and the Massachusetts Institute of Technology.
  • 4. Related Work • Altman, M., et al. (2013). “NDSA Storage Report: Reflections on National Digital Stewardship Alliance Member Approaches to Preservation Storage Technologies.” D-Lib Magazine 19 (5/6). • National Digital Stewardship Alliance (2013, forthcoming). 2014 National Agenda for Digital Stewardship. • Altman, M., & Crabtree, J. (2011). “Using the SafeArchive System: TRAC-Based Auditing of LOCKSS.” In Archiving 2011, pp. 165-170. Most reprints available from: informatics.mit.edu
  • 5. Simple question? • If you have 1000 files (bitstreams) and you’d like a 99.99% chance of accessing them in 20 years, how do you store them?
  • 6. Simplistic Answer: Put it in AWS • Amazon Glacier claims a design reliability of 99.999999999% • Neat-o !!!!!!!!!! – Longer odds than winning Powerball OR – Getting struck by lightning three times OR – (Possibly) eventually finding an alien civilization • But …
  • 7. Clarifying Requirements • What are the units of reliability? – Collection? Object? Bit? • What is the natural unit of risk? • Is the value of information uniform across units? • How many of these do you have?
  • 8. Hidden Assumptions • Reliability estimates appear entirely theoretical – (MTBF + independence) * enough replicas -> as many 9’s as you like… – No details of the estimate provided – No historical reliability statistics provided – No service reliability auditing provided • Empirical Issues – Storage manufacturers’ hardware MTBF (mean time between failures) does not match error rates observed in real environments… – Failures across hardware replicas are observed to be correlated • Unmodeled failure modes – software failure (e.g. a bug in the AWS software for its control backplane might result in permanent loss that would go undetected for a substantial time) – legal threats (leading to account lock-out, deletion, or content removal) – institutional threats (such as a change in Amazon’s business model) – process threats (someone hits the delete button by mistake, forgets to pay the bill, or AWS rejects the payment)
  • 9. Business Risks? • Amazon SLAs do not incorporate or reflect “design” reliability claims: – No claim to reliability in the SLAs – Sole remedy for breach is limited to a refund of fees for periods the service was unavailable – No right to audit logs or other evidence of reliability
  • 10. What practices are leading stewardship organizations using?
  • 11. Results from the NDSA Bi-Annual Preservation Storage Survey • 74 institutions surveyed; 58 met selection criteria – Follow-up on non-responders: 100% response rate – Low roll-off on individual questions – Next round will be more than 2x bigger • Survey Methods – Closed-ended, with open-ended extensions – Selected qualitative follow-up • Survey Data – Instrument and data available as open data
  • 12. About the NDSA
  • 13. Key Findings: What are Current Institutional Practices? • 90% of respondents are distributing copies of at least part of their content geographically • 88% of respondents are responsible for their content for an indefinite period of time • 80% of respondents use some form of fixity checking for their content • 75% of respondents report a strong preference to host and control their own technical infrastructure for preservation storage • 69% of respondents are considering or currently participating in a distributed storage cooperative or system (e.g., LOCKSS Alliance, MetaArchive, Data-PASS) • 64% of respondents are planning to make significant changes in the technologies in their preservation storage architecture in the next three years • 51% of respondents are considering or already using a cloud storage provider to keep one or more copies of their content • 48% of respondents are considering or currently contracting out storage services to be managed by another organization or company
  • 14. How Many Copies
  • 15. How Many Copies? … by Role
  • 16. Preservation Storage -- New Approaches
  • 17. What do organizations want from their preservation systems?
  • 18. What are most memory organizations not doing yet? • Formal cost and valuation models • Auditing & evaluation • Certification • Comprehensive content review
  • 19. Plans for future
  • 20. Emerging State of the Practice
  • 21. Methods for Mitigating Bit-Level Risk • Physical: media, hardware, environment • Number of copies • Diversification of copies • Formats • File transforms: compression, encoding, encryption • Fixity • Repair • Local storage file systems: transforms, deduplication, redundancy • Replication • Verification • Audit
  • 22. Emerging State of Practice • Organizational – Multi-institutional stewardship – Institutions hold digital assets they wish to preserve, many of them unique – Many of these assets are not replicated at all – Even when institutions keep multiple backups offsite, many single points of failure remain, because the replicas are managed by a single institution – Approaches: LOCKSS, Digital Preservation Network, MetaArchive, Data-PASS, DataNet Federation Consortium, DataONE • Technical: fixity, verification, and auditing • Legal: succession planning, confidentiality, …
  • 23. Future research: What do we need to know?
  • 24. Modeling Bit Corruption (diagram labels: media characteristics; threat characteristics; correlations; logical scope of corruption; file characteristics; file system characteristics; probability of successful repair; auditing frequency; number of copies; repair frequency; corruption; detection; repair)
  • 25. The Risk Problem Restated • Keeping the risk of object loss fixed, what choices minimize $? • “Dual problem”: keeping $ fixed, what choices minimize risk? • Extension: for specific cost functions for the loss of each object, Loss(object_i), what choices minimize Total cost = preservation cost + risk cost, where risk cost = sum(E(Loss)) over all lost objects? • Are we there yet?
  • 26. Research Directions • Growing the evidence base – Descriptive inference: patterns of use – Descriptive inference: outcomes – Predictive inference: trend analysis – Causal inference: effectiveness of interventions • Modes of inquiry – probability-based surveys (e.g. of information management practice and outcomes) – replicable simulation experiments tied to theoretically grounded models of information management and risk – creation of testbeds and test corpora that can be used to systematically compare new practices, tools, and methods – field experiments, in which randomized interventions are applied and evaluated in real operational environments
  • 27. Bibliography (Selected) • Rosenthal, D.S.H., Robertson, T.S., Lipkis, T., Reich, V., & Morabito, S. (2005, November). “Requirements for Digital Preservation Systems: A Bottom-Up Approach.” D-Lib Magazine 11 (11). • Pinheiro, E., Weber, W.-D., & Barroso, L.A. (2007). “Failure Trends in a Large Disk Drive Population.” In Proceedings of the 5th USENIX Conference on File and Storage Technologies. • Rosenthal, D.S.H. (2010). “Bit Preservation: A Solved Problem?” International Journal of Digital Curation 5 (1): 134-148.
  • 28. Questions? E-mail: escience@mit.edu Web: micahaltman.com Twitter: @drmaltman
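Slide 5's question can be turned into a quick calculation. The following is a minimal Python sketch, not the talk's method, and it leans on assumptions of its own: every file is lost independently with the same probability, and the 99.999999999% figure from slide 6 is read as an annual, per-object survival probability. The function names are hypothetical.

```python
"""Back-of-the-envelope arithmetic for the slide 5 question.

Assumption (not from the talk): every file is lost or kept independently,
with the same probability -- exactly the kind of assumption slide 8 questions.
"""


def per_file_target(collection_target: float, n_files: int) -> float:
    """Per-file survival probability needed so that all n_files survive
    with probability collection_target (independent losses assumed)."""
    return collection_target ** (1.0 / n_files)


def survive_years(annual_survival: float, years: int) -> float:
    """Survival over `years`, assuming an identical, independent risk each year."""
    return annual_survival ** years


def collection_survival(per_file: float, n_files: int) -> float:
    """Probability that every one of n_files survives."""
    return per_file ** n_files


target = per_file_target(0.9999, 1000)
print(f"Per-file 20-year loss budget: {1 - target:.2e}")

# Reading the 99.999999999% design figure as an annual, per-object,
# independent survival probability (an interpretive assumption):
per_file_20y = survive_years(0.99999999999, 20)
print(f"P(all 1000 files survive 20 years): "
      f"{collection_survival(per_file_20y, 1000):.9f}")
```

Under these assumptions the per-file loss budget is roughly one in ten million over 20 years, and the advertised figure clears it easily on paper; slides 7 through 9 explain why the independence and "design" assumptions behind that comparison are the part that remains unverified.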

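The trade-off restated on slide 25 can be illustrated with a toy model. This is a hypothetical Python sketch, not a model from the talk: each copy of an object is assumed to be lost independently with probability `q` over the planning horizon and never repaired, each copy costs a flat `cost_per_copy`, and the `values` list stands in for made-up Loss(object_i) figures.

```python
"""Toy version of slide 25's trade-off: preservation cost vs. expected loss.

All numbers and the loss model are hypothetical: each copy of an object is
lost independently with probability q over the planning horizon, copies are
never repaired, and an object is lost only if every copy is lost.
"""
from typing import Sequence


def total_cost(k: int, values: Sequence[float], cost_per_copy: float, q: float) -> float:
    """Total cost = preservation cost + sum over objects of E(Loss(object_i))."""
    preservation = k * cost_per_copy * len(values)
    p_object_lost = q ** k  # all k copies lost
    risk = sum(v * p_object_lost for v in values)
    return preservation + risk


def best_copies(values: Sequence[float], cost_per_copy: float, q: float,
                max_copies: int = 10) -> int:
    """Number of copies (1..max_copies) that minimizes total cost."""
    return min(range(1, max_copies + 1),
               key=lambda k: total_cost(k, values, cost_per_copy, q))


def min_risk_within_budget(budget: float, n_objects: int,
                           cost_per_copy: float, q: float) -> float:
    """The 'dual problem': with $ fixed, buy as many copies as the budget
    allows and return the resulting per-object loss probability."""
    k = int(budget // (cost_per_copy * n_objects))
    return q ** k if k >= 1 else 1.0


values = [1000.0] * 1000  # hypothetical Loss(object_i) for 1000 objects
for k in range(1, 5):
    print(k, round(total_cost(k, values, cost_per_copy=1.0, q=0.05), 2))
print("best k:", best_copies(values, cost_per_copy=1.0, q=0.05))
```

With these made-up numbers total cost bottoms out at three copies, because a fourth copy costs more than the expected loss it removes. Realistic cost and valuation functions are precisely what slide 18 lists as missing at most organizations.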
Editor's Notes

  1. This work by Micah Altman (http://micahaltman.com) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  2. The structure and design of digital storage systems is a cornerstone of digital preservation. To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. This talk discusses findings from this survey, common gaps, and trends in this area. (I also have a little fun highlighting the hidden assumptions underlying Amazon Glacier's reliability claims. For more on that, see this earlier post: http://drmaltman.wordpress.com/2012/11/15/amazons-creeping-glacier-and-digital-preservation )
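The hidden assumption mentioned in the note above (and on slide 8), that (MTBF + independence) * enough replicas yields as many 9's as you like, can be checked with a few lines of arithmetic. This is a minimal Python sketch under an invented common-mode model: with probability `c` one shared event (for example a control-plane bug or a locked account) destroys every replica; otherwise each replica is lost independently with probability `q`. The parameter values are illustrative, not measurements.

```python
"""Why 'as many 9's as you like' depends on independence (slide 8).

Toy common-mode model, assumed for illustration only: with probability c a
single shared event (e.g. a control-plane software bug, an account lock-out,
an unpaid bill) destroys every replica at once; otherwise each of k replicas
is lost independently with probability q.
"""
import math


def loss_probability(q: float, k: int, c: float = 0.0) -> float:
    """P(all k replicas lost) under the toy common-mode model."""
    return c + (1.0 - c) * q ** k


def nines(p_loss: float) -> float:
    """Reliability expressed as a count of nines, i.e. -log10(P(loss))."""
    return -math.log10(p_loss)


for k in (1, 2, 3, 5, 8):
    independent = nines(loss_probability(q=0.01, k=k))
    correlated = nines(loss_probability(q=0.01, k=k, c=1e-6))
    print(f"{k} replicas: {independent:4.1f} nines if independent, "
          f"{correlated:4.1f} nines with a 1e-6 common-mode risk")
```

Under independence each extra replica adds two nines in this example; with even a one-in-a-million shared failure mode the curve flattens at about six nines no matter how many replicas are added.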