SlideShare a Scribd company logo
1 of 37
Prepared for
              Digital Preservation 2012
                   Washington, DC
                       July 2012




Assessing and Mitigating Bit-
  Level Preservation Risks

  NDSA Infrastructure Working Group
INTRODUCTION:

A Framework for Addressing
Bit-Level Preservation Risk
                          Micah Altman
       <Micah_Altman@alumni.brown.edu>
         Director of Research, MIT Libraries
                                 Mark Evans
                 <mark.evans@tessella.com>
         Digital Archiving Practice Manager,
                                 Tessela Inc.
Threats to Bits




Physical & Hardware                          Software
                          Insider &
                           External
                           Attacks




                        Organizational
                           Failure

       Media                             Curatorial Error
Do you know where your data are?
Encoding

                             Priscilla Caplan
                            <pcaplan@ufl.edu>
Assistant Director for Digital Library Services,
                       Florida Virtual Campus
Compression
• Many types of compression:
    – Format based file compression, e.g. JPEG2000
    – Tape hardware compression at the drive
    – NAS compression via appliance or storage
      device
    – Data deduplication
•   Is it lossless?
•   Is it transparent?
•   Is it proprietary?
•   What is effect on error recovery?
Compression Tradeoffs
• Tradeoffs
  – Space savings allows more copies at same cost
  – But makes files more sensitive to data corruption
• Erasure coding in cloud storage
  – Massively more reliable
  – But dependent on proprietary index
Encryption
• Two contexts:
  – Archiving encrypted content
  – Archive encrypting content
• Reasons to encrypt:
  – Prevent unauthorized access
    • Especially in Cloud and on tape
  – To enforce DRM
  – Legal requirements (HIPAA, state law)
    • Though only required for transmission, not “at rest”
Encryption Concerns
•   Increased file size
•   Performance penalty
•   Additional expense
•   Makes files more sensitive to data corruption
•   May complicate format migration
•   May complicate legitimate access
•   Risk of loss of encryption keys
•   Difficulty of enterprise level key management
•   Obsolescence of encryption formats
•   Obsolescence of PKI infrastructure
Redundancy & Diversity

                       Andrea Goethals
        <andrea_goethals@harvard.edu>
      Manager of Digital Preservation and
                     Repository Services,
                      Harvard University
Failures WILL happen

• Real problem:
     failures you can’t recover from!
• A few mitigating concepts:
     redundancy & diversity
Redundancy (multiple duplicates)

• Ecology
  – Redundancy hypothesis = species
    redundancy enhances ecosystem resiliency
• Digital preservation
  – Example: Multiple copies of content
Diversity (variations)

• Finance
  – Portfolio effect = diversification of assets
    stabilizes financial portfolios
• Ecology
  – Response diversity = diversification stabilizes
    ecosystem processes
• Digital preservation
  – Examples: different storage media, storage
    locations with different geographic threats
What can “fail”? What can’t?

• Likely candidates
  – Storage component faults
     • Latent sector errors (physical problems)
     • Silent data corruption (higher-level, usually SW
       problems)
     • Whole disks
  – Organizational disruptions (changes in
    finances, priorities, staffing)
Data loss risks                      Redundancy & diversity controls
        (impact & likelihood?)                                 (costs?)

Environmental factors                         Replication to different data centers
    e.g. temperature, vibrations affecting
    multiple devices in same data center


Shared component faults                        Replication to different data centers or
    e.g. power connections, cooling,           redundant components, replication
    SCSI controllers, software bugs          software systems

Large-scale disasters                         Replication to different geographic areas
    e.g. earthquakes

Malicious attacks                             Distinct security zones
    e.g. worms
Human error                                   Different administrative control
    e.g. accidental deletions
Organizational faults                         Different organizational control
    e.g. budget cuts
BIT-LEVEL FIXITY
                              Karen Cariani
                 <karen_cariani@wgbh.org>
  Director WGBH Media Library and Archives,
              WGBH Educational Foundation
                          John Spencer
                 <jspencer@bmschase.com>
                 President, BMS/Chase LLC
Bit-Level Fixity

• Fixity is a “property” and a “process” (as
  defined from the 2008 PREMIS data
  dictionary)
• It is a “property”, where a message digest
  (usually referred to as a checksum) is
  created as a validation tool to ensure bit-
  level accuracy when migrating a digital file
  from one carrier to another
Bit-Level Fixity

• It is also a “process”, in that fixity must be
  integrated into every digital preservation
  workflow
• Fixity is common in digital repositories, as
  it is easily put in the ingest and refresh
  migration cycles
• Fixity of digital files is a cornerstone of
  archival best practices
So what’s the problem?

• While bit-level fixity solutions are readily
  available, there remains a large
  constituency of content creators that place
  minimal (or zero) value on this procedure
• Legacy IT environments, focused on
  business processes, are not “standards-
  driven”, more so by vendors, budgets, and
  poorly defined archival workflow strategies
So what’s the problem?

• A vast majority of commercial digital
  assets are stored “dark” (i.e. data tape or
  even worse, random HDDs), with no fixity
  strategy in place
• For private companies, individuals, and
  content creators with digital assets, bit-
  level fixity remains a mystery – a
  necessary outreach effort remains
So what’s the problem?

• Major labels, DIY artists, indie labels,
  amateur and semi-professional archivists,
  photographers, oral histories, and born-
  digital films usually ignore the concept of
  fixity
• All of the these constituencies need
  guidance to engage fixity into their daily
  workflow or suffer the consequences when
  the asset is needed NOW to monetize…
Overview:

Auditing & Repair


                          Micah Altman
       <Micah_Altman@alumni.brown.edu>
         Director of Research, MIT Libraries
Audit [aw-dit]:

   An independent evaluation of
   records and activities to
   assess a system of controls

Fixity mitigates risk only if used
          for auditing.
Functions of Storage Auditing

• Detect
  corruption/deletion of content

• Verify
  compliance with storage/replication policies

• Prompt
  repair actions
Bit-Level Audit Design Choices
• Audit regularity and coverage:
  on-demand (manually); on object access; on
  event; randomized sample;
  scheduled/comprehensive
• Fixity check & comparison algorithms
• Auditing scope:
  integrity of object; integrity of collection;
  integrity of network; policy compliance;
  public/transparent auditing
• Trust model
• Threat model
Repair

  Auditing mitigates risk only if
        used for repair.
Design Elements
•Repair latency:
  – Detection to start of repair
  – Repair duration
•Repair algorithm
LOCKSS Auditing & Repair
Decentralized, peer-2-peer, tamper-resistant
            replication & repair
DuraCloud 1.0* Auditing & Repair
   Storage replicated across cloud providers




        * Version 2.0 release notes and documentation
        indicate the addition of regular bit integrity check,
        and the ability to sync changes from a primary copy
        for premium users. Detailed features and pecific
        algorithms do not yet appear fully documented
iRODS Auditing & Repair
Rules-based federated storage grid
Auditing & Repair
TRAC-Aligned policy auditing as a overlay network
Summary:




                        Micah Altman
     <Micah_Altman@alumni.brown.edu>
       Director of Research, MIT Libraries
Methods for Mitigating Risk


      Physical:         Formats        File           File
       Media,                      Transforms:      Systems:
     Hardware,                    compression,    transforms,
    Environment                     encoding,    deduplication,
                                   encryption     redundancy




Diversification   Number             Fixity      Audit       Repair
   of copies      of copies
How can we choose?

• Clearly state decision problem

• Model connections between choices
  &outcomes

• Empirically calibrate and validate
The Problem
            Keeping risk of object loss fixed
            -- what choices minimize $?
                                                     Are we there
“Dual problem”                                           yet?
Keeping $ fixed, what choices minimize risk?

Extension

For specific cost functions for loss of object:

Loss(object_i), of all lost objects

What choices minimize:

Total cost= preservation cost+ sum(E(Loss))
                                                  cost
Modeling
Measurements
• Media – MBTF theoretical and actual
• File transformations:
   – compression ratio
   – partial recoverability
• Filesystem transformations:
   – Deduplication
   – Compression ratio
• Diversification
   – Single points of failure
   – Correlated failures
• Copies, Audit, Repair
   – Simulation models
   – Audit studies
Questions*
    What techniques are you using?

    What models guide the “knobs”?
      Contact the NDSA Infrastructure Working Group:

      www.digitalpreservation.gov/ndsa/working_groups/

* Thanks to our moderator:

         Trevor Owens <trow@loc.gov>,Digital Archivist, Library of Congress

More Related Content

What's hot

Shared Oracle Hosting (Linux)
Shared Oracle Hosting (Linux)Shared Oracle Hosting (Linux)
Shared Oracle Hosting (Linux)
webhostingguy
 
Security Engineering 1 (CS 5032 2012)
Security Engineering 1 (CS 5032 2012)Security Engineering 1 (CS 5032 2012)
Security Engineering 1 (CS 5032 2012)
Ian Sommerville
 

What's hot (9)

Digitization Projects for Small Archives and Museums
Digitization Projects for Small Archives and MuseumsDigitization Projects for Small Archives and Museums
Digitization Projects for Small Archives and Museums
 
Information Management
Information ManagementInformation Management
Information Management
 
Pitfalls of software product development
Pitfalls of software product developmentPitfalls of software product development
Pitfalls of software product development
 
Symantec Appliances Strategy Launch
Symantec Appliances Strategy LaunchSymantec Appliances Strategy Launch
Symantec Appliances Strategy Launch
 
Control the Chaos with Novell File Reporter
Control the Chaos with Novell File ReporterControl the Chaos with Novell File Reporter
Control the Chaos with Novell File Reporter
 
Novell File Management Suite for Microsoft Active Directory Environments
Novell File Management Suite for Microsoft Active Directory EnvironmentsNovell File Management Suite for Microsoft Active Directory Environments
Novell File Management Suite for Microsoft Active Directory Environments
 
Agentless backup is not a myth
Agentless backup is not a mythAgentless backup is not a myth
Agentless backup is not a myth
 
Shared Oracle Hosting (Linux)
Shared Oracle Hosting (Linux)Shared Oracle Hosting (Linux)
Shared Oracle Hosting (Linux)
 
Security Engineering 1 (CS 5032 2012)
Security Engineering 1 (CS 5032 2012)Security Engineering 1 (CS 5032 2012)
Security Engineering 1 (CS 5032 2012)
 

Similar to Bit Level Preservation

fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
Mark Matienzo
 
Securing your esi_piedmont
Securing your esi_piedmontSecuring your esi_piedmont
Securing your esi_piedmont
scm24
 

Similar to Bit Level Preservation (20)

Digital Destiny
Digital DestinyDigital Destiny
Digital Destiny
 
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival R...
 
Securing your esi_piedmont
Securing your esi_piedmontSecuring your esi_piedmont
Securing your esi_piedmont
 
Virtual Data : Eliminating the data constraint in Application Development
Virtual Data :  Eliminating the data constraint in Application DevelopmentVirtual Data :  Eliminating the data constraint in Application Development
Virtual Data : Eliminating the data constraint in Application Development
 
Allison Stanfield
Allison StanfieldAllison Stanfield
Allison Stanfield
 
Auditing Distributed Preservation Networks
Auditing Distributed Preservation Networks Auditing Distributed Preservation Networks
Auditing Distributed Preservation Networks
 
Pragmatics Driven Issues in Data and Process Integrity in Enterprises
Pragmatics Driven Issues in Data and Process Integrity in EnterprisesPragmatics Driven Issues in Data and Process Integrity in Enterprises
Pragmatics Driven Issues in Data and Process Integrity in Enterprises
 
DBTA Data Summit : Eliminating the data constraint in Application Development
DBTA Data Summit : Eliminating the data constraint in Application DevelopmentDBTA Data Summit : Eliminating the data constraint in Application Development
DBTA Data Summit : Eliminating the data constraint in Application Development
 
Declare Victory with Big Data
Declare Victory with Big DataDeclare Victory with Big Data
Declare Victory with Big Data
 
Security Awareness Training
Security Awareness TrainingSecurity Awareness Training
Security Awareness Training
 
Scalar unstructured data april 28, 2010
Scalar unstructured data april 28, 2010Scalar unstructured data april 28, 2010
Scalar unstructured data april 28, 2010
 
P1141211139
P1141211139P1141211139
P1141211139
 
DSS.LV - Principles Of Data Protection - March2015 By Arturs Filatovs
DSS.LV - Principles Of Data Protection - March2015 By Arturs FilatovsDSS.LV - Principles Of Data Protection - March2015 By Arturs Filatovs
DSS.LV - Principles Of Data Protection - March2015 By Arturs Filatovs
 
Resiliency in Distributed Systems
Resiliency in Distributed SystemsResiliency in Distributed Systems
Resiliency in Distributed Systems
 
Webinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsWebinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be Backups
 
Dp%20 fudamentals%20%28ch1%29
Dp%20 fudamentals%20%28ch1%29Dp%20 fudamentals%20%28ch1%29
Dp%20 fudamentals%20%28ch1%29
 
Využijte svou Oracle databázi na maximum!
Využijte svou Oracle databázi na maximum!Využijte svou Oracle databázi na maximum!
Využijte svou Oracle databázi na maximum!
 
How to evaluate data protection technologies - Mastercard conference
How to evaluate data protection technologies -  Mastercard conferenceHow to evaluate data protection technologies -  Mastercard conference
How to evaluate data protection technologies - Mastercard conference
 
Introduction to Digital Preservation
Introduction to Digital PreservationIntroduction to Digital Preservation
Introduction to Digital Preservation
 
PROACT SYNC 2013 - Breakout - CommVault IntelliSnap Recovery Manager de inzet...
PROACT SYNC 2013 - Breakout - CommVault IntelliSnap Recovery Manager de inzet...PROACT SYNC 2013 - Breakout - CommVault IntelliSnap Recovery Manager de inzet...
PROACT SYNC 2013 - Breakout - CommVault IntelliSnap Recovery Manager de inzet...
 

More from Micah Altman

SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
Micah Altman
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Micah Altman
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
Micah Altman
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Micah Altman
 

More from Micah Altman (20)

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategies
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset Conversation
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset Conversation
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer Review
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer Review
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An Overview
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral Districting
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenary
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Bit Level Preservation

  • 1. Prepared for Digital Preservation 2012 Washington, DC July 2012 Assessing and Mitigating Bit- Level Preservation Risks NDSA Infrastructure Working Group
  • 2. INTRODUCTION: A Framework for Addressing Bit-Level Preservation Risk Micah Altman <Micah_Altman@alumni.brown.edu> Director of Research, MIT Libraries Mark Evans <mark.evans@tessella.com> Digital Archiving Practice Manager, Tessela Inc.
  • 3. Threats to Bits Physical & Hardware Software Insider & External Attacks Organizational Failure Media Curatorial Error
  • 4. Do you know where your data are?
  • 5. Encoding Priscilla Caplan <pcaplan@ufl.edu> Assistant Director for Digital Library Services, Florida Virtual Campus
  • 6. Compression • Many types of compression: – Format based file compression, e.g. JPEG2000 – Tape hardware compression at the drive – NAS compression via appliance or storage device – Data deduplication • Is it lossless? • Is it transparent? • Is it proprietary? • What is effect on error recovery?
  • 7. Compression Tradeoffs • Tradeoffs – Space savings allows more copies at same cost – But makes files more sensitive to data corruption • Erasure coding in cloud storage – Massively more reliable – But dependent on proprietary index
  • 8. Encryption • Two contexts: – Archiving encrypted content – Archive encrypting content • Reasons to encrypt: – Prevent unauthorized access • Especially in Cloud and on tape – To enforce DRM – Legal requirements (HIPAA, state law) • Though only required for transmission, not “at rest”
  • 9. Encryption Concerns • Increased file size • Performance penalty • Additional expense • Makes files more sensitive to data corruption • May complicate format migration • May complicate legitimate access • Risk of loss of encryption keys • Difficulty of enterprise level key management • Obsolescence of encryption formats • Obsolescence of PKI infrastructure
  • 10. Redundancy & Diversity Andrea Goethals <andrea_goethals@harvard.edu> Manager of Digital Preservation and Repository Services, Harvard University
  • 11. Failures WILL happen • Real problem: failures you can’t recover from! • A few mitigating concepts: redundancy & diversity
  • 12. Redundancy (multiple duplicates) • Ecology – Redundancy hypothesis = species redundancy enhances ecosystem resiliency • Digital preservation – Example: Multiple copies of content
  • 13. Diversity (variations) • Finance – Portfolio effect = diversification of assets stabilizes financial portfolios • Ecology – Response diversity = diversification stabilizes ecosystem processes • Digital preservation – Examples: different storage media, storage locations with different geographic threats
  • 14. What can “fail”? What can’t? • Likely candidates – Storage component faults • Latent sector errors (physical problems) • Silent data corruption (higher-level, usually SW problems) • Whole disks – Organizational disruptions (changes in finances, priorities, staffing)
  • 15. Data loss risks Redundancy & diversity controls (impact & likelihood?) (costs?) Environmental factors Replication to different data centers e.g. temperature, vibrations affecting multiple devices in same data center Shared component faults Replication to different data centers or e.g. power connections, cooling, redundant components, replication SCSI controllers, software bugs software systems Large-scale disasters Replication to different geographic areas e.g. earthquakes Malicious attacks Distinct security zones e.g. worms Human error Different administrative control e.g. accidental deletions Organizational faults Different organizational control e.g. budget cuts
  • 16. BIT-LEVEL FIXITY Karen Cariani <karen_cariani@wgbh.org> Director WGBH Media Library and Archives, WGBH Educational Foundation John Spencer <jspencer@bmschase.com> President, BMS/Chase LLC
  • 17. Bit-Level Fixity • Fixity is a “property” and a “process” (as defined from the 2008 PREMIS data dictionary) • It is a “property”, where a message digest (usually referred to as a checksum) is created as a validation tool to ensure bit- level accuracy when migrating a digital file from one carrier to another
  • 18. Bit-Level Fixity • It is also a “process”, in that fixity must be integrated into every digital preservation workflow • Fixity is common in digital repositories, as it is easily put in the ingest and refresh migration cycles • Fixity of digital files is a cornerstone of archival best practices
  • 19. So what’s the problem? • While bit-level fixity solutions are readily available, there remains a large constituency of content creators that place minimal (or zero) value on this procedure • Legacy IT environments, focused on business processes, are not “standards- driven”, more so by vendors, budgets, and poorly defined archival workflow strategies
  • 20. So what’s the problem? • A vast majority of commercial digital assets are stored “dark” (i.e. data tape or even worse, random HDDs), with no fixity strategy in place • For private companies, individuals, and content creators with digital assets, bit- level fixity remains a mystery – a necessary outreach effort remains
  • 21. So what’s the problem? • Major labels, DIY artists, indie labels, amateur and semi-professional archivists, photographers, oral histories, and born- digital films usually ignore the concept of fixity • All of the these constituencies need guidance to engage fixity into their daily workflow or suffer the consequences when the asset is needed NOW to monetize…
  • 22. Overview: Auditing & Repair Micah Altman <Micah_Altman@alumni.brown.edu> Director of Research, MIT Libraries
  • 23. Audit [aw-dit]: An independent evaluation of records and activities to assess a system of controls Fixity mitigates risk only if used for auditing.
  • 24. Functions of Storage Auditing • Detect corruption/deletion of content • Verify compliance with storage/replication policies • Prompt repair actions
  • 25. Bit-Level Audit Design Choices • Audit regularity and coverage: on-demand (manually); on object access; on event; randomized sample; scheduled/comprehensive • Fixity check & comparison algorithms • Auditing scope: integrity of object; integrity of collection; integrity of network; policy compliance; public/transparent auditing • Trust model • Threat model
  • 26. Repair Auditing mitigates risk only if used for repair. Design Elements •Repair latency: – Detection to start of repair – Repair duration •Repair algorithm
  • 27. LOCKSS Auditing & Repair Decentralized, peer-2-peer, tamper-resistant replication & repair
  • 28. DuraCloud 1.0* Auditing & Repair Storage replicated across cloud providers * Version 2.0 release notes and documentation indicate the addition of regular bit integrity check, and the ability to sync changes from a primary copy for premium users. Detailed features and pecific algorithms do not yet appear fully documented
  • 29. iRODS Auditing & Repair Rules-based federated storage grid
  • 30. Auditing & Repair TRAC-Aligned policy auditing as a overlay network
  • 31. Summary: Micah Altman <Micah_Altman@alumni.brown.edu> Director of Research, MIT Libraries
  • 32. Methods for Mitigating Risk Physical: Formats File File Media, Transforms: Systems: Hardware, compression, transforms, Environment encoding, deduplication, encryption redundancy Diversification Number Fixity Audit Repair of copies of copies
  • 33. How can we choose? • Clearly state decision problem • Model connections between choices &outcomes • Empirically calibrate and validate
  • 34. The Problem Keeping risk of object loss fixed -- what choices minimize $? Are we there “Dual problem” yet? Keeping $ fixed, what choices minimize risk? Extension For specific cost functions for loss of object: Loss(object_i), of all lost objects What choices minimize: Total cost= preservation cost+ sum(E(Loss)) cost
  • 36. Measurements • Media – MBTF theoretical and actual • File transformations: – compression ratio – partial recoverability • Filesystem transformations: – Deduplication – Compression ratio • Diversification – Single points of failure – Correlated failures • Copies, Audit, Repair – Simulation models – Audit studies
  • 37. Questions* What techniques are you using? What models guide the “knobs”? Contact the NDSA Infrastructure Working Group: www.digitalpreservation.gov/ndsa/working_groups/ * Thanks to our moderator: Trevor Owens <trow@loc.gov>,Digital Archivist, Library of Congress

Editor's Notes

  1. Text is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Note, images on pages 3, 34 , 37 are under separate licenses: Curatorial error image © AtomSmasher, Distributed under CC-BY-SA Physical infrastructure failure image, produced by NSF, included under CC-BY-SA Media Failure, knob, and graph images obtained from Wikimedia Commons, distributed under CC-BY-SA Insider attack image obtained from http://unix.oppserver.net/wlan/faq/Vollst%C3%A4ndige%20Anleitung%20f%C3%BCr%20den%20Einbruch%20in%20WLAN%20Netze%20at%20Kopfnote.htm under CC-BY-NC
  2. Unknown: organizational disruptions are likely, but how risky are they?
  3. Areas for research: What is the impact and likelihood of each risk? What is the cost of each control?
  4. This is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  5. Poll the Audience: To get a sense of where everyone’s practices could we get a show of hands for the following questions: How many of your organizations maintain two or more additional copies of your content? … One additional copy? How many of your organizations are tracking fixity information for content you intend to preserve? - Now, how many of you have audit procedures in place for that content? Lastly, how many of you have procedures to repair digital content when an audit discovers fixity discontinuities? - For those of you with organizations not engaging in these practices why is that the case? Panel then Audience questions: What have been the biggest challenges related to the topics you all presented for each of your organizations? [Open the question to the audience] - What kinds of rules and models should we be putting in place for fiddling with the Knobs in Micah’s slides? [Open the question to the audience] What do we need to be able to make it easier to balance these tradeoffs? [Open the question to the audience]