Auditing Distributed Preservation Networks

This presentation, delivered at CNI 2012, summarizes the lessons learned from trial audits of several production distributed digital preservation networks. The audits were conducted using the open-source SafeArchive system, which enables automated auditing of a selection of TRAC criteria related to replication and storage. Analysis of the trial audits demonstrates the complexities of auditing modern replicated storage networks and reveals common gaps between archival policy and practice. Recommendations for closing these gaps are discussed, as are extensions added to the SafeArchive system to mitigate risks in distributed digital preservation (DDP).

  • This work by Micah Altman (http://micahaltman.com), with the exception of images explicitly accompanied by a separate “source” reference, is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  • Auditing Distributed Preservation Networks

    1. Prepared for CNI Fall Meeting 2012, Washington, D.C., December 2012
       Auditing Distributed Digital Preservation Networks
       Micah Altman, Director of Research, MIT Libraries; Non-Resident Senior Fellow, The Brookings Institution
       Jonathan Crabtree, Assistant Director of Computing and Archival Research, HW Odum Institute for Research in Social Science, UNC
    2. Collaborators*
       • Nancy McGovern
       • Tom Lipkis & the LOCKSS Team
       • Data-PASS Partners
         – ICPSR
         – Roper Center
         – NARA
         – Henry A. Murray Archive
       • Dataverse Network Team @ IQSS
       Research support: thanks to the Library of Congress, the National Science Foundation, IMLS, the Sloan Foundation, the Harvard University Library, the Institute for Quantitative Social Science, and the Massachusetts Institute of Technology.
       * And co-conspirators
    3. Related Work
       Reprints available from: micahaltman.com
       • M. Altman & J. Crabtree, "Using the SafeArchive System: TRAC-Based Auditing of LOCKSS", Proceedings of Archiving 2011, Society for Imaging Science and Technology.
       • Altman, M., Beecher, B., & Crabtree, J. (2009). "A Prototype Platform for Policy-Based Archival Replication." Against the Grain, 21(2), 44-47.
    4. Preview
       • Why? … distributed digital preservation? … audit?
       • SafeArchive: automating auditing
       • Theory vs. practice
         – Round 0: Calibration
         – Round 1: Self-audit
         – Round 2: Self-compliance (almost)
         – Round 3: Auditing other networks
       • Lessons learned: practice & theory
    5. Why distributed digital preservation?
    6. Slightly Long Answer: Things Go Wrong
       • Physical & hardware
       • Media
       • Software
       • Insider & external attacks
       • Organizational failure
       • Curatorial error
    7. Potential Nexuses for Preservation Failure
       • Technical
         – Media failure: storage conditions, media characteristics
         – Format obsolescence
         – Preservation infrastructure software failure
         – Storage infrastructure software failure
         – Storage infrastructure hardware failure
       • External threats to institutions
         – Third-party attacks
         – Institutional funding
         – Change in legal regimes
       • Quis custodiet ipsos custodes?
         – Unintentional curatorial modification
         – Loss of institutional knowledge & skills
         – Intentional curatorial de-accessioning
         – Change in institutional mission
       Source: Reich & Rosenthal 2005
    8. The Problem
       "Preservation was once an obscure backroom operation of interest chiefly to conservators and archivists: it is now widely recognized as one of the most important elements of a functional and enduring cyberinfrastructure." – [Unsworth et al., 2006]
       • Libraries, archives, and museums hold digital assets they wish to preserve, many of them unique
       • Many of these assets are not replicated at all
       • Even when institutions keep multiple backups offsite, many single points of failure remain
    9. Why audit?
    10. Short Answer: Why the heck not?
        "Don't believe in anything you hear, and only half of what you see" – Lou Reed
        "Trust, but verify." – Ronald Reagan
    11. Full Answer: It's our responsibility
    12. OAIS Model Responsibilities
        • Accept appropriate information from information producers.
        • Obtain sufficient control of the information to ensure long-term preservation.
        • Determine which groups should become the Designated Community able to understand the information.
        • Ensure that the preserved information is independently understandable to the Designated Community.
        • Ensure that the information can be preserved against all reasonable contingencies.
        • Ensure that the information can be disseminated as authenticated copies of the original, or as traceable back to the original.
        • Make the preserved data available to the Designated Community.
    13. OAIS Basic Implied Trust Model
        • The organization is axiomatically trusted to identify designated communities
        • The organization is engineered with the goal of:
          – Collecting appropriate, authentic documents
          – Reliably delivering authentic documents, in understandable form, at a future time
        • Success depends upon:
          – Reliability of storage systems & services: e.g., LOCKSS networks, Amazon Glacier
          – Reliability of organizations: e.g., MetaArchive, Data-PASS, Digital Preservation Network
          – Document contents and properties: formats, metadata, semantics, provenance, authenticity
    14. Enhancing Reliability through Trust Engineering
        • Incentives: rewards, penalties; incentive-compatible mechanisms
        • Modeling and analysis: statistical quality control & reliability estimation; threat modeling and vulnerability assessment
        • Portfolio theory: diversification (financial, legal, technical, institutional, …); hedging
        • Over-engineering approaches: safety margin, redundancy
        • Informational approaches: transparency (release of information permitting direct evaluation of compliance); common knowledge; crypto: signatures, fingerprints, non-repudiation
        • Social engineering: recognized practices; shared norms; social evidence; reduce provocations; remove excuses
        • Regulatory approaches: disclosure; review; certification; audits; regulations & penalties
        • Security engineering: increase effort for the attacker (harden the target, reduce vulnerability, increase technical/procedural controls, remove/conceal targets); increase risk to the attacker (surveillance, detection, likelihood of response); reduce reward (deny benefits, disrupt markets, identify property)
    15. Audit [aw-dit]: an independent evaluation of records and activities to assess a system of controls.
        Fixity mitigates risk only if used for auditing.
    16. Functions of Storage Auditing
        • Detect corruption/deletion of content
        • Verify compliance with storage/replication policies
        • Prompt repair actions
    17. Bit-Level Audit Design Choices
        • Audit regularity and coverage: on demand (manually); on object access; on event; randomized sample; scheduled/comprehensive
        • Fixity check & comparison algorithms
        • Auditing scope: integrity of object; integrity of collection; integrity of network; policy compliance; public/transparent auditing
        • Trust model
        • Threat model
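To make these design choices concrete, here is a minimal sketch of one point in the design space: a scheduled, comprehensive bit-level fixity audit that streams each file through SHA-256 and compares the digest to a stored manifest. The replica root and manifest path are hypothetical, and this is not SafeArchive or LOCKSS code, just an illustration under assumed conventions.

```python
# A minimal sketch of a scheduled, comprehensive bit-level fixity audit.
# REPLICA_ROOT and MANIFEST are hypothetical paths; this is not SafeArchive
# or LOCKSS code, only an illustration of the design choices listed above.
import hashlib
import json
from pathlib import Path

REPLICA_ROOT = Path("/var/preservation/replica")    # assumed local replica copy
MANIFEST = Path("/var/preservation/manifest.json")  # assumed {relative_path: sha256}

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit() -> dict:
    """Compare every manifest entry against the bits currently on disk."""
    expected = json.loads(MANIFEST.read_text())
    report = {"missing": [], "corrupt": [], "ok": 0}
    for rel_path, known_digest in expected.items():
        target = REPLICA_ROOT / rel_path
        if not target.exists():
            report["missing"].append(rel_path)   # deletion or loss
        elif sha256_of(target) != known_digest:
            report["corrupt"].append(rel_path)   # silent corruption
        else:
            report["ok"] += 1
    return report

if __name__ == "__main__":
    # Run from cron or a systemd timer to get "scheduled/comprehensive" regularity.
    print(json.dumps(audit(), indent=2))
```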
    18. Repair
        Auditing mitigates risk only if used for repair.
        Key design elements:
        • Repair granularity
        • Repair trust model
        • Repair latency:
          – Detection to start of repair
          – Repair duration
        • Repair algorithm
    19. Summary of Current Automated Preservation Auditing Strategies
        • LOCKSS: Automated; decentralized (peer-to-peer); tamper-resistant auditing & repair; for collection integrity.
        • iRODS: Automated, centralized/federated auditing for collection integrity; micro-policies.
        • DuraCloud: Automated, centralized auditing for file integrity. (Manual repair by DuraSpace staff available as a commercial service if using multiple cloud providers.)
        • Digital Preservation Network: In development…
        • SafeArchive: Automated; independent; multi-centered; auditing, repair, and provisioning of existing LOCKSS storage networks; for collection integrity and high-level policy (e.g., TRAC) compliance.
    20. LOCKSS Auditing & Repair: decentralized, peer-to-peer, tamper-resistant replication & repair
        • Regularity: scheduled
        • Algorithms: bespoke, peer-reviewed, tamper-resistant
        • Scope: collection integrity; collection repair
        • Trust model: the publisher is the canonical source of content; changed content is treated as new; replication peers are untrusted
        • Main threat models: media failure; physical failure; curatorial error; external attack; insider threats; organizational failure
        • Key auditing limitations: correlated software failure; lack of policy auditing and public/transparent auditing
    21. Auditing & Repair: TRAC-aligned policy auditing as an overlay network
        • Regularity: scheduled; manual
        • Fixity algorithms: relies on the underlying replication system
        • Scope: collection integrity; network integrity; network repair; high-level (e.g., TRAC) policy auditing
        • Trust model: external auditor, with permissions to collect metadata/log information from the replication network; the replication network itself is untrusted
        • Main threat models: software failure; policy implementation failure (curatorial error; insider threat); organizational failure; media/physical failure through the underlying replication system
        • Key auditing limitations: relies on the underlying replication system, (currently) LOCKSS, for fixity checks and repair
    22. SafeArchive: TRAC-Based Auditing & Management of Distributed Digital Preservation
        Facilitating collaborative replication and preservation with technology…
        • Collaborators declare explicit, non-uniform resource commitments
        • Policy records commitments and storage-network properties
        • Storage layer provides replication, integrity, freshness, versioning
        • SafeArchive software provides monitoring, auditing, and provisioning
        • Content is harvested through HTTP (LOCKSS) or OAI-PMH
        • Integrates LOCKSS, the Dataverse Network, and TRAC
    23. SafeArchive: Schematizing Policy and Behavior
        Policy: "The repository system must be able to identify the number of copies of all stored digital objects, and the location of each object and their copies."
        Schematization → Behavior (operationalization)
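As an illustration of what schematizing such a policy statement might look like, the sketch below encodes a replication requirement as structured data and operationalizes it as a mechanical compliance check. The field names (min_copies, min_regions) and the observed-replica structure are assumptions for illustration only, not the actual SafeArchive policy schema.

```python
# Illustrative sketch of turning a TRAC-style replication requirement into a
# schema that can be checked mechanically. Field names are assumed, not the
# real SafeArchive schema. Requires Python 3.9+ for the built-in generics.
from dataclasses import dataclass

@dataclass
class ReplicationPolicy:
    collection: str
    min_copies: int        # e.g. a "3 verified replicas" commitment
    min_regions: int       # e.g. spread over at least 2 geographic regions

@dataclass
class ObservedReplica:
    host: str
    region: str
    verified: bool         # did the most recent fixity/poll check succeed?

def compliant(policy: ReplicationPolicy, replicas: list[ObservedReplica]) -> bool:
    """Operationalize the policy: count verified copies and distinct regions."""
    verified = [r for r in replicas if r.verified]
    return (len(verified) >= policy.min_copies
            and len({r.region for r in verified}) >= policy.min_regions)

# Hypothetical example: three verified copies across two regions satisfy a
# "3 copies, 2 regions" policy; losing the west-coast copy would violate it.
policy = ReplicationPolicy("example-collection", min_copies=3, min_regions=2)
state = [ObservedReplica("host-1", "us-east", True),
         ObservedReplica("host-2", "us-east", True),
         ObservedReplica("host-3", "us-west", True)]
print(compliant(policy, state))  # True
```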
    24. Adding High-Level Policy to LOCKSS
        • LOCKSS: Lots of Copies Keep Stuff Safe
          – Widely used in the library community
          – Self-contained OSS replication system; low maintenance; inexpensive
          – Harvests resources via web crawling, OAI-PMH, database queries, …
          – Maintains copies through a secure p2p protocol
          – Zero trust & self-repairing
        • What does SafeArchive add?
          – Auditing – easily monitor the number of copies of content in the network
          – Provisioning – ensure sufficient copies and distribution
          – Collaboration – coordinate across partners, monitor resource commitments
          – Restoration guarantees
          – Integration with the Dataverse Network digital repository
    25. Design Requirements
        SafeArchive is a targeted vertical slice of functionality through the policy stack…
        • Policy driven
          – Institutional policy creates formal replication commitments
          – Documents and supports TRAC/ISO policies
          – Allows asymmetric commitments
        • Schema-based auditing used to…
          – … verify collection replication
          – … record storage commitments
          – … document all TRAC criteria
          – … demonstrate policy compliance
          – … track storage commitments, the size of holdings being replicated, and the distribution of holdings over time
        • Provides restoration guarantees
          – to the owning archive
          – to replication hosts
        • Limited trust
          – No superuser
          – Partners trusted to hold the unencrypted content of others (reinforced with legal agreements)
          – At least one system trusted to read the status of participating systems
          – At least one system able to initiate new harvesting on a participating system
          – No deletion/modification of objects stored on another system
    26. SafeArchive Components
    27. SafeArchive in Action: safearchive.org
    28. Theory vs. Practice – Round 0: Setting up the Data-PASS PLN
        "Looks OK to me" – PHB motto
    29. THEORY
        Start → Expose content (through OAI+DDI+HTTP) → Install LOCKSS (on 7 servers) → Harvest content (through OAI plugin) → Set up PLN configurations (through OAI plugin) → LOCKSS magic → Done
    30. Application: Data-PASS Partnership
        • Data-PASS partners collaborate to
          – identify and promote good archival practices,
          – seek out at-risk research data,
          – build preservation infrastructure,
          – and mutually safeguard collections.
        • Data-PASS collections
          – 5 collections
          – Updated ~daily
          – Research data as content
          – 25,000+ studies
          – 600,000+ files
          – <10 TB
          – Goal: >=3 verified replicas per collection, >=2 regions
    31. Practice (Round 0)
        • OAI plugin extensions required for:
          – Non-DC metadata
          – Large metadata
          – Alternate authentication method
          – Support for OAI sets
          – Non-fatal error handling
        • OAI provider (Dataverse) tuning:
          – Performance handling for delivery
          – Performance handling for errors
        • PLN configuration required:
          – Stabilization around LOCKSS versions
          – Coordination around the plugin repository
          – Coordination around collection definition
        • Dataverse Network extensions:
          – Generate LOCKSS manifest pages
          – License harmonization
          – LOCKSS export control by archive curator
    32. Results (Round 0)
        • Remaining issues
          – None known
        • Outcomes
          – LOCKSS OAI plugin extensions (later integrated into the LOCKSS core)
          – Dataverse Network performance tuning
          – Dataverse Network extensions
    33. Lesson 0
        • When innovating, plan for…
          – a substantial gap between prototype and production
          – multiple iterations
    34. Theory vs. Practice – Round 1: Self-Audit
        "A mere matter of implementation" – PHB motto
    35. THEORY (Round 1)
        Start → Gather information from each replica (via the LOCKSS cache manager; log errors for later investigation) → Integrate information → map network state → Current network state == policy? → NO: add a replica and repeat → YES: success
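The loop in this flowchart can be sketched roughly as follows. The fetch_status callable is a stand-in for whatever status interface the replication layer actually exposes (in SafeArchive's case, information gathered from LOCKSS boxes); it is not a real API, and the report format is an assumption made for illustration.

```python
# Sketch of the Round 1 audit loop: gather status from each replica, integrate
# the reports into a map of network state, and compare that map to policy.
# fetch_status is a hypothetical stand-in for the replication layer's interface.
from collections import defaultdict
from typing import Callable

def build_network_map(hosts: list[str],
                      fetch_status: Callable[[str], dict]) -> dict:
    """Integrate per-replica reports into {collection: [hosts holding a good copy]}."""
    network = defaultdict(list)
    errors = []
    for host in hosts:
        try:
            for collection, ok in fetch_status(host).items():
                if ok:
                    network[collection].append(host)
        except Exception as exc:            # log and continue; investigate later
            errors.append((host, repr(exc)))
    return {"replicas": dict(network), "errors": errors}

def audit_round(hosts, fetch_status, required_copies: int) -> dict:
    """One pass of the loop: report collections that still need another replica."""
    state = build_network_map(hosts, fetch_status)
    state["needs_replica"] = [c for c, held in state["replicas"].items()
                              if len(held) < required_copies]
    return state

# Toy demo with a stubbed status source standing in for real replica queries.
demo_status = {"box-1": {"collA": True, "collB": True},
               "box-2": {"collA": True},
               "box-3": {"collA": True, "collB": False}}
print(audit_round(list(demo_status), lambda h: demo_status[h], required_copies=2))
```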
    36. Implementation: www.safearchive.org
    37. Practice (Round 1)
        • Gathering information required:
          – Replacing the LOCKSS cache manager
          – Permissions
          – Reverse-engineering UIs (with help)
          – Network magic
        • Integrating information required:
          – Heuristics for lagged information
          – Heuristics for incomplete information
          – Heuristics for aggregated information
        • Comparing the map to policy required:
          – A mere matter of implementation 
    38. Results (Round 1)
        • Outcomes
          – Implementation of the SafeArchive reporting engine
          – Stand-alone OSS replacement for the LOCKSS cache manager
          – Initial audit of Data-PASS replicated collections
        • Problems
          – Collections achieving policy compliance were actually incomplete
            • Dude, where's our metadata?
          – Uh-oh, most collections failed policy compliance 
            • Adding replicas didn't solve it
    39. Lesson 1: Replication agreement does not prove collection integrity
        What you see: replicas X, Y, Z agree on collection A.
        What you are tempted to conclude: replicas X, Y, Z agree on collection A → collection A is good.
    40. What can you infer from replication agreement?
        "Replicas X, Y, Z agree on collection A → collection A is good" holds only under assumptions: either (harvesting did not report errors AND the harvesting system is error free) OR (errors are independent per object AND the collection contains a large number of objects).
        Supporting external evidence:
        • Multiple independent harvester implementations
        • Systematic harvester testing
        • Systematic collection testing
        • Automated harvester log monitoring
        • Automated comparison with external collection statistics
        • Restore & comparison testing per collection
    41. Lesson 2: Replication disagreement does not prove corruption
        What you see: replicas X, Y disagree with Z on collection A.
        What you are tempted to conclude: collection A on host Z is bad → repair/replace collection A on host Z.
    42. What can you infer from replication failure?
        "Replicas X, Y disagree with Z on collection A → collection A on host Z is bad" relies on assumptions:
        • Disagreement implies that the content of collection A is different across hosts
        • The contents of collection A should be identical on all hosts
        • If some content of collection A is bad, the entire collection is bad
        Possible alternate scenarios: collections grow rapidly; objects in collections are frequently updated; audit information cannot be collected from some host; ???; ???
    43. Theory vs. Practice – Round 2: Compliance (almost)
        "How do you spell 'backup'? R-E-C-O-V-E-R-Y"
    44. Lesson 3: Distributed digital preservation works … with evidence-based tuning and adjustment
        • Diagnostics
          – When the network is out of adjustment, additional information is needed to inform adjustment
          – Worked with the LOCKSS team to gather information
        • Adjustments
          – Timings (e.g., crawls, polls)
            • Understand
            • Tune
            • Parameterize heuristics, reporting
            • Track trends over time
          – Collections
            • Change partitioning into AUs at the source
            • Extend mapping to AUs in the plugin
            • Extend the reporting/policy framework to group AUs
        • Outcomes
          – At the time: verified replications of all collections
          – Currently: minor policy violations in one collection
          – Worked with the LOCKSS team to design further instrumentation of LOCKSS
    45. Theory vs. Practice – Round 3: Auditing Other PLNs
        "In theory, theory and practice are the same – in practice, they differ."
    46. Application: COPPUL
        • Council of Prairie and Pacific University Libraries
        • Collections
          – 9 institutions
          – Dozens of collections
            • Journal runs
            • Digitized member content: text, photos, images, ETDs
        • Goal
          – 'Multiple' verified replicas
    47. Application: Digital Federal Depository Library Program
        • The Digital Federal Depository Library Program, or "USDocs", private LOCKSS network replicates key aspects of the United States Federal Depository System.
        • Collections
          – Dozens of institutions (24 replicating)
          – Electronic publications
          – 580+ collections
          – <10 TB
        • Goal
          – "Many" replicas, "many" regions
    48. Application: MetaArchive
        • "A secure and cost-effective repository that provides for the long-term care of digital materials – not by outsourcing to other organizations, but by actively participating in the preservation of our own content."
        • 50+ institutions, 22+ members
        • >10 TB, including audio and video content
        • Testing only; full auditing not yet performed…
    49. THEORY (Round 3)
        Start → Gather information from each replica → Integrate information → map network state → Current network state == policy? → YES: success → NO: collection sizes and polling intervals adjusted? → NO: adjust collection sizes and polling intervals, then repeat → YES: add a replica and repeat
    50. Here's where things get even more complicated…
    51. Practice (Year 3)
        Lesson 4: Trust, but continuously verify
        • 20-80% initial failure to confirm policy compliance
        • Tuning infeasible, or yielded only moderate improvement
        Outcomes
        • In-depth diagnostics and analysis with the LOCKSS team
        • Adjustment of auditing algorithms: detect "islands of agreement"
        • Adjusted expectations
          – Focus on inferences rather than replication agreement
          – Focus on 100% policy compliance per collection rather than 100% error-free
        • Designed file-level diagnostic instrumentation in LOCKSS
        Re-analysis in progress…
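The "islands of agreement" adjustment can be illustrated with a small sketch: group replicas by the collection-level digest they report, then check whether the largest agreeing subset alone satisfies the replication policy, even when full network-wide agreement (or a quorum-based poll) fails. The per-replica digests are assumed inputs; this is not the LOCKSS polling algorithm.

```python
# Sketch of detecting "islands of agreement": subsets of replicas that agree
# with each other may satisfy the replication policy even when agreement
# across the whole network cannot be established. Inputs are assumed digests,
# not output from any real LOCKSS or SafeArchive interface.
from collections import defaultdict

def islands(collection_digests: dict[str, str]) -> list[set[str]]:
    """Group replicas (host -> digest) into sets that agree with each other."""
    groups = defaultdict(set)
    for host, digest in collection_digests.items():
        groups[digest].add(host)
    return sorted(groups.values(), key=len, reverse=True)

def policy_satisfied(collection_digests: dict[str, str], min_copies: int) -> bool:
    """The policy can be met by the largest island even without global agreement."""
    agreeing = islands(collection_digests)
    return bool(agreeing) and len(agreeing[0]) >= min_copies

# Example: five replicas, one divergent host; a 3-copy policy is still met.
print(policy_satisfied(
    {"a": "d1", "b": "d1", "c": "d1", "d": "d2", "e": "d1"}, min_copies=3))  # True
```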
    52. What can you infer from replication failure?
        "Replicas X, Y disagree with Z on collection A → collection A on host Z is bad" relies on assumptions:
        • Disagreement implies that the content of collection A is different across hosts
        • The contents of collection A should be identical on all hosts
        • If some content of collection A is bad, the entire collection is bad
        Possible alternate scenarios: collections grow rapidly; objects in collections are frequently updated; audit information cannot be collected from some host; ???; ???
    53. What else could be wrong?
        Hypothesis 1: Disagreement is real, but doesn't matter in the long run.
        • 1.1 Temporary differences: collections temporarily out of sync (either missing objects or different object versions) – will resolve over time. (E.g., if harvest frequency << source update frequency, but harvest times across boxes vary significantly.)
        • 1.2 Permanent point-in-time collection differences that are an artefact of synchronization. (E.g., if one replica always has version n-1 at the time of the poll.)
        Hypothesis 2: Disagreement is real, but non-substantive.
        • 2.1 Non-substantive collection differences (arising from dynamic elements in the collection that have no bearing on the substantive content).
          – 2.1.1 Individual URLs/files that are dynamic and non-substantive (e.g., logo images, plugins, Twitter feeds, etc.) cause content changes (this is common in the GLN).
          – 2.1.2 Dynamic content embedded in substantive content (e.g., a customized per-client header page embedded in the PDF for a journal article).
        • 2.2 The audit summary over-simplifies and loses information.
          – 2.2.1 Technical failure of a poll can occur when there are still sub-quorum "islands" of agreement, sufficient for policy.
        Hypothesis 3: Disagreement is real, and matters: substantive collection differences.
        • 3.1 Some objects are corrupt (e.g., from corruption in storage, or during transmission/harvesting).
        • 3.2 Substantive objects persistently missing from some replicas (e.g., because of a permissions issue at the provider; technical failures during harvest; plugin problems).
        • 3.3 Versions of objects permanently missing. (Note that later "agreement" may signify that a later version was verified.)
    54. What diagnostic evidence would be useful for audit-related inference?
        • Longitudinal fixity and modification-time collection
          – E.g., detect whether disagreement is persistent for a specific collection
          – Transient collection problems suggest synchronization issues
        • Collection-replica fixity
          – Detect sub-quorum-level 'islands' of agreement (insufficient to validate the default poll, but potentially sufficient to verify policy compliance)
        • File fixity, version/modification time
          – E.g., establish partial collection agreement
            • All files older than time X agree
            • All disagreements are versioning/modification-time differences
        • Longitudinal information at the file/collection level
          – Subset of files persistently missing
          – Subset of files longitudinally different across re-harvesting (suggests dynamic content issues)
        • Universal Numeric Fingerprints / semantic signatures
          – Remove false positives from format migration and dynamic non-substantive content
        • Object information
          – Suggests dynamic objects
        • Manual file inspection 
          – Check dynamic objects
          – Not scalable
          – Difficult to do without violating the trust model
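As a sketch of the first item above (longitudinal fixity collection), the following classifies disagreement as transient or persistent from a time series of per-replica collection digests. The snapshot format and the window size are assumptions made for illustration, not part of the SafeArchive tooling.

```python
# Sketch of a longitudinal diagnostic: keep a chronological series of
# per-replica collection digests and classify disagreements as transient
# (likely harvest/synchronization lag) or persistent (worth repair or
# investigation). The observation format is an assumed convention.

def classify(history: list[dict[str, str]], window: int = 3) -> str:
    """history: chronological snapshots of {host: collection digest}."""
    recent = history[-window:]
    disagreeing = [len(set(snapshot.values())) > 1 for snapshot in recent]
    if not any(disagreeing):
        return "agreement"
    if all(disagreeing):
        return "persistent-disagreement"   # suggests corruption or missing objects
    return "transient-disagreement"        # suggests synchronization lag

snapshots = [
    {"a": "d1", "b": "d1", "c": "d0"},   # c one harvest behind
    {"a": "d1", "b": "d1", "c": "d1"},   # caught up
    {"a": "d2", "b": "d2", "c": "d2"},
]
print(classify(snapshots))  # "transient-disagreement": c lagged once, then caught up
```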
    55. What can you infer from replication failure?
        "Replicas X, Y disagree with Z on collection A → collection A on host Z is bad" relies on assumptions:
        • Disagreement implies that the content of collection A is different across hosts
        • The contents of collection A should be identical on all hosts
        • If some content of collection A is bad, the entire collection is bad
        Alternative scenarios: collections grow rapidly; objects in collections are frequently updated; audit information cannot be collected from some host; partial agreement without quorum; non-substantive dynamic content
    56. Lesson 5: Don't aim for 100% performance, aim for 100% compliance
        • 100% of replicas agree: NO
        • 100% of collections are compliant 100% of the time: NO
        • 100% of files agree between verified collections: Maybe
        • 100% of policy overall: By design
        • 100% of bits in a file?: Implicitly assumed by tools, but not necessary
        • 100% policy for a specific collection at a specific time: Yes
    57. Lessons Learned
        "What, me worry?" – Alfred E. Neuman
    58. Formative Lessons
        • Lesson 0: When innovating, plan for…
          – a substantial gap between prototype and production
          – multiple iterations
        • Lesson 1: Replication agreement does not prove collection integrity… confirm with external evidence of correct harvesting
        • Lesson 2: Replication disagreement does not prove collection corruption… use diagnostics
        • Lesson 3: Distributed digital preservation works… with evidence-based tuning and adjustment
    59. Analytic Lessons
        • Lesson 4: All networks had substantial and unrecognized gaps → trust, but continuously verify
        • Lesson 5: Don't aim for 100% performance, aim for 100% compliance
        • Lesson 6: Many different things can go wrong in distributed systems, without easily recognizable external symptoms → distributed preservation requires distributed auditing
        • Lesson 7: External information on system operation and collection characteristics is important for analyzing results → transparency helps preservation
    60. What's Next?
        "It's tough to make predictions, especially about the future"
        – Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. DeMille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, and others
    61. Short Term
        • Complete round 4 data collection (including CLOCKSS)
        • Refinements of current auditing algorithms
          – More tunable parameters (yeah!?!)
          – Better documentation
          – Simple health metrics
        • Reports and dissemination
    62. Longer Term
        • Statistical health metrics
        • Diagnostics
        • Policy decision support
        • Additional audit standards
        • Support for additional replication technology
        • Audit other policy sets
    63. Bibliography (Selected)
        • B. Schneier, 2012. Liars and Outliers. John Wiley & Sons.
        • H. M. Gladney & J. L. Bennett, 2003. "What Do We Mean by Authentic?" D-Lib Magazine 9(7/8).
        • K. Thompson, 1984. "Reflections on Trusting Trust." Communications of the ACM 27(8), August 1984, pp. 761-763.
        • David S. H. Rosenthal, Thomas S. Robertson, Tom Lipkis, Vicky Reich, & Seth Morabito, 2005. "Requirements for Digital Preservation Systems: A Bottom-Up Approach." D-Lib Magazine 11(11), November 2005.
        • OAIS, Reference Model for an Open Archival Information System (OAIS). CCSDS 650.0-B-1, Blue Book, January 2002.
    64. Questions?
        SafeArchive: safearchive.org
        E-mail: Micah_altman@alumni.brown.edu
        Web: micahaltman.com
        Twitter: @drmaltman
        E-mail: Jonathan_Crabtree@unc.edu
