Blue Ribbon Task Force Presentation - Digital Preservation


Dublin, Ohio

Published in: Technology, Business

    1. Blue Ribbon Task Force on Sustainable Digital Preservation and Access
       Peter Mojica, AXS-One Inc.
       SNIA-DMF (Storage Networking Industry Association, Data Management Forum)
       October 28, 2008
       The Conference Center at OCLC, Dublin, Ohio
    2. Topics
       • Who are the stakeholders for these materials?
         • Describe the users/communities that benefit from the preservation of the materials.
           • Who?
       • What is the “value proposition” for this preservation effort?
         • Why are stakeholders interested in the long-term preservation of the materials? What are the anticipated future uses of the materials? Is the “value proposition” perpetual, or does it expire within a finite time frame?
           • Why?
       • What is the nature of the materials being preserved?
         • E.g., source(s), content, volume, format, copyright restrictions, frequency of use …
           • What?
    3. Digital Data Archival
       • The task of archiving data has traditionally been considered “one-way”
       • Digital data goes “in”
       • No exhaustive requirements for getting it “out”
    4. Rapid Business Change
       • Over the last 5 years, business requirements have changed
       • There is now a need to get information “out” on a more regular basis
       • Confusion between long-term data preservation repositories and transactional system usage
    5. Complicating Matters
       • It’s more complicated when others “unknowingly” set the end-user expectations for:
         • Search
         • Retrieval
         • Speed
       • Data preservation requirements are excruciatingly more complicated!
    6. A Holistic Approach is Needed
       • Two-way requirements necessitate that “loose couplings” between source data and data preservation repositories become “tighter”
       • We need to know and preserve more at the time of acquisition
         • Metadata
         • Accrued metadata over time
         • Provenance data
         • Full-text indexing
         • Migration data
         • Classification
         • Retention
         • Destruction
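The items listed on slide 6 can be read as the fields of a single record captured when an item enters the repository. The following is only a minimal sketch under that assumption: the class name `PreservationRecord`, its field names, and the sample values are hypothetical illustrations, not something defined by the presentation or by SNIA.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PreservationRecord:
    """Hypothetical record captured at acquisition time (illustrative only)."""
    item_id: str
    acquired_on: date
    metadata: dict                     # descriptive metadata known at acquisition
    provenance: str                    # where the item came from
    classification: str                # e.g. a records category
    retention_years: Optional[int]     # None means "retain indefinitely"
    full_text: str = ""                # text that a full-text index would ingest
    accrued_metadata: list = field(default_factory=list)  # (date, key, value) tuples
    migrations: list = field(default_factory=list)        # (date, old_format, new_format) tuples

    def add_metadata(self, when: date, key: str, value: str) -> None:
        """Append metadata learned after acquisition, keeping its date."""
        self.accrued_metadata.append((when, key, value))

    def record_migration(self, when: date, old_format: str, new_format: str) -> None:
        """Log a logical or physical migration so the history travels with the item."""
        self.migrations.append((when, old_format, new_format))

    def destruction_due(self) -> Optional[date]:
        """Return the date the retention period ends, or None if retained indefinitely."""
        if self.retention_years is None:
            return None
        target_year = self.acquired_on.year + self.retention_years
        try:
            return self.acquired_on.replace(year=target_year)
        except ValueError:
            # Acquired on Feb 29 and the target year is not a leap year.
            return self.acquired_on.replace(year=target_year, day=28)


# Sample values below are invented for illustration.
record = PreservationRecord(
    item_id="doc-001",
    acquired_on=date(2008, 10, 28),
    metadata={"title": "Quarterly report"},
    provenance="finance file share",
    classification="financial",
    retention_years=10,
)
record.record_migration(date(2013, 1, 15), "doc", "pdf")
print(record.destruction_due())  # 2018-10-28
```

Keeping accrued metadata and migration history as dated entries on the record itself is one way to make the coupling between source data and repository “tighter”; a real system would likely store these in a database rather than in memory.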
    7. 100 Year Archive Survey Project
       • The Research Goal
         • Determine requirements for the definition of best practices and solutions for the long-term digital information retention problems of the data center.
       • The Digital Crisis
         • Risk of losing digital information over time
         • Growing cost and complexity of physical and logical migration
         • Overwhelming volume of digital information to preserve long-term
         • Increased legal, business, and security risk
    8. Who? Organization
       • ORGANIZATION TYPE: The mix of respondents was in line with where long-term retention pain exists: governmental agencies; non-governmental organizations such as universities, libraries, and museums; and IT companies.
    9. Who? Vertical
       • INDUSTRY VERTICALS: With RIM, IT, and Archivists as the dominant respondents, it is no surprise that the leading verticals represented by their organizations are education, government, IT services, and places where archivists work, including libraries, museums, and churches. 65% of the respondents represented a very broad spectrum of organizations, which further validates the importance and relevance of solving the long-term retention issues.
    10. What?
        • 100 Yr ATF Analysis: Archivists and RIMs are most concerned with ‘source files’, the originals. IT would be more focused on databases, financials, or customer records. With the large percentage of RIM and Archivist respondents, it is not a surprise to see ‘source files’ as the longest-retained information type. What is a surprise in the data is the 6% who put ‘database archive’ records on top.
    11. What?
        • 100 Yr ATF Analysis: This data confirms the importance of automatic classification.
    12. Why?
        • The top five (5) drivers identified are:
          • business
          • legal
          • security
          • compliance
          • other risk
        • (The ‘other risk’ category is principally “the risk of losing an organization’s history”.)
    13. Key Data Points
        • 70% of respondents say they are ‘highly dissatisfied’ with their ability to read their retained information in 50 years
        • Current practices are too manual, too prone to error and too costly
        • Collaboration is recognized as necessary in order to define information retention requirements
        “Remember that IT doesn't own the information. RIM, Legal, Business units and IT all have a part to play in the decisions applied to business records and should be sitting down at the table together.” (Source: Respondent)
        Source: 100 Yr Archive Requirements Survey 2007, N=276
    14. Key Data Points
        • Over 80% report a need to retain information over 50 years, and 68% report a need of over 100 years
        • Long-term generally means longer than 10 to 15 years
        • Over 40% of respondents are keeping email records over 10 years
        • Database information was considered most at risk of loss
        Source: 100 Yr Archive Requirements Survey 2007, N=276
    15. Key Findings
        • Logical and physical migration do not scale cost-effectively
          • Practitioners are struggling to keep up with migration requirements. Only 30% claimed to be doing physical migration correctly on disk, and none on tape or optical. Only 20% claimed they were confident in their ability to logically migrate some of the data.
            • Information is at risk long-term!
    16. IT Preservation Practices
        • What are the requirements? (Most do not know!)
        • Many still rely on Backup (Wrong!)
        • Record to tape and ‘lose it’ (Sad but true!)
        • Migration by Crisis:
          • Only 30% migrate every 3-5 years if on disk; none migrate regularly if on tape. If an application changes, it forces a ‘crisis’ migration.
        • Survey Conclusion: Information is at Risk!
    17. Resources
        • SNIA Data Management Forum
          • 100 Yr Archive Task Force Survey & Glossary
        • DMF Community