Evolving Domains, Problems and Solutions for Long Term Digital Preservation


Overview of FP7 projects, including ARCOMEM, ENSURE, SCAPE and TIMBUS. Presentation by Dr. Ross King, AIT Austrian Institute of Technology GmbH, at iPRES 2011, Singapore. In: Proceedings of the 8th International Conference on Preservation of Digital Objects (iPRES 2011), 2011, pp. 194-204. ISBN 978-981-07-0441-4

  1. 1. Evolving Domains, Problems and Solutions for Long Term Digital Preservation. Dr. Ross King, AIT Austrian Institute of Technology GmbH
  2. 2. Co-Authors • Orit Edelstein – IBM Research, Haifa • Michael Factor – IBM Research, Haifa • Thomas Risse – L3S Research Center, Hannover • Eliot Salant – IBM Research, Haifa • Philip Taylor – SAP Research, Belfast
  3. 3. Outline • Why these projects? • Introducing the projects • Comparing and contrasting the projects – Motivation – Objectives – Approach • Trends in Digital Preservation
  4. 4. Why these projects?
  5. 5. Timeline of Digital Preservation Projects (from http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf). Project types shown: Coordinated Action, Network of Excellence, STREP, Collaborative Project. Highlighted: FP7 6th Call, Objective ICT-2009.4.1: Digital Libraries and Digital Preservation. 07.11.2011
  6. 6. EU Funding for Digital Preservation Projects (from http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf) • FP7: 68.4 M€ • FP6: 24.9 M€ • FP5: 0.9 M€
  7. 7. Introducing the projects
  8. 8. ARCOMEM • Transforming Web archives into community memories that are much more tightly integrated with their community of current and future users. • Developing methods and tools based on novel socially-aware and socially-driven Web preservation models. • Three dimensions – Social Web analysis: leverage Social Web information, relying on the Wisdom of the Crowds for intelligent content appraisal, selection, contextualization and preservation. – Archive enrichment: extract information about entities, events, topics, and opinions. – Intelligent and collaborative content acquisition support for archives • Two testbeds – Media-related web archives (Südwestrundfunk, Deutsche Welle) – Political archives (Hellenic and Austrian Parliaments)
  9. 9. ENSURE: Enabling kNowledge Sustainability, Usability and Recovery for Economic value • EVALUATE Cost and Value – Ability to compose different quality solutions at different costs – Build a software stack that balances the cost of preservation against the value of the data • AUTOMATE Preservation Lifecycle – Control the preservation lifecycle based on the changing value of business data over time, changes in regulation, and advances in underlying technology • PROTECT – Content-aware data protection – Focus on long term access control, privacy and IPR, and de-identification • SCALE using ICT innovations – Investigate economical and scalable solutions such as cloud storage – Include issues of security and data locality • Three testbeds – Healthcare – Clinical Trials – Financial Services
  10. 10. SCAPE: SCAlable Preservation Environments • Making preservation planning and preservation workflows scalable – Define and test an infrastructure for scalable preservation actions – Provide a framework for automated quality assurance workflows – Develop a policy-based preservation planning tool with automated preservation watch • Three testbeds – Web archives – Large-scale repositories – Research data sets [slide image from digitalbevaring.dk]
  11. 11. TIMBUS: Timeless Business Processes and Services • Exploring scenarios where the important digital information to be preserved is the execution context within which data are processed, analysed, transformed and rendered. – Although there are significant advantages to SaaS and IoS models, there is the danger of services and service providers disappearing (for various reasons), leaving partially complete business processes. • Enlarging the understanding of digital preservation to include the set of activities, processes and tools that ensure continued access to services and software necessary to produce the context within which information can be accessed, properly rendered, validated and transformed into context-based knowledge. • Three testbeds – engineering services and systems for digital preservation – civil engineering infrastructures – e-science and mathematical simulations
  12. 12. Comparing and contrasting the projects
  13. 13. Motivation • ARCOMEM is unique in dealing with publicly available and non-regulated data and in harnessing the "wisdom of crowds" to help decide what to preserve. • TIMBUS focuses on the environments that produce the data rather than the data itself. • ENSURE and TIMBUS are motivated in part by accurate risk assessment and preservation lifecycle issues related to regulations. • ENSURE, SCAPE and TIMBUS address the scalability of technology and software infrastructure for digital preservation. • Targeted Stakeholders: – scientific data (SCAPE, ENSURE, TIMBUS) – memory institutions (SCAPE, ARCOMEM) – web (SCAPE, ARCOMEM) – engineering (TIMBUS) – health care (ENSURE) – finance (ENSURE)
  14. 14. Objectives • ENSURE, SCAPE, and TIMBUS are focused on organisations (organisation-focused projects); ARCOMEM is focused on the web • All projects address the question "what is to be preserved" – ARCOMEM: social media can tell us – ENSURE: extract this information from business rules – SCAPE and TIMBUS: provide tools for responsible persons (curators) – TIMBUS driven by risk management, ENSURE by cost/benefit • ARCOMEM, ENSURE and SCAPE focus on issues of scalability – ARCOMEM, SCAPE: computational – ENSURE: storage infrastructure • The organisation-focused projects also consider – the automation of the preservation lifecycle – the automation of quality assurance for preservation actions • Both ENSURE and TIMBUS have the goal of re-running software after long periods of time
  15. 15. Approach • All four projects will produce prototype software frameworks – The organisation-focused projects all propose to implement platforms for the execution of preservation workflows • SCAPE and ENSURE will make use of service-oriented architectures – SCAPE for prototyping only; SOA model workflows should be translated into Map/Reduce jobs • Digital Lifecycle approach – TIMBUS focuses on the legal and IPR aspects – ENSURE focuses on the trade-offs between quality, cost and economic performance • Preservation planning plays a role in all projects – ENSURE plans a configuration layer with special emphasis on cost versus value – The TIMBUS approach is based on dependency and risk management – Both ARCOMEM and SCAPE rely on the internet to guide preservation: ARCOMEM through the monitoring of social media, SCAPE through the monitoring of web harvests • Virtualisation plays a role in all organisation-focused projects – ENSURE: as a means to access digital objects – SCAPE: as a means to deploy complex preservation action environments – TIMBUS: as a means to preserve and recover the entire business process
  16. 16. Some trends in Digital Preservation
  17. 17. Trends in Digital Preservation Projects [timeline figure, 2006 to 2012; labels: CONTENT-DRIVEN, Semantic Web Services, Semantic Web Services + Agents, EMULATION, Virtualization, PANIC, Workflow, Linked Open Data, SEMANTIC WEB, WORKFLOW, SOA: Web Services, WEB SERVICES, Security and Trust, Distributed Storage, Quality Assurance, GRID, Distributed Processing, Distributed Storage, CLOUD]
  18. 18. Thank you for your attention! • Ross King – AIT, Vienna • Orit Edelstein – IBM Research, Haifa • Michael Factor – IBM Research, Haifa • Thomas Risse – L3S Research Center, Hannover • Eliot Salant – IBM Research, Haifa • Philip Taylor – SAP Research, Belfast • ARCOMEM: www.arcomem.eu • ENSURE: ensure-fp7.eu • SCAPE: www.scape-project.eu • TIMBUS: timbusproject.net