SCAPE 
Scalable Preservation Environments
2 
Digital Preservation – What do I need? 
• Your collection of digital data is growing rapidly. 
• Your preservation activities must become more 
efficient and more scalable. 
• You need SCAPE! 
• The SCAPE project has developed scalable solutions 
for long-term preservation of large-scale and 
heterogeneous data sets. 
This work was partially supported by the SCAPE Project. 
The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).
3 
What is SCAPE? 
It's all about scalability! 
• Scalable services for planning and execution of 
institutional preservation strategies 
• Infrastructure for the execution of digital 
preservation processes on large volumes of data 
• Existing tools have been improved and extended. 
• New tools have been developed where necessary. 
4 
What is SCAPE? 
SCAPE covers the whole digital preservation life cycle 
• Interconnecting services support 
the preservation of large 
repositories of digital objects 
• Applications support the 
formulation of preservation 
policies, decision making and 
selection of preservation actions 
5 
What is SCAPE? 
Take your pick – choose what you need! 
• Use the full set of interconnected 
SCAPE components or a selected 
series of SCAPE tools or workflows. 
• Many SCAPE components can be 
individually incorporated. 
6 
Solutions Tested in Real Life 
• All SCAPE solutions arise from real-world challenges at 
partner institutions. 
• Each challenge is tested in testbeds at the partner 
institutions. 
[Figure: Testbeds, labelled Web Content, Digital Repositories, Data Centres, Research Data Sets]
7 
Solutions for Content Holders 
• Scalability: in four dimensions (heterogeneity of collections 
as well as number, size and complexity of objects) 
• Automation: through scalable, automated and 
simple-to-design preservation workflows 
• Planning: answering core preservation planning questions 
• Integration: through a robust, integrated, open source 
preservation system 
8 
Overview: SCAPE Architecture 
9 
Overview: SCAPE Components 
The SCAPE Platform is a reference architecture 
for scalable preservation environments. 
10 
Overview: SCAPE Components 
The SCAPE Preservation 
Components are tools which 
enhance the functionality of a 
digital preservation system in: 
• Scalability 
• Functional coverage 
• Quality 
11 
Overview: SCAPE Components 
The SCAPE Planning and Watch components address the 
bottleneck in decision processes and in processing the 
information required for decision making. 
12 
Examples of tools and services 
13 
Scalable Planning and Watch 
Scout – an Automated Preservation Watch System 
• Enables you to monitor your 
collections 
• Lets you access 
community knowledge 
• Collects relevant knowledge 
and enables automated 
notification 
14 
Scalable Planning and Watch 
C3PO – Content Profiling Tool for Preservation Analysis 
• Analyses characterisation 
metadata for digital collections 
• Aggregates and combines the 
metadata information across 
collections 
• Generates a profile of the 
content set 
• Allows use of different 
metadata formats 
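The aggregation step described above can be pictured with a minimal sketch. It is not C3PO code: the record fields (`file`, `format`, `size`, `valid`) and the `profile` helper are hypothetical names chosen for illustration, and real characterisation output is considerably richer.

```python
from collections import Counter

# Hypothetical characterisation records, one per file (field names are assumptions).
records = [
    {"file": "a.tif", "format": "TIFF", "size": 2_000_000, "valid": True},
    {"file": "b.jp2", "format": "JPEG 2000", "size": 500_000, "valid": True},
    {"file": "c.tif", "format": "TIFF", "size": 3_000_000, "valid": False},
]

def profile(records):
    """Aggregate per-file metadata into a collection-level profile."""
    formats = Counter(r["format"] for r in records)
    return {
        "object_count": len(records),
        "total_size": sum(r["size"] for r in records),
        "format_distribution": dict(formats),
        "invalid_files": [r["file"] for r in records if not r["valid"]],
    }

summary = profile(records)
print(summary)
```

A profile of this kind is what lets planning questions ("which formats dominate, which files fail validation?") be answered for a whole collection rather than file by file.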
15 
Scalable Planning and Watch 
Plato – Scalable Preservation Planning 
• Decision-making support tool 
• Guides you through the 
preservation planning 
workflow 
• Provides trust through 
controlled experiments and 
documentation 
• Provides an executable plan 
16 
Scalable Tools 
ToMaR – let your Preservation Tools Scale 
• Run existing tools against 
large numbers of files 
• Execute tools in a scalable 
fashion on a MapReduce 
cluster 
• Enable scalable workflows 
which chain together a set of 
tools 
• Process payloads too big to be 
computed on a single machine 
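The map-then-reduce pattern behind this can be sketched locally. The example below is only an illustration of the idea, not ToMaR itself: a thread pool stands in for the MapReduce cluster, a SHA-256 checksum stands in for an existing preservation tool, and the names `run_tool`, `reduce_results` and the sample inputs are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def run_tool(item):
    """Map step: apply a stand-in 'tool' (SHA-256 checksum) to one input."""
    name, payload = item
    return name, hashlib.sha256(payload).hexdigest()

def reduce_results(results):
    """Reduce step: group file names by checksum to spot identical payloads."""
    groups = {}
    for name, digest in results:
        groups.setdefault(digest, []).append(name)
    return groups

# In-memory stand-ins for files; a real run would read inputs from storage.
inputs = [("a.bin", b"hello"), ("b.bin", b"world"), ("c.bin", b"hello")]

# The thread pool plays the role of the cluster's parallel map tasks.
with ThreadPoolExecutor(max_workers=2) as pool:
    mapped = list(pool.map(run_tool, inputs))

groups = reduce_results(mapped)
duplicates = sorted(names for names in groups.values() if len(names) > 1)
print(duplicates)
```

Because each map call touches only one input, the same structure scales out: more workers (or cluster nodes) process more files without changing the tool.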
17 
Preservation Components 
Pagelyzer – Monitor your Web Content 
• Detect changes in web pages 
• Compare web page versions 
on a large scale 
• Compare web page rendering 
in different browsers 
• Determine an appropriate 
frequency for web harvesting 
18 
Preservation Components 
Jpylyzer – Easy Validation of JPEG 2000 
• Automated JP2 validation and 
feature extraction 
• Enables you to confirm 
whether an image is a valid, 
intact JP2 file 
• Reports the key technical 
properties of the image 
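To give a feel for what format validation involves, here is a deliberately tiny pre-check. It is not Jpylyzer's implementation and covers only the first two boxes of a file: the fixed 12-byte JPEG 2000 signature box and the type field of the File Type box that follows it. The function name `looks_like_jp2` and the synthetic sample headers are assumptions made for the example; a full validator inspects every box and the codestream.

```python
# Fixed 12-byte JPEG 2000 signature box that opens every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")

def looks_like_jp2(data: bytes) -> bool:
    """Cheap pre-check: signature box first, then a File Type ('ftyp') box."""
    return data[:12] == JP2_SIGNATURE and data[16:20] == b"ftyp"

# Synthetic header for illustration: signature box plus a 20-byte File Type box
# (length, type, brand, minor version, one compatibility entry).
sample = (
    JP2_SIGNATURE
    + (20).to_bytes(4, "big") + b"ftyp" + b"jp2 " + bytes(4) + b"jp2 "
)
jpeg_header = b"\xff\xd8\xff\xe0" + bytes(16)  # starts like a JPEG, not a JP2

print(looks_like_jp2(sample))
print(looks_like_jp2(jpeg_header))
```

A check like this can only reject files quickly; confirming that an image is a valid, intact JP2 requires the full structural validation that the tool performs.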
19 
Preservation Components 
Matchbox – Easy Detection of Near-Duplicate Images 
• Identify duplicate content, 
even where files differ in 
size, format, cropping etc. 
or were scanned from a 
different original copy 
• Automate quality assurance 
and reduce manual effort 
20 
Preservation Components 
xcorrSound – Automate Sound Wave Analysis 
• Compare two audio files and 
output the similarity 
• Detect overlaps in audio files 
• Detect occurrences of a 
smaller audio file (e.g. a jingle) 
within a larger audio file or an 
index of audio files 
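Finding a short clip inside a longer recording is a cross-correlation problem, which the following sketch illustrates on toy sample values. It is a brute-force, unnormalised version written for readability, not the tool's own algorithm; `best_offset` and the sample lists are hypothetical.

```python
def best_offset(signal, clip):
    """Return (offset, score) where the clip correlates most strongly with the signal."""
    best = (0, float("-inf"))
    for offset in range(len(signal) - len(clip) + 1):
        window = signal[offset:offset + len(clip)]
        # Dot product of the clip with the aligned window of the signal.
        score = sum(s * c for s, c in zip(window, clip))
        if score > best[1]:
            best = (offset, score)
    return best

# Toy "audio": the clip [1, -2, 3] occurs at index 2 of the longer signal.
signal = [0, 0, 1, -2, 3, 0, 0, 1]
clip = [1, -2, 3]

offset, score = best_offset(signal, clip)
print(offset, score)  # 2 14
```

On real audio the raw dot product favours loud passages, so a practical implementation would normalise the score and compute the correlation with a fast Fourier transform rather than a sliding loop.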
21 
Sustainability of Tools and Services 
SCAPE tools are published as open source software. 
Tools and services from SCAPE are sustained by: 
• Open Planets Foundation: 
addresses core digital preservation 
challenges and engages with the community 
• COPTR: 
Community Owned digital Preservation 
Tool Registry 
22 
Sustainability of SCAPE results 
Ultimate sustainability goal: 
• Supporting communities of practice by enabling 
efficient collaboration during the project and 
beyond. 
Open Planets Foundation will take post-project 
ownership of the outputs, supported by other 
partners providing specific capabilities. 
23 
Sustainability of SCAPE results 
Five complementary approaches: 
• Visibility 
Providing integrated outreach to multiple audiences to maximise 
discoverability. 
• Quality 
Ensuring that project outputs conform to standards-driven quality 
assurance. 
• Training 
Supporting skills development to further institutional capacity building. 
• Open licensing 
Using open licences to encourage the adoption and reuse of project 
outputs. 
• Community integration 
Integrating project outputs into commercial and non-commercial 
systems and services. 
24 
About SCAPE 
• EU-funded project under FP7 (Research and 
Technological Development) 
• Project runtime: February 2011 to September 2014 
• 20 partners from 10 countries - from memory 
institutions, data centres, research labs, universities, 
and industrial firms 
• Public project materials are licensed under a 
CC-BY-SA International License 
25 
SCAPE Consortium 
26 
Additional Sources of Interest 
• Development Infrastructure 
  • Code repository hosted by the Open Planets Foundation and GitHub: 
    https://github.com/openplanets/scape/ 
  • Development Wiki: 
    http://wiki.opf-labs.org/display/SP/Home 
• Tools 
  • http://www.scape-project.eu/tools 
• Experimental Workflows 
  • http://www.myexperiment.org/search?query=SCAPE&type=all&commit=Search 
• Publications 
  • http://www.scape-project.eu/category/publication 
• Public Deliverables 
  • http://www.scape-project.eu/category/deliverable
27 
More Information 
• SCAPE website: www.scape-project.eu 
• Blog posts and more: 
www.openplanetsfoundation.com/projects/scape 
• Tools and Services: 
https://github.com/openplanets/scape 
• SCAPE Twitter: @SCAPEProject, #SCAPEProject 
• SCAPE Newsletter: Sign up via www.scape-project.eu 
All images © the SCAPE Project or its partners, 
except images on slides 3, 6 and 26 © www.digitalbevaring.dk 
