This document summarizes tools from the PLANETS, OPF, and SCAPE projects and where their development is heading. It discusses making the PLANETS tools more modular and flexible. SCAPE is building on PLANETS work and using cluster computing for large-scale preservation. The PLANETS Testbed is being reimagined as a gateway to various interconnected tools for experiment design, execution, and analysis of preservation actions. SCAPE is leveraging these tools and aims to publish more experimental data.
Planets, OPF & SCAPE - presentation of tools on digital preservation
1. PLANETS, OPF & SCAPE
A summary of the tools from these
preservation projects, and where their
development is heading
www.openplanetsfoundation.org
2. PLANETS
• A big project to build digital preservation tools...
3. OPF’s Challenge
• The Open Planets Foundation was set up to sustain
the PLANETS outputs into the future.
– But the tools are
• Numerous, often complex, & of mixed quality/maturity
• Require complex technology stacks (JEE)
– So, how do we make the code sustainable?
• Selection, modularisation, simplification
• Aim for a flexible suite of modular tools, rather than a
monolithic system
4. SCAPE
• http://www.scape-project.eu/
• Many PLANETS partners
– Including OPF
• Many new partners too
• Driven by data
– Web archiving, science data, large-scale
• Cluster computing for scale
– Based on the Hadoop platform
7. The PLANETS Testbed:
Too Many Good Ideas In One Place
• Designing experiments
– Web GUI for complex workflows
• Running experiments
– All services hosted centrally, plus test corpora
• Analysing the results
– Per-experiment automated & manual analysis
– Multi-experiment aggregation & data mining
• Sharing all of the above
8. Re-imagining The PLANETS Testbed:
A Modular Approach
• Use separate tools in each role
– Experiment Design
– Execution
– Analysis
• Publish results from each
– Loosely coupled instead of all-in-one
• i.e. sharing is built into the design
9. Experiment Design:
SCAPE Workflows In Taverna
• As part of SCAPE
11. Experiment Design Support:
OPF Shared Test Corpora
• Simple collections accessed over HTTP
– No special browser software required
• Publicly hosted by HATII
– May also be mirrored by OPF members
• Stabilise corpora from Planets
– Absorb corpora from SCAPE & elsewhere
• Look for Open Source CMS/Annotation tools
– Layer on top of HTTP collections
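Because the corpora are plain HTTP collections, a few lines of script should be enough to enumerate one. The sketch below is only an illustration: it assumes the server returns a directory-style HTML index, and the base URL, file names and the `list_corpus_files` helper are all hypothetical, not taken from the actual OPF corpora.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets from anchor tags in an HTML index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_corpus_files(index_html, base_url):
    """Return absolute URLs of the files linked from an index page."""
    parser = LinkCollector()
    parser.feed(index_html)
    files = []
    for href in parser.hrefs:
        # Skip sort links, parent-directory links and subdirectories.
        if href.startswith("?") or href.startswith("..") or href.endswith("/"):
            continue
        files.append(urljoin(base_url, href))
    return files


# Hypothetical index page, standing in for what a web server might return.
SAMPLE_INDEX = """
<html><body>
<a href="../">Parent Directory</a>
<a href="images/">images/</a>
<a href="sample-001.tif">sample-001.tif</a>
<a href="sample-002.pdf">sample-002.pdf</a>
</body></html>
"""

if __name__ == "__main__":
    for url in list_corpus_files(SAMPLE_INDEX, "http://corpora.example.org/demo/"):
        print(url)
```

In real use the index text would be fetched over HTTP first; it is inlined here so the sketch runs without network access.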
13. Experiment Execution Support:
SCAPE’s Lightweight Tool Wrapping
• PIT: Preservation-action Invocation Tool
– Uses XML ‘tool specification’ documents that
describe preservation actions
• Command-line templates, Java classes, PLANETS/SCAPE
web services, etc
– Built to be shared
• Can be published via, e.g. myExperiment
• Should lead to more reproducible results
– Re-using PLANETS interoperability code
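To make the idea of a shareable 'tool specification' concrete, here is a minimal sketch of how a command-line template might be read from XML and turned into an invocation. The XML layout, the element and attribute names, the `example-tool` command and both helper functions are assumptions made for illustration; they are not the real PIT schema or API.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical tool specification. This is NOT the real PIT format.
TOOL_SPEC = """
<toolspec name="example-tool">
  <action id="to-png" type="convert">
    <command>example-tool --in {input} --out {output}</command>
  </action>
</toolspec>
"""


def load_templates(spec_xml):
    """Map each action id to its command-line template."""
    root = ET.fromstring(spec_xml.strip())
    return {
        action.get("id"): action.findtext("command").strip()
        for action in root.iter("action")
    }


def build_command(template, **params):
    """Fill a template and return an argument list for subprocess.run."""
    # Split first, then substitute, so a path containing spaces
    # stays a single argument.
    return [token.format(**params) for token in shlex.split(template)]


if __name__ == "__main__":
    templates = load_templates(TOOL_SPEC)
    print(build_command(templates["to-png"], input="my file.tif", output="out.png"))
```

Keeping the command in a declarative document rather than in code is what would let the same specification be published and re-run elsewhere, which is the reproducibility point made above.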
14. Experiment Execution:
Multi-platform Tool & Workflow Invocation
• Shared tool specifications make multi-platform
execution easier
– From the command line
– From within Taverna
– From the SCAPE cluster platform
– From a simplified web interface
• Run local-first, remote/service as needed
• Collect results in a standard form, using Testbed code
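The 'local-first, remote as needed' rule can be sketched as a small dispatch function. This is an assumption about how such a choice might be made, not the SCAPE implementation; the function name, the returned tuples and the service URL are hypothetical.

```python
import shutil


def choose_runner(executable, service_url=None, which=shutil.which):
    """Pick where to run a tool: locally if installed, else a remote service.

    `which` is injectable so the decision can be tested without
    depending on what is installed on the current machine.
    """
    local_path = which(executable)
    if local_path is not None:
        return ("local", local_path)
    if service_url is not None:
        return ("remote", service_url)
    raise LookupError(f"{executable!r} is not installed and no service was given")


if __name__ == "__main__":
    # Hypothetical tool name and service URL.
    print(choose_runner("example-tool", "http://services.example.org/example-tool"))
```

The same decision could be made from the command line, from a workflow engine or from a web front end, as long as each reads the same tool specification.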
15. Experiment Execution:
Publishing Experimental Results Via REF
• OPF Results Evaluation Framework: REF
– Hard-coded experiments of common interest
• Can run the experiment automatically
– Publishes results as linked data
• http://data.openplanetsfoundation.org/ref/extension/
• Built by Dave Tarrant, based on P2 format registry
– Will come up again in the Identification session
– SCAPE aims to publish much more data
16. Analysing Results:
Linked Data & Future Plans
• REF allows data to be inspected
– Concentrating on collecting data at present
• Will expose SPARQL endpoint for data queries
– Analysis, visualisation can be built upon that
• Please add analysis Issues for your Datasets and
preservation processes to the wiki!
– e.g. what graphs and statistics would be useful?
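Once a SPARQL endpoint is available, analysis scripts could query it over plain HTTP. The sketch below only builds the request, following the standard SPARQL protocol convention of a `query` parameter on a GET request. The endpoint URL is a placeholder (the slides do not give one) and the query is a generic triple listing, since the REF data model is not described here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: the real address is not given in the slides.
ENDPOINT = "http://data.example.org/sparql"

# Generic query listing a few triples; it assumes nothing about the data model.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a GET request that asks for SPARQL results as JSON."""
    url = endpoint + "?" + urlencode({"query": query.strip()})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out because the endpoint above does not exist.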
17. Summary
• PLATO
– SCAPE will add Preservation Watch & more
• The PLANETS Testbed
– Re-imagined as a gateway to a complementary
suite of preservation tools and data services
– SCAPE leveraging work from Taverna, IMPACT
• Development driven by user needs
– SCAPE Scenarios, AQuA/Hackathon Issues