SIGNIFICANT ENVIRONMENT 
INFORMATION FOR LTDP 
Fabio Corubolo, Adil Hasan – University of Liverpool 
Anna Eggers, Jens Ludwig - Göttingen State University Library 
Mark Hedges, Simon Waddington - King’s College London 
This project has received funding from the European Union’s Seventh Framework 
Programme for research, technological development and demonstration under 
grant agreement no FP7-601138 PERICLES.
Objective and outline 
• Aim: Ensure long term usability of Digital Objects (DO) 
• Usability of a Digital Object usually requires access to parts of its 
environment 
• Define a broad set of information (Environment information) 
• Consider its significance (Significant environment information) 
• Explore and test pragmatic methods to collect such information
Environment information definition 
• All the entities (DOs, metadata, policies, rights, services, users, 
etc.) useful to correctly access, render and use the DO. 
Refinement: 
• The information about the set of relationships between the 
source DO and any related objects from its environment.
Environment for a DO 
• Technical system information (OS, system architecture, etc.) 
• DO metadata (descriptive, structural, technical) 
• User, policy, process information (user background knowledge, …) 
• Information necessary to make use of the object including: 
• Auxiliary data (e.g. calibration data to support sensor data) 
• External documentation (e.g. specifications, related documents) 
• Implicit knowledge about what data is useful to use the DO (e.g. the user's 
knowledge of what is and is not relevant in the collection) 
• More…
No object is an island, entire of itself 
• Digital objects are used in a rich environment 
[Diagram: a digital object within its environment, alongside external metadata and storage]
Digital object information 
• Rich and varied terminology 
• The scope of each term is not 
absolutely defined 
• We are aiming to support 
object use: use-centric view 
• We start broad – Environment 
information: more or less everything 
that sits outside the DO
Standards and coverage – initial analysis
Significant Environment Information (SEI) 
• Use of a DO has a purpose 
• The purpose gives a scope to the dependent environment 
information 
• Weights can express the importance for a specific purpose 
Definition: we define SEI as the set of relationships between a DO and its 
environment information, qualified with purpose and weights
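The definition above can be read as a small data model; a hypothetical sketch (these class and field names are illustrative, not from the PERICLES codebase):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SEIRelationship:
    """One unit of Significant Environment Information: a relationship
    from a digital object to a piece of its environment, qualified by
    the purpose of use and a significance weight."""
    source_do: str    # identifier of the digital object
    environment: str  # identifier of the related environment entity
    purpose: str      # purpose of use that scopes this dependency
    weight: float     # significance for that purpose, e.g. in [0, 1]

# Example: telemetry data that depends on a calibration table
# when the purpose is anomaly analysis.
rel = SEIRelationship("telemetry-2014-03", "calibration-table-v2",
                      purpose="anomaly analysis", weight=0.9)
```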
How to collect and measure SEI? 
• Observe the use of DOs – in different phases of lifecycle 
• in the environment of creation and use 
• Collect dependencies for use (relationships to other DOs) 
• Measure significance e.g. based on frequency of use 
• Different semantics and factors for significance weights (value,…) – WIP 
• Weights will change in time 
• Sheer curation: curation activities integrated in the use 
workflow; lightweight and transparent
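One of the significance factors listed above, frequency of use, could be turned into weights along these lines (a sketch only; the actual semantics of significance weights are work in progress):

```python
from collections import Counter

def frequency_weights(access_log):
    """Derive significance weights in [0, 1] from how often each
    environment object was accessed, normalised by the maximum count."""
    counts = Counter(access_log)
    top = max(counts.values())
    return {obj: count / top for obj, count in counts.items()}

# A hypothetical access log collected by observation.
log = ["spec.pdf", "spec.pdf", "notes.txt", "spec.pdf",
       "calib.dat", "notes.txt"]
weights = frequency_weights(log)
# spec.pdf gets weight 1.0, notes.txt about 0.67, calib.dat about 0.33
```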
Pericles Extraction Tool (PET) 
• Open source* framework - builds on the SEI concepts 
• Uses a sheer curation approach – right time and place 
• Generic, modular, domain agnostic 
• Collection by observation – monitoring changes in time 
• Snapshot of the system environment 
• To observe unstructured workflows 
• https://github.com/pericles-project/pet 
* Release due soon, approved but waiting for final stamps
PET Architecture and modules 
• Available and used system resources; 
• File format identification and 
checksums; 
• Currently running processes; 
• Event information (file and network) 
from processes; 
• Graphic configuration information; 
• MS Office and PDF font 
dependencies; 
• Native commands.
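A minimal sketch of what one snapshot module might capture (PET itself is a Java framework; this Python fragment only mirrors the idea of combining system information with per-file checksums, and all names are illustrative):

```python
import hashlib
import os
import platform

def snapshot(paths):
    """Capture a simple environment snapshot: basic system information
    plus a SHA-256 checksum and size for each digital object of interest."""
    info = {"os": platform.system(), "arch": platform.machine(), "files": {}}
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        info["files"][path] = {"sha256": digest,
                               "size": os.path.getsize(path)}
    return info
```

Comparing successive snapshots is one way to monitor changes in time, as in the collection-by-observation approach above.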
The compulsory screenshots slide
How to setup PET for a use scenario 
• PET is installed, configured, started on the machine where the 
DOs are used – stays in monitoring mode 
• The profile (modules and configuration) is use-case specific 
• The user interacts normally with the DOs while PET collects SEI 
in the background 
• The environment information, DO events and changes are 
collected for future use and analysis
General scenario for PET 
1. Use PET to collect environment information when and where the 
DOs are used, based on profiles 
--- We are now here --- 
2. Analyse the information collected to infer new relationships 
(also SEI) between DOs - forming a graph structure 
3. Assign weights to relationships based on the purpose and 
significance – weighted graph
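Steps 2 and 3 above amount to building a weighted graph; a sketch with plain dictionaries (illustrative only, not PET's actual representation):

```python
def build_graph(relationships):
    """Build an adjacency map: DO -> {related object: weight}.
    `relationships` is an iterable of (source, target, weight) triples
    inferred from the collected environment information."""
    graph = {}
    for source, target, weight in relationships:
        graph.setdefault(source, {})[target] = weight
    return graph

# Hypothetical inferred dependencies for one anomaly.
g = build_graph([("anomaly-42", "handover.pdf", 0.8),
                 ("anomaly-42", "telemetry.log", 0.5)])
```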
Experiment: use case description 
• Fictional scenario, based on operations for the ISS SOLAR payload 
• Operator’s task: resolve anomalies 
• Process: extensive search in the archived data + documents 
• Issue: how to preserve implicit information, help with overload 
• PET task: record SEI for a specific anomaly 
• monitor environment, record significant events, infer documentation 
useful to solve the anomaly 
• SEI: what is needed to identify and debug a specific anomaly, i.e. the 
implicit operator knowledge
Experimental results (1) 
An anomaly is reported in a handover sheet 
The operator proceeds with 
documentation search and 
consultation, all tracked by PET
Experimental results (2) 
• Environment monitoring 
• Events, extraction on occurrence of events 
• Leads to dependency inference 
• In future work we consider more complex issues: 
• ‘noise’ from multitasking 
• careful analysis of the collected data in the next phases
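The dependency inference in this experiment, linking an anomaly's reported time span to the documents consulted inside it, can be sketched as follows (timestamps and file names are illustrative):

```python
def infer_dependencies(anomaly_span, document_events):
    """Infer which documents an anomaly likely depends on: those whose
    recorded use falls inside the anomaly's start/end time span."""
    start, end = anomaly_span
    return sorted({doc for timestamp, doc in document_events
                   if start <= timestamp <= end})

# Hypothetical trace: (timestamp, document) pairs from monitoring,
# and an anomaly reported as open between t=10 and t=20.
events = [(5, "old-report.pdf"), (12, "telemetry.log"),
          (15, "ops-manual.pdf"), (30, "unrelated.txt")]
deps = infer_dependencies((10, 20), events)
# -> ['ops-manual.pdf', 'telemetry.log']
```

Documents touched outside the span (the multitasking 'noise' mentioned above) are simply excluded here; a real analysis would need more careful filtering.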
Conclusions, Future work 
• Define Significant Environment Information (SEI) for object reuse 
• Base for dependency graphs weighted on significance and purpose 
• Explain ways to obtain SEI and significance weights 
• Present the PET tool – to collect SEI 
• Show experimental results - initial dependency collection 
Future: 
• Improve: filtering, dependency inference 
• Work on definition and semantics for significance weights 
• Use weighted dependency graphs to support appraisal
Thank you! 
More information: 
• https://github.com/pericles-project/pet
About the PERICLES project 
• Promoting and enhancing reuse of information throughout the 
content lifecycle taking account of evolving semantics 
• Ensure availability and reuse of digital objects for the next 
generations 
• Extensions to current preservation and lifecycle models to 
address the evolution of dynamic heterogeneous resources and 
their dependencies 
• Models capturing intent and interpretative context: key to 
achieving “preservation by design”
Facts & Figures 
• Collaborative FP7 project on digital preservation 
• 12 million Euro, co-funded by the European Commission 
• 11 partners: research institutions, IT development and 
application domain 
• 6 European countries 
• Feb 2013 – Feb 2017 
• Project website: http://www.pericles-project.eu
Consortium 
COORDINATOR: King’s College London – UK 
ACADEMIC PARTNERS: 
Hoegskolan i Borås – University of Borås – SE 
Georg-August-Universität Göttingen – DE 
University of Liverpool – UK 
Centre for Research and Technology Hellas – GR 
University of Edinburgh – UK 
NON-ACADEMIC PUBLIC SECTOR ORGANISATIONS 
Tate – UK 
Belgian User Service and Operation Centre - B.USOC – BE 
PRIVATE SECTOR ORGANISATIONS 
Dotsoft – GR 
Space Applications Services NV/SA (SpaceApps) – BE 
Xerox Research Centre Europe - FR

IPRES 2014 paper presentation: significant environment information for LTDP


Editor's Notes

  • #3 We want to collect important information that could be lost if not gathered at the right time.
  • #5 Users and their interaction with DOs are also to be considered part of the environment!
  • #7 This is just one vision of the different sets of data. I think it is a reasonable one, but certainly not the absolute truth. Environment here is thought of as ‘where the data lives’. The environment does not necessarily have a structure (metadata usually follows standards) and can include a lot of not-necessarily-related information. That is to say, it is qualified not as ‘data about the data’ but as ‘where the data lives’, and is thus likely much broader. Another definition of environment is ‘anything that is not the object’, that is, the universe minus the object.
  • #16 PLEASE NOTE: this is one example, based on one scenario. I prefer to give you a complete example in a single scenario, but there are many possible scenarios that can be addressed by PET with the proper configuration and modules. I will now briefly introduce a synthetic (fictional) scenario inspired by the B.USOC mission operators use case. B.USOC operators sometimes face the task of resolving anomalies, such as when an instrument does not respond as expected. The process they follow is guided by their knowledge of the domain and involves research in the archived documentation and operation data, which can include, for example, solutions from previous anomalies, telemetry, console logs, meeting notes, emails, etc. Such data, although present in the storage, requires experience to select; its selection is a task requiring specific knowledge that is usually passed from operator to operator. The issue we want to address is that of preserving the useful information that lies in the use of specific documents from the large collection in order to solve the issue, and of helping the operators with the information overload. The task the PET tool is trying to accomplish is to record the SEI for this use case, for a specific anomaly. This is done by monitoring the environment and recording significant events (via a PET profile), and from there inferring new dependencies between anomalies and mission documentation, in order to preserve useful information that is otherwise not captured. The SEI in this case is the environment information that will help to identify and debug a specific anomaly.
  • #17 We set up a specific PET profile that tracks the use of relevant software on specific files, using the PET software monitor; this gives us a trace of the documents that have been used at a given moment in time. At the same time, it is possible to observe the ‘handover sheet’ and track the reported start and end times of an anomaly. The connection between the documentation trace and the ‘handover sheet’ tracking allows us to infer the ‘anomaly solving time span’ (indicated with a red line in Figure 4) and to assume there is a dependency between the solution to the anomaly and the documentation that was used between the start and end of the anomaly. In future work we will consider more complex issues that we have ignored in this simplified example, such as the ‘noise’ that can be reported by the event tracking. This ‘noise’ can arise, for example, because users often multitask, so there can be documentation that was used but is not relevant to the anomaly solution; documentation that was quickly opened and closed may also indicate, in some cases, that the document was not relevant. We will also explore ways to obtain finer-grained tracking, for example including which pages of a document have been consulted. We plan to dedicate effort to a more careful analysis of the collected data in the next phases.
  • #19 In this paper we presented our work on determining what information is significant to collect, from the widest set of the Environment Information. We presented a definition of Significant Environment Information that takes into account the purpose of use of a DO and applies to relationships with significance weights. We also presented ways to determine significance weights and their relation to the DO lifecycle. Finally, we presented the tool we are developing to collect such information, together with its methods of extraction, and showed experimental results to support the importance of such information. We believe the importance of the contribution also lies in the way the information is collected: it is domain agnostic and aims at collection in the context of spontaneous workflows, with minimal input from the user and very limited assumptions about the system structure and its infrastructure. We plan to continue our work on exploring new methods of automated information collection and on improving the filtering and inference of dependencies. We also plan to explore and implement the methods for determining significance described in section 3.1, and to look at the aspects of the dependency graphs, based on purpose and significance weights, that the tool will allow us to infer.