SIGNIFICANT ENVIRONMENT 
INFORMATION FOR LTDP 
Fabio Corubolo, Adil Hasan – University of Liverpool 
Anna Eggers, Jens Ludwig - Göttingen State University Library 
Mark Hedges, Simon Waddington - King’s College London 
This project has received funding from the European Union’s Seventh Framework 
Programme for research, technological development and demonstration under 
grant agreement no FP7-601138 PERICLES.
Objective and outline 
• Aim: Ensure long term usability of Digital Objects (DO) 
• Usability of a Digital Object usually requires access to parts of its 
environment 
• Define a broad set of information (Environment information) 
• Consider its significance (Significant environment information) 
• Explore and test pragmatic methods to collect such information
Environment information definition 
• All the entities (DOs, metadata, policies, rights, services, users, 
etc.) useful to correctly access, render and use the DO. 
Refinement: 
• The information about the set of relationships between the 
source DO and any related objects from its environment.
Environment for a DO 
• Technical system information (OS, system architecture, etc.) 
• DO metadata (descriptive, structural, technical) 
• User, policy, process information (user background knowledge, …) 
• Information necessary to make use of the object including: 
• Auxiliary data (e.g. calibration data to support sensor data) 
• External documentation (e.g. specifications, related documents) 
• Implicit knowledge about what data is useful to use the DO (e.g. the user's 
knowledge of what is and is not relevant in the collection) 
• More…
No object is an island, entire of itself 
• Digital objects are used in a rich environment 
[Diagram: a digital object within its environment, alongside external metadata and storage]
Digital object information 
• Rich and varied terminology 
• The scope of each term is not 
absolutely defined 
• We are aiming to support 
object use: use-centric view 
• We start broad – Environment 
information: more or less everything 
that sits outside the DO
Standards and coverage – initial analysis
Significant Environment Information (SEI) 
• Use of a DO has a purpose 
• The purpose gives a scope to the dependent environment 
information 
• Weights can express the importance for a specific purpose 
Definition: we define SEI as the set of relationships between a DO and its 
environment information, qualified with purpose and weights
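The definition above can be read as a small data model; a hypothetical sketch (these class and field names are illustrative, not from the PERICLES codebase):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SEIRelationship:
    """One unit of Significant Environment Information: a relationship
    from a digital object to a piece of its environment, qualified by
    the purpose of use and a significance weight."""
    source_do: str    # identifier of the digital object
    environment: str  # identifier of the related environment entity
    purpose: str      # purpose of use that scopes this dependency
    weight: float     # significance for that purpose, e.g. in [0, 1]

# Example: telemetry data that depends on a calibration table
# when the purpose is anomaly analysis.
rel = SEIRelationship("telemetry-2014-03", "calibration-table-v2",
                      purpose="anomaly analysis", weight=0.9)
```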
How to collect and measure SEI? 
• Observe the use of DOs – in different phases of lifecycle 
• in the environment of creation and use 
• Collect dependencies for use (relationships to other DOs) 
• Measure significance e.g. based on frequency of use 
• Different semantics and factors for significance weights (value,…) – WIP 
• Weights will change in time 
• Sheer curation: curation activities integrated in the use 
workflow; lightweight and transparent
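One of the significance factors listed above, frequency of use, could be turned into weights along these lines (a sketch only; the actual semantics of significance weights are work in progress):

```python
from collections import Counter

def frequency_weights(access_log):
    """Derive significance weights in [0, 1] from how often each
    environment object was accessed, normalised by the maximum count."""
    counts = Counter(access_log)
    top = max(counts.values())
    return {obj: count / top for obj, count in counts.items()}

# A hypothetical access log collected by observation.
log = ["spec.pdf", "spec.pdf", "notes.txt", "spec.pdf",
       "calib.dat", "notes.txt"]
weights = frequency_weights(log)
# spec.pdf gets weight 1.0, notes.txt about 0.67, calib.dat about 0.33
```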
Pericles Extraction Tool (PET) 
• Open source* framework - builds on the SEI concepts 
• Uses a sheer curation approach – right time and place 
• Generic, modular, domain agnostic 
• Collection by observation – monitoring changes in time 
• Snapshot of the system environment 
• To observe unstructured workflows 
• https://github.com/pericles-project/pet 
* Release due soon, approved but waiting for final stamps
PET Architecture and modules 
• Available and used system resources; 
• File format identification and 
checksums; 
• Currently running processes; 
• Event information (file and network) 
from processes; 
• Graphic configuration information; 
• MS Office and PDF font 
dependencies; 
• Native commands.
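A minimal sketch of what one snapshot module might capture (PET itself is a Java framework; this Python fragment only mirrors the idea of combining system information with per-file checksums, and all names are illustrative):

```python
import hashlib
import os
import platform

def snapshot(paths):
    """Capture a simple environment snapshot: basic system information
    plus a SHA-256 checksum and size for each digital object of interest."""
    info = {"os": platform.system(), "arch": platform.machine(), "files": {}}
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        info["files"][path] = {"sha256": digest,
                               "size": os.path.getsize(path)}
    return info
```

Comparing successive snapshots is one way to monitor changes in time, as in the collection-by-observation approach above.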
The compulsory screenshots slide
How to setup PET for a use scenario 
• PET is installed, configured, started on the machine where the 
DOs are used – stays in monitoring mode 
• The profile (modules and configuration) is use-case specific 
• The user interacts normally with the DOs while PET collects SEI 
in the background 
• The environment information, DO events and changes are 
collected for future use and analysis
General scenario for PET 
1. Use PET to collect environment information when and where the 
DOs are used, based on profiles 
--- We are now here --- 
2. Analyse the information collected to infer new relationships 
(also SEI) between DOs - forming a graph structure 
3. Assign weights to relationships based on the purpose and 
significance – weighted graph
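Steps 2 and 3 above amount to building a weighted graph; a sketch with plain dictionaries (illustrative only, not PET's actual representation):

```python
def build_graph(relationships):
    """Build an adjacency map: DO -> {related object: weight}.
    `relationships` is an iterable of (source, target, weight) triples
    inferred from the collected environment information."""
    graph = {}
    for source, target, weight in relationships:
        graph.setdefault(source, {})[target] = weight
    return graph

# Hypothetical inferred dependencies for one anomaly.
g = build_graph([("anomaly-42", "handover.pdf", 0.8),
                 ("anomaly-42", "telemetry.log", 0.5)])
```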
Experiment: use case description 
• Fictional scenario, based on operations for the ISS SOLAR payload 
• Operator’s task: resolve anomalies 
• Process: extensive search in the archived data + documents 
• Issue: how to preserve implicit information, help with overload 
• PET task: record SEI for a specific anomaly 
• monitor environment, record significant events, infer documentation 
useful to solve the anomaly 
• SEI: what is needed to identify and debug a specific anomaly, i.e. the 
implicit operator knowledge
Experimental results (1) 
An anomaly is reported in a handover sheet 
The operator proceeds with 
documentation search and 
consultation, all tracked by PET
Experimental results (2) 
• Environment monitoring 
• Events, extraction on occurrence of events 
• Leads to dependency inference 
• In future work we consider more complex issues: 
• ‘noise’ from multitasking 
• careful analysis of the collected data in the next phases
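The dependency inference in this experiment, linking an anomaly's reported time span to the documents consulted inside it, can be sketched as follows (timestamps and file names are illustrative):

```python
def infer_dependencies(anomaly_span, document_events):
    """Infer which documents an anomaly likely depends on: those whose
    recorded use falls inside the anomaly's start/end time span."""
    start, end = anomaly_span
    return sorted({doc for timestamp, doc in document_events
                   if start <= timestamp <= end})

# Hypothetical trace: (timestamp, document) pairs from monitoring,
# and an anomaly reported as open between t=10 and t=20.
events = [(5, "old-report.pdf"), (12, "telemetry.log"),
          (15, "ops-manual.pdf"), (30, "unrelated.txt")]
deps = infer_dependencies((10, 20), events)
# -> ['ops-manual.pdf', 'telemetry.log']
```

Documents touched outside the span (the multitasking 'noise' mentioned above) are simply excluded here; a real analysis would need more careful filtering.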
Conclusions, Future work 
• Define Significant Environment Information (SEI) for object reuse 
• Base for dependency graphs weighted on significance and purpose 
• Explain ways to obtain SEI and significance weights 
• Present the PET tool – to collect SEI 
• Show experimental results - initial dependency collection 
Future: 
• Improve: filtering, dependency inference 
• Work on definition and semantics for significance weights 
• Use weighted dependency graphs to support appraisal
Thank you! 
More information: 
• https://github.com/pericles-project/pet
About the PERICLES project 
• Promoting and enhancing reuse of information throughout the 
content lifecycle taking account of evolving semantics 
• Ensure availability and reuse of digital objects for the next 
generations 
• Extensions to current preservation and lifecycle models to 
address the evolution of dynamic heterogeneous resources and 
their dependencies 
• Models capturing intent and interpretative context: key to 
achieving “preservation by design”
Facts & Figures 
• Collaborative FP7 project on digital preservation 
• 12 million Euro, co-funded by the European Commission 
• 11 partners: research institutions, IT development and 
application domain 
• 6 European countries 
• Feb 2013 – Feb 2017 
• Project website: http://www.pericles-project.eu
Consortium 
COORDINATOR: King’s College London – UK 
ACADEMIC PARTNERS: 
Hoegskolan i Borås – University of Borås – SE 
Georg-August-Universität Göttingen – DE 
University of Liverpool – UK 
Centre for Research and Technology Hellas – GR 
University of Edinburgh – UK 
NON-ACADEMIC PUBLIC SECTOR ORGANISATIONS 
Tate – UK 
Belgian User Service and Operation Centre - B.USOC – BE 
PRIVATE SECTOR ORGANISATIONS 
Dotsoft – GR 
Space Applications Services NV/SA (SpaceApps) – BE 
Xerox Research Centre Europe - FR

IPRES 2014 paper presentation: significant environment information for LTDP


Editor's Notes

  • #3 We want to collect important information that could be lost if not gathered at the right time.
  • #5 Users and their interaction with DOs are also to be considered part of the environment!
  • #7 This is just one vision of the different sets of data. I think it is a reasonable one, but certainly not the absolute truth. Environment here is thought of as ‘where the data lives’. The environment does not necessarily have a structure (metadata usually follows standards) and can include a lot of not-necessarily-related information. That is to say, it is qualified not as ‘data about the data’ but as ‘where the data lives’, and is thus likely much broader. Another definition of environment is ‘anything that is not the object’, that is, the universe minus the object.
  • #16 PLEASE NOTE: this is one example, based on one scenario. I prefer to give you a complete example in a single scenario, but there are many possible scenarios that can be addressed by PET with the proper configuration and modules. I will now briefly introduce a synthetic (fictional) scenario inspired by the B.USOC mission operators use case. B.USOC operators sometimes face the task of resolving anomalies, such as when an instrument does not respond as expected. The process they follow is guided by their knowledge of the domain and involves research in the archived documentation and operation data, which can include, for example, solutions from previous anomalies, telemetry, console logs, meeting notes, emails, etc. Such data, although present in the storage, requires experience to select; its selection is a task requiring specific knowledge that is usually passed from operator to operator. The issue we want to address is that of preserving the useful information that lies in the use of specific documents from the large collection in order to solve the issue, and of helping the operators with the information overload. The task the PET tool is trying to accomplish is to record the SEI for this use case, for a specific anomaly. This is done by monitoring the environment and recording significant events (via a PET profile), and from there inferring new dependencies between anomalies and mission documentation, in order to preserve useful information that is otherwise not captured. The SEI in this case is the environment information that will help to identify and debug a specific anomaly.
  • #17 We set up a specific PET profile that tracks the use of relevant software on specific files, using the PET software monitor; this gives us a trace of the documents that have been used at a given moment in time. At the same time, it is possible to observe the ‘handover sheet’ and track the reported start and end times of an anomaly. The connection between the documentation trace and the ‘handover sheet’ tracking allows us to infer the ‘anomaly solving time span’ (indicated with a red line in Figure 4) and to assume there is a dependency between the solution to the anomaly and the documentation that was used between the start and end of the anomaly. In future work we will consider more complex issues that we have ignored in this simplified example, such as the ‘noise’ that can be reported by the event tracking. This ‘noise’ can arise, for example, because users often multitask, so there can be documentation that was used but is not relevant to the anomaly solution; documentation that was quickly opened and closed may also indicate, in some cases, that the document was not relevant. We will also explore ways to obtain finer-grained tracking, for example including which pages of a document have been consulted. We plan to dedicate effort to a more careful analysis of the collected data in the next phases.
  • #19 In this paper we presented our work on determining what information is significant to collect, from the widest set of the Environment Information. We presented a definition of Significant Environment Information that takes into account the purpose of use of a DO and applies to relationships with significance weights. We also presented ways to determine significance weights and their relation to the DO lifecycle. Finally, we presented the tool we are developing to collect such information, together with its methods of extraction, and showed experimental results to support the importance of such information. We believe the importance of the contribution also lies in the way the information is collected: it is domain agnostic and aims at collection in the context of spontaneous workflows, with minimal input from the user and very limited assumptions about the system structure and its infrastructure. We plan to continue our work on exploring new methods of automated information collection and on improving the filtering and inference of dependencies. We also plan to explore and implement the methods for determining significance described in section 3.1, and to look at the aspects of the dependency graphs, based on purpose and significance weights, that the tool will allow us to infer.