GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving
Semantics [Digital Preservation]
Fabio Corubolo, University of Liverpool
11 February, IDCC 2015, London
Ensure long-term usability of digital objects (DOs)
Observation:
Use of DO ⇒ access to DO’s environment
Environment
• Digital objects are used in a rich environment
[Diagram labels: Digital object, Ext. Metadata, Storage, Digital object]
• Define a broad set of information
• Consider its significance and purposes
• Explore pragmatic methods to collect such information
• Technical system information
• DO metadata
• User, policy, process information
• Information necessary to use the DO:
◦ Auxiliary data (e.g. calibration data)
◦ External documentation (e.g. related documents)
◦ Implicit knowledge (e.g. user knowledge about relevance in relation to purpose)
◦ …
• All the entities that have some relationship to a DO through its lifecycle
• Entities: DOs, metadata, policies, rights, services, users, etc.
Refinement:
• Information about the set of relationships from the DO to any related objects
• DOs are preserved for different uses and purposes
• Purposes give scope to the dependent environment information
• Weights can express significance based on purpose
Definition:
SEI (significant environment information) is the set of relationships between a DO and its environment information, qualified with purpose and weights
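
The definition above can be read as a small data structure. The following is only a minimal sketch of that reading, not PET's actual data model: the class name `SeiRelation`, the helper `sei_for_purpose`, the file names and the 0 to 1 weight scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeiRelation:
    """One relationship between a DO and a piece of environment information."""
    digital_object: str    # identifier of the DO (hypothetical naming)
    environment_item: str  # e.g. a calibration file, a manual, a piece of software
    purpose: str           # the use the DO is preserved for
    weight: float          # significance for that purpose (assumed scale: 0 to 1)


def sei_for_purpose(relations, digital_object, purpose, min_weight=0.0):
    """Return the environment items a DO depends on for one purpose, most significant first."""
    selected = [
        r for r in relations
        if r.digital_object == digital_object
        and r.purpose == purpose
        and r.weight >= min_weight
    ]
    selected.sort(key=lambda r: r.weight, reverse=True)
    return [r.environment_item for r in selected]


# Illustrative data only
relations = [
    SeiRelation("sensor_data.csv", "instrument_manual.pdf", "reanalysis", 0.4),
    SeiRelation("sensor_data.csv", "calibration.dat", "reanalysis", 0.9),
    SeiRelation("sensor_data.csv", "viewer_app", "display", 0.8),
]
print(sei_for_purpose(relations, "sensor_data.csv", "reanalysis"))
```

Qualifying each relationship with a purpose means the same DO can have a different dependent environment for each use.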
• Observe the use of DOs throughout their lifecycle
◦ Curation doesn’t start at the archive; it happens throughout the DO’s life
• Collect dependencies for use (SEI)
• Measure significance
• Sheer curation:
◦ curation activities integrated in the use workflow;
◦ lightweight and transparent
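
The speaker notes mention frequency of use as one possible basis for measuring significance. A minimal sketch of that idea follows, assuming the observed use is available as a flat list of accessed item names. The function name and the normalisation to shares are illustrative choices, not something the slides specify.

```python
from collections import Counter


def frequency_weights(access_log):
    """Map each accessed item to its share of all recorded uses (shares sum to 1)."""
    counts = Counter(access_log)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {item: n / total for item, n in counts.items()}


# Illustrative log of items used while working with a DO
access_log = ["manual.pdf", "calibration.dat", "manual.pdf", "notes.txt", "manual.pdf"]
weights = frequency_weights(access_log)
print(weights)
```

Because the weights are derived from the log, recomputing them as new use is observed lets them change over time.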
• Open source* framework that builds on SEI
• Sheer curation – at the right time and place
• Generic, modular, domain agnostic
• Flexible configuration and profiles
• Monitoring changes over time
• Snapshot of the system environment
• User is in full control of the app and data
• To observe unstructured workflows
* Apache 2.0 licensed, on GitHub
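
To make "snapshot of the system environment" concrete, here is a rough sketch using only the Python standard library. It is not how PET is implemented; the function names and the choice of fields are assumptions made for illustration.

```python
import hashlib
import platform
from pathlib import Path


def file_checksum(path):
    """SHA-256 of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def environment_snapshot(paths):
    """Collect basic system facts plus size and checksum for each given file."""
    return {
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "files": {
            str(p): {"size": Path(p).stat().st_size, "sha256": file_checksum(p)}
            for p in paths
        },
    }


if __name__ == "__main__":
    # With no files given, only the system facts are collected
    print(environment_snapshot([]))
```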
• Install PET, configure it, leave it monitoring
• The profile is use case specific
• User interacts with DOs, PET collects in the background:
◦ Environment information
◦ DO events
◦ Changes
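
One simple way to derive "changes" is to compare two snapshots taken at different times. The sketch below assumes each snapshot is a plain mapping from file path to checksum. This is a polling-style stand-in chosen for brevity, not a description of PET's own event monitoring.

```python
def diff_snapshots(before, after):
    """Compare two {path: checksum} mappings and report what changed between them."""
    events = []
    for path in sorted(after.keys() - before.keys()):
        events.append(("created", path))
    for path in sorted(before.keys() - after.keys()):
        events.append(("deleted", path))
    for path in sorted(before.keys() & after.keys()):
        if before[path] != after[path]:
            events.append(("modified", path))
    return events


# Illustrative snapshots; the checksum values are placeholders
before = {"report.docx": "aa11", "data.csv": "bb22"}
after = {"data.csv": "cc33", "notes.txt": "dd44"}
for event in diff_snapshots(before, after):
    print(event)
```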
1. Collect EI: the user is using a machine, with PET installed and running in the background
--- We are now here ---
2. SEI graph: PET data is analyzed and relationships between DOs are discovered.
3. Weighted SEI graph: assign weights to relationships (with purpose and significance)
4. Graphs can help with:
◦ understanding inter-document relationships
◦ appraisal of documents; defining collections
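
Steps 2 and 3 could be sketched as follows. The slides do not say how relationships are discovered, so this example assumes a deliberately simple heuristic: DOs used in the same session are related, and the number of shared sessions is the edge weight. The notion of a "session" and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_weighted_graph(sessions):
    """Count how often two DOs are used in the same session; counts act as edge weights."""
    graph = defaultdict(int)
    for used in sessions:
        # Sort so each undirected edge always has the same (a, b) key
        for a, b in combinations(sorted(set(used)), 2):
            graph[(a, b)] += 1
    return dict(graph)


# Illustrative sessions: each list holds the DOs used together
sessions = [
    ["report.docx", "data.csv"],
    ["report.docx", "data.csv", "manual.pdf"],
    ["manual.pdf"],
]
graph = build_weighted_graph(sessions)
for edge, weight in sorted(graph.items()):
    print(edge, weight)
```

Edges with higher weights point to documents that are habitually used together, which is the kind of signal that could feed appraisal or collection building.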
• Show aspects of the PET tool
• Operator’s task: resolve anomalies
• Process: extensive search in the archived data
• Issue: preserve implicit information, help with information overload
• PET task: record SEI for a specific anomaly
◦ monitor environment, record significant events, infer documentation useful to solve the anomaly
• SEI: the information needed to identify and debug a specific anomaly, that is, the implicit operator knowledge
An anomaly is reported in a handover sheet
The operator proceeds with documentation search and consultation, all tracked by PET
• Improve: filtering, dependency inference
• Semantics for SEI and significance weights
• Explore weighted dependency graphs to support appraisal
Can you think of other situations where PET could be useful in your practice?
Get involved! This is open source (-:
• https://github.com/pericles-project/pet

Slides for IDCC PET presentation


Editor's Notes

  • #3 We want to collect important information that could be lost if not gathered at the right time.
    Aim: ensure long-term usability of digital objects.
    Observation: usability of a digital object can require access to parts of its environment.
  • #5 Define a broad set of information (environment information).
    Consider its significance (significant environment information).
    Explore and test pragmatic methods to collect such information (PET).
  • #6 Technical system information (OS, system architecture, etc.)
    DO metadata (descriptive, structural, technical)
    User, policy, process information (user background knowledge, interaction with the system and document collections, usage data, etc.)
    Information necessary to make use of the object:
    ◦ Auxiliary data (e.g. calibration data supporting sensor data)
    ◦ External documentation (e.g. specifications, related documents)
    ◦ Implicit knowledge about what data is useful for using the DO (e.g. the user’s knowledge about what is and is not relevant in the collection)
    ◦ …
  • #9 Observe the use of DOs throughout their lifecycle.
    Collect dependencies for use (SEI).
    Measure significance:
    ◦ e.g. based on frequency of use
    ◦ different semantics and factors for significance weights
    ◦ weights will change over time
    Sheer curation: curation activities integrated in the use workflow; lightweight and transparent.
  • #10 Extracts information that is usually ignored by current metadata extractors.
    Visualizes information change over time.
    Information snapshot extraction gives a quick overview of the extractable information.
  • #11 Modules:
    ◦ Available and used system resources
    ◦ File format identification and checksums
    ◦ Currently running processes
    ◦ Event information (file and network) from processes
    ◦ Graphic configuration information
    ◦ MS Office and PDF font dependencies
    ◦ Native commands
  • #12 PET is installed, configured and started on the machine where the DOs are used, and stays in monitoring mode.
    The profile (modules and configuration) is use case specific.
    The user interacts normally with the DOs while PET collects in the background.
    PET collects environment information, DO events and changes for future use and analysis.
  • #13 The user is using a machine where PET runs in the background, observing the use of documents.
    --- We are now here ---
    Collected data is analyzed and relationships between DOs are derived; this will form an SEI graph.
    Weights are assigned to relationships based on purpose and significance, giving a weighted graph.
    SEI graphs can help with understanding inter-document relationships, appraisal of documents, and collection building and analysis.
  • #17 Please note: this is one example, based on one scenario. I prefer to give you a complete example in one scenario, but there are many possible scenarios that can be addressed by PET with proper configuration and modules.
    I will now briefly introduce a synthetic (fictional) scenario inspired by the BUSOC mission operators use case.
    ◦ BUSOC operators sometimes face the task of resolving anomalies, such as when an instrument does not respond as expected. The process they follow is guided by their knowledge of the domain and involves research on the archived documentation and operation data, which can include, for example, solutions from previous anomalies, telemetry, console logs, meeting notes, emails, etc. Such data, although present in the storage, requires experience to use, and selecting it requires specific knowledge that is usually passed from operator to operator.
    ◦ The issue we want to address is that of preserving the useful information that lies in the use of specific documents from the large collection in order to solve the issue, and helping the operators with the information overload.
    ◦ The task the PET tool is trying to accomplish is to record the SEI for this use case, for a specific anomaly. This is done by monitoring the environment and recording significant events (via a PET profile), and from there allowing new dependencies between anomalies and mission documentation to be inferred, in order to preserve useful information that is otherwise not captured.
    ◦ The SEI in this case is the EI that will help to identify and debug a specific anomaly.
  • #18 We set up a specific PET profile that tracks the use of relevant software on specific files, using the PET software monitor; this gives us a trace of the documents that have been used at a given moment in time.
    At the same time, it is possible to observe the ‘handover sheet’ and track the start and end times of a reported anomaly.
    The connection between the documentation trace and the ‘handover sheet’ tracking allows us to infer the ‘anomaly solving time span’ (indicated with a red line in Figure 4) and to assume there is a dependency between the solution to the anomaly and the documentation that was used between the start and end of the anomaly.
    In future work we will consider more complex issues that we have ignored in this simplified example, such as the ‘noise’ that can be reported by the event tracking. This ‘noise’ can be due, for example, to the fact that users often multitask, so unrelated documentation may have been used that was not relevant to the anomaly solution; documentation that was quickly opened and closed may also indicate, in some cases, that the document was not relevant.
    We will also explore ways to obtain finer-grained tracking, for example to include which pages have been consulted in a document. We plan to dedicate effort to a more careful analysis of the collected data in the next phases.
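
The inference described in the last note (documents used during the anomaly-solving time span are assumed to be related to its solution) can be sketched as below. This is an illustration under stated assumptions, not PET code: the event tuple layout, the function name, the file names and the 30-second threshold used to discard quickly opened and closed documents are all hypothetical.

```python
from datetime import datetime, timedelta


def documents_for_anomaly(anomaly_start, anomaly_end, doc_events,
                          min_open=timedelta(seconds=30)):
    """Return documents used during the anomaly-solving time span.

    doc_events is a list of (document, opened_at, closed_at) tuples.
    Documents open for less than min_open are treated as noise and skipped.
    """
    related = []
    for name, opened, closed in doc_events:
        overlaps = opened <= anomaly_end and closed >= anomaly_start
        long_enough = (closed - opened) >= min_open
        if overlaps and long_enough and name not in related:
            related.append(name)
    return related


# Illustrative timeline: anomaly reported at 09:00 and closed at 10:00
start = datetime(2015, 2, 11, 9, 0)
end = start + timedelta(hours=1)
doc_events = [
    ("telemetry_guide.pdf", start + timedelta(minutes=5), start + timedelta(minutes=20)),
    ("old_anomaly_report.txt", start + timedelta(minutes=10),
     start + timedelta(minutes=10, seconds=10)),  # opened and closed quickly
    ("meeting_notes.txt", start + timedelta(hours=2), start + timedelta(hours=3)),  # outside the span
]
print(documents_for_anomaly(start, end, doc_events))
```

A fixed duration threshold is only one crude way to filter noise. It would not catch unrelated documents left open while the user multitasks, which is the harder case the note describes.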