This presentation was delivered by Justin and Joel Simpson (Artefactual), Fabio Corubolo (Univ. Liverpool/PERICLES), Jean-Yves Vion-Dury (Xerox/PERICLES) and Stratos Kontopoulos (CERTH/PERICLES) within the interactive workshop 'Modelling Policies, Exploring Real Use Cases', which took place at the final project conference 'Acting on Change: New Approaches and Future Practices in LTDP' (Wellcome Collection Conference Centre, London, 30 Nov - 1 Dec 2016).
This workshop explored how the PERICLES approach to policy and digital ecosystem modelling can be applied to real-world preservation policies, provided by Artefactual and by the attendees. A video-player example was used to illustrate change propagation. In the second part of the workshop, participants joined a practical exercise exploring changing needs for email preservation.
http://pericles-project.eu/
PERICLES Modelling Policies - Acting on Change 2016
1. GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics
[Digital Preservation]
“This project has received funding from the European Union’s Seventh
Framework Programme for research, technological development and
demonstration under grant agreement no 601138”.
INTERACTIVE WORKSHOP: Modelling
Policies, Exploring Real Use Cases
Justin Simpson (Artefactual)
Fabio Corubolo (Univ. Liverpool/PERICLES)
Jean-Yves Vion-Dury (Xerox/PERICLES)
Stratos Kontopoulos (CERTH/PERICLES)
Joel Simpson (Artefactual)
@PericlesFP7
#PERIconf2016
3. ▶ What do different types of stakeholders (e.g.
archivists vs. technologists, generalists vs.
specialists) think are significant from a
“policy” perspective?
▶ Are ontologies a helpful tool in describing
policies?
▶ How could we represent real policies with a
proposed design pattern?
Objectives
4. Why Model Policies?
We believe the focus on policies in Archivematica
today provides a number of benefits
▶ Simplification: separating rules (policies) from
workflow makes both easier to configure and manage
▶ Understandability: abstracting policies from
technical implementation enables non-technical
users to interact more directly with the system
▶ Shareability: enables some level of sharing best
practices across the community
5. Why Model Policies?
We think the PERICLES approach may help us
improve upon our existing focus on policies:
▶ Simplification: many important preservation
decisions are still deeply embedded in technical
implementation
▶ Understandability: using well defined vocabularies &
languages (ontologies) to define policy will
make it easier to be precise and eliminate ambiguity
▶ Shareability: using common standards will make it
easier to share policy within a community
6. Why Model Policies? New Benefits
▶ Impact analysis: ability to determine the impact of
a system or policy change before it is committed
▶ Reasoning / change management: In some cases
we can automate the management (resolution) of
change issues
▶ Validation: we can attach ad hoc validation
processes (tests)
▶ Reuse: making use of existing ontological
knowledge bases on formats and preservation
policies in general
7. Abstraction of complex systems as models that can
be manipulated independently
Model-driven Preservation
Models
Digital ecosystem
◦ Analogy with biological
systems
◦ Evolving systems of
interdependent entities
Capture and
representation
of the
environment
Continuous
change and reuse
Continuum
approach
▶ Merging of active-life
and archival phases
▶ Non-custodial
8. ▶ Models can be constructed on existing
infrastructure
▶ Does not require replacing existing services
▶ Add preservation and policy management on
top of what exists
▶ Save in costs and adoption time
Model-driven approach
9. Pericles introduced ontologies at different
levels, that are partially independent:
▶ LRM - ontology for linked resources
▶ Policy ODP - generic policy ontology
▶ DEM - formalism for digital ecosystem (uses
LRM)
▶ Domain specific instances
PERICLES Ontologies
10. ▶ Relation between change and dependency
▶ Understanding dependencies between digital objects
and resources within their environment is key to
manage change
▶ Given objects A and B, A is dependent on B if
changes to B have a significant impact on the state
of A, or if changes to B can impact the ability to
perform function X on A.
Dependency and Change
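The dependency definition above lends itself to a simple impact-analysis sketch: if dependencies are recorded as "A depends on B" pairs, the entities affected by a change to B are everything that depends on it, directly or transitively. The following Python is only an illustration under that assumption; the function name `impacted_by` and the sample entities are hypothetical and not part of the PERICLES models.

```python
from collections import defaultdict, deque


def impacted_by(dependencies, changed):
    """Return every entity that depends, directly or transitively, on `changed`."""
    # Index the pairs so we can look up "who depends on X?" quickly.
    dependents = defaultdict(set)
    for dependent, dependency in dependencies:
        dependents[dependency].add(dependent)

    impacted = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


# Hypothetical ecosystem: each pair reads "first depends on second".
ecosystem = [
    ("video_file", "player"),
    ("player", "operating_system"),
    ("catalogue_record", "video_file"),
]

print(sorted(impacted_by(ecosystem, "operating_system")))
# ['catalogue_record', 'player', 'video_file']
```

A plain breadth-first traversal is used here for readability; a real ecosystem model would more likely express the same question as a query over the ontology.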
11. Dependency: the association, relation or interaction
among two or more Resources
Plan: presents a set of actions/steps to be executed by
Agent
precondition and impact
Description:
intention: the intended usage of a Resource
specification: the context of the Dependency itself
LRM Dependency
19. ▶ Policies can be expressed in formal
languages
◦ SPIN and ReAL language work on ontologies
▶ They can impose constraints, perform
changes, validation
▶ Changes in the models (incl. policies) can be
managed using these rules and techniques
Modelling for Change Management
20. ▶ Policy: all videos from Collection X must be
renderable on at least one of players Y
▶ Model based on the ODP pattern we just
described
▶ Uses PERICLES models and ideas
▶ This policy is a pattern on its own: keep data
processable
Video Playback Example
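As a rough illustration of the video playback policy above, the check "every video in the collection is renderable on at least one player" can be written as a small validation function that is re-run whenever the set of players changes. This is a sketch only: the format names, player names and the function `unrenderable_videos` are hypothetical, and the actual workshop model was expressed with ontologies rather than Python.

```python
def unrenderable_videos(videos, players):
    """Return the names of videos that none of the players can render."""
    # Union of all formats supported by at least one player.
    supported = set()
    for formats in players.values():
        supported |= set(formats)
    return sorted(name for name, fmt in videos.items() if fmt not in supported)


# Hypothetical collection: video name -> format.
videos = {"interview.mov": "quicktime", "lecture.mp4": "mpeg4"}
# Hypothetical players: player name -> formats it can render.
players = {"player_a": {"quicktime", "mpeg4"}, "player_b": {"mpeg4"}}

print(unrenderable_videos(videos, players))  # policy holds: empty list

# A change in the ecosystem: player_a is retired.
remaining = {name: f for name, f in players.items() if name != "player_a"}
print(unrenderable_videos(videos, remaining))  # interview.mov now violates the policy
```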
24. ▶ Policy: Preferred email preservation format is
“maildir”
▶ Preservation Task: When email is provided in a
format that is not suitable for preservation, we
normalize the email to the appropriate
preservation format.
◦ E.g. Normalize a “pst” object (a proprietary
email format from Microsoft) into a “maildir”
object.
▶ Implementation: We use the open source tool
“readpst”.
Example
25. ▶ Policy: Preferred email preservation format is
“mbox”
▶ Preservation Task: When email is provided in a
format that is not suitable for preservation, we
normalize the email to the appropriate
preservation format.
◦ E.g. Normalize a “pst” object (a proprietary
email format from Microsoft) into an “mbox”
object.
▶ Implementation: We use the open source tool
“readpst”.
Change:
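The two slides above differ only in the preferred format, which suggests keeping the policy value separate from the preservation task and from the tool that implements it. The Python below is a minimal sketch of that separation, assuming a hypothetical `Policy` record and a hypothetical `TOOLS` lookup table; it does not reflect how Archivematica stores its rules, and it deliberately does not construct a real readpst command line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Policy level: only states the preferred preservation format."""
    preferred_format: str


# Implementation level (hypothetical table): which tool handles which conversion.
TOOLS = {
    ("pst", "maildir"): "readpst",
    ("pst", "mbox"): "readpst",
}


def plan_normalization(source_format, policy):
    """Preservation task: decide whether and how to normalize an object."""
    if source_format == policy.preferred_format:
        return None  # already in the preferred format, nothing to do
    tool = TOOLS.get((source_format, policy.preferred_format))
    if tool is None:
        raise LookupError(
            f"no tool registered for {source_format} -> {policy.preferred_format}"
        )
    return {
        "task": "normalize",
        "from": source_format,
        "to": policy.preferred_format,
        "tool": tool,
    }


before = plan_normalization("pst", Policy("maildir"))
after = plan_normalization("pst", Policy("mbox"))
print(before["to"], after["to"])  # maildir mbox
```

In this sketch, changing the policy means replacing one value; the task logic and the tool table stay untouched.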
28. Exercise Time!
Modelling Considerations –
1. Process vs. Policy: What’s important to define as “policy” from a
preservation perspective? What aspects of the preservation process
would be significant from a compliance point of view? Are there aspects
of the process that are purely operational or technical (e.g. do we care
which tool is used to perform a process? should that be part of policy?)
2. PERICLES Design Pattern for Policies: how would you describe the policy
(or policies) from the example process using the constructs in the
design pattern for policies? (e.g. requirement level; policy type)
3. Linked Resource Model: how would you model the policy along with the
related concepts in a digital ecosystem model? e.g. what preconditions
should exist, or specifications, impacts etc.
29. Exercise Time!
Consider one of the following example preservation processes - which aspects of
these processes make sense to model as policies? What might that model look like?
1. Extracting Attachments from email to enable other preservation processes to act
on the individual attachment (e.g. format identification, characterization,
normalization)
2. Format Identification using tools such as Siegfried or DROID to identify the format
of digital objects (email formats: .msg,.pst,.eml,.mbox or attachments:
.doc,.pptx)
3. Virus Scans & Quarantine processes - using tools such as ClamAV to identify
viruses and taking further action to address viruses found
4. Format Validation using tools such as JHOVE to determine if a particular digital
object fully or partially complies with the specification for the purported format
5. Email Signature Validation - the process of validating individual emails that have
been provided with a digital signature (e.g. using DKIM or DMARC)
Current Benefits in Archivematica:
separating rules (policies) from workflow -- there are hundreds (or even thousands) of formats and format specific rules, but only a small number of core processes or workflow tasks (e.g. format identification, characterization, etc.) - separating these concerns makes both easier to configure and manage
rules and commands abstracted from the specifics of an implementation (i.e. location of servers, configuration or sequence of workflow etc) makes it easier
for non-technical users to understand & operate key aspects of system behaviour (vs. having system behaviour obfuscated behind layers of technical detail)
to share / re-use across institutions (again, because the “policy” level concerns are what we want to share, more than the specifics of implementation)
Potential Improvements:
simplification: the policies in Archivematica are very simple / rudimentary. The approach Pericles has taken is far more sophisticated - and while in some respects this adds complexity, once understood, we believe that the ontology approach may help to further separate ‘technical’ or ‘operational’ concerns from ‘preservation’ concerns
understandability: currently Archivematica has a very rudimentary “policy language”. Making this more expressive and well defined will allow users to be more expressive and map more of the ‘policy’ objectives of an institution to the actual implementation
shareability: currently Archivematica has implemented something unique to Archivematica. We see the ontologies that Pericles has been working on and demonstrating as a promising way to develop standards that can be shared within and across communities
New Benefits from the Pericles Approach
impact analysis is the starting point of change management. We can model what a new policy will look like, and see how that affects the other objects in our digital ecosystem. This allows us to make informed decisions and address potential issues in advance when changing a policy.
change management: the consequences of some types of change in an ecosystem can be managed and resolved automatically, thanks to appropriate rules (precondition-impact). For example, a specific type of ecosystem change (e.g. new file formats being submitted, changes in the supported software ecosystem) can be fixed automatically (e.g. by migrating data)
Validation: automated tests are defined and executed when change happens, or periodically, to validate that the policy is implemented correctly. This allows us to notice when changes affect the correct implementation of a policy
Reuse: Ontological structures in general (linked data etc.) are designed to allow easy reuse of existing data sources. By making use of these, existing ontologies (e.g. the SCAPE policy ontology, etc.) can be reused.
NOTE -- in the examples we will give later, we will focus primarily on the ‘impact analysis’ benefit -- in other words, showing how we can assess the impact of changes to policy over time.
Understand the wider context around digital objects that impacts their long-term reuse
Focus on pragmatic approaches that facilitate implementation and reuse of existing infrastructure, saving cost and time of implementation.
Describing the policies, constraints and validation methods, and their dependencies, in human readable form enables users to communicate and define requirements, and to record and share the knowledge and decisions taken when implementing policies.
This helps communicate the general objectives of an organisation, and how these map to concrete infrastructure and requirements; it is not limited to defining constraints and mandatory practices. This is an important record and communication tool per se.
In PERICLES we introduced different ontologies at different levels, that can be used in conjunction or separately.
We are going to focus here on the first two ontologies.
The theoretical model for policies and change management is independent of a specific implementation to give the maximum flexibility in adaptation (low barrier of entry)
A Plan is a means of giving operational semantics to dependencies. Plans can describe how preconditions and impacts are checked and implemented (this could, for example, be defined via a formal rule-based language such as SWRL). The semantics of an instance of a Dependency is conveyed by proper instances of lrm:Intention (WHY) and lrm:Specification (WHAT)
Explanations for all constructs are available at: http://ontologydesignpatterns.org/wiki/Submissions:Policy
The modelling for change management, as already mentioned, can be implemented using different technologies.
These can be implemented using dependency concepts, or using rules and change notification (SPIN rules).
As a pattern, it can be adapted to many similar types of preservation policies, and can reuse the structure and, partially, the rules and models.
This is an implementation of complex change management for a typical use case (processability).
The rules handle different changes in different elements of the model.
This pattern makes use of change propagation and the dependency concepts
In this case we show a more complex ecosystem with different policies that have been modelled with the PERICLES methodology. This complex view does not contain the rules and change management, but shows how the concepts can be adopted at different levels of granularity and using different implementations (rules and change management or change driven tests for policy validation)
We have an email that is generated by a process with format "pst". At the beginning it conforms with the policy, i.e. the policy also defines that the metadata extraction process should generate emails with "pst" format. This is represented by attached file "dependency-minimal-v1.0.1_Conformance.ttl".
RESPONSE - Nikos
Currently, in the example of slide 25, the policy defines "pst" as the required format. We then assume that it changes to define "mbox" as the required value for the format. Here we do not focus on the normalisation dependency from "pst" to "maildir".
We focus instead on the scenario of changing the requirements of the policy from "pst" format to "mbox" (the "pst" here could easily be changed to "maildir" to keep us in sync with your example, if this poses a problem).
At a second stage the policy changes and defines that the format should be "mbox". In that case the email format should change to mbox via a corresponding normalisation process, if the appropriate tool is available.
To achieve that, two dependencies are modelled.
- One between the policy and the metadata extraction process that can be used to check whether the format related criterion is satisfied. If it is not satisfied (precondition_1) then a Warning is generated (impact_1).
- The second dependency links the email format normalisation process to the corresponding tool (readPst). If the tool is available and a Warning is raised by the first dependency (precondition_2), then:
  1. the format of the email will be changed from pst to mBox;
  2. a warning (Fix) that the old pst metadata has to be deleted will be entered (this will be an lrm:Delta in the next version);
  3. an RDF triple corresponding to premis use will be added: <premis:eventDetail> program="readpst"; version="readpst 1.5" </premis:eventDetail>.
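The chain of the two dependencies described above can be sketched procedurally. The Python below is an illustration under stated assumptions, not the actual LRM or rule-based implementation: the function `propagate_policy_change`, the dictionary layout and the event strings are hypothetical, and only the precondition/impact logic and the readpst event detail are taken from the notes.

```python
def propagate_policy_change(email, policy, available_tools):
    """Apply both dependencies to `email` and return the recorded events."""
    events = []

    # Dependency 1: policy <-> metadata extraction process.
    # precondition_1: the format criterion is not satisfied.
    warning_raised = email["format"] != policy["format"]
    if warning_raised:
        # impact_1: a Warning is generated.
        events.append(f"Warning: format {email['format']!r} violates policy")

    # Dependency 2: normalisation process <-> tool (readpst).
    # precondition_2: a Warning was raised and the tool is available.
    if warning_raised and "readpst" in available_tools:
        old_format = email["format"]
        email["format"] = policy["format"]  # impact: format is changed
        events.append(f"Fix: delete old {old_format} metadata")
        events.append('premis:eventDetail program="readpst"; version="readpst 1.5"')
    return events


email = {"format": "pst"}

# Initially the policy requires "pst": the email conforms, nothing happens.
print(propagate_policy_change(email, {"format": "pst"}, {"readpst"}))

# The policy changes to "mbox": a warning is raised and the email is normalised.
print(propagate_policy_change(email, {"format": "mbox"}, {"readpst"}))
print(email["format"])
```

If the tool is not available, only the warning is recorded and the email keeps its original format, which mirrors the "if the appropriate tool is available" condition in the notes.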
The idea here is to provide a high level ‘to do’ list to help participants think through how they would model policies from a particular example of a preservation process.
before modelling, a discussion can be had about the preservation process in ‘business terms’. What are the important aspects that might benefit from being set out as policies? I think this might be a good starting point because the examples are all very common preservation processes, and you don’t need any experience with ontologies or modelling to start discussing the content
we could have some print outs of the design pattern for policies… just as a checklist to work through and identify the different attributes etc. of the policy
same as per linked resource model… have the slide showing the constructs printed out as a prompt to help people start to model
These are 5 suggestions that groups could use… my thinking is that these are very common preservation processes (with the exception of the last one). So people should be able to imagine the kind of details that typically happen during these processes, and consider how that might apply to email.