This presentation was delivered by Simon Waddington from PERICLES project partner King's College London at the 12th International Digital Curation Conference (IDCC17).
Technical appraisal and change impact analysis - IDCC17 workshop
1. GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving
Semantics [Digital Preservation]
Simon Waddington (King’s College London)
Technical appraisal and change
impact analysis
2. Appraisal
◦ Aims to determine which data should be kept by an
organisation
◦ Traditionally performed prior to transfer to an archive
◦ Guided by policies based on defined criteria
Technical appraisal
◦ Evaluation of the (on-going) feasibility of preserving the
digital objects
◦ Answers the question “can we preserve?”
3. Simple digital objects
◦ E.g. files, software applications, operating systems
◦ Include hardware specification
Complex digital objects
◦ Digital objects made by combining a number of simple
digital objects
Dependency
◦ Relationships between components of a complex digital
object
◦ Functional relationship
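A complex digital object and its dependencies can be pictured as a small directed graph. The sketch below is only an illustration: the component names and edges are hypothetical, and the helper `affected_by` is not part of any PERICLES tool. It walks the graph to find everything that relies, directly or indirectly, on a given component.

```python
from collections import deque

# Hypothetical dependency edges: each key depends on the listed components.
DEPENDS_ON = {
    "digital video artwork": ["digital video", "media player"],
    "digital video": ["video codec", "container"],
    "media player": ["operating system"],
    "operating system": ["computer"],
}


def affected_by(component, depends_on=DEPENDS_ON):
    """Return every entity that directly or indirectly depends on component."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {}
    for entity, parts in depends_on.items():
        for part in parts:
            dependents.setdefault(part, set()).add(entity)

    # Breadth-first walk over the inverted edges.
    seen, queue = set(), deque([component])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Calling `affected_by("operating system")` on this toy graph returns the media player and the artwork that uses it, which is the kind of question a change to one simple digital object raises.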
4. Digital video artwork
◦ Digital video, video codec, container, media player, operating system, computer
Science experiment object
◦ Document viewer, image viewer, image file, document file, scripting language, database
5. Complex digital objects subject to changing
external environment
◦ Technical appraisal required on an ongoing basis to
support long term reuse
Reusability implies complex digital objects
may need to be adapted
◦ Potential adaptations termed recovery options
◦ Significant properties – specify what features should be
maintained
Main risk considered is availability
◦ Obsolescence
◦ Hardware failure
6. Is this the Flying Scotsman?
◦ Cost of the restoration £4.5 million from 2006–2016
7. Digital video artwork
◦ Comprises videos and their surrounding technical environment
◦ Video codec, audio codec, subtitles, container, media player,
operating system, computer, display
Mary - digital art conservator
◦ Supports acquisition decisions
◦ Maintains artworks for exhibition
◦ Has limited technical knowledge of video
◦ Has no control over the technologies used by artists
Artworks are required for ongoing display
◦ Adapt artwork to current technical environment
◦ Maintain viewing experience rather than use of specific technologies
◦ Potentially exist in multiple versions
Artworks may be maintained indefinitely
Sow Farm by John Gerrard
8. Space science experiment
◦ Raw data captured by instrument, stored in database
◦ Scripts written by scientists to process raw data
◦ Image files and documents generated by scripts
Steve – space science data manager
◦ Responsible for maintaining data from multiple experiments
◦ Little or no control over the technologies used by scientists
◦ Large volumes of experiments to deal with
Examples
◦ Earth observation, solar measurements, material science, cell biology
◦ Often time-related and expensive/impossible to replicate
Reuse – continuing over long timeframes
◦ Compare performance of different instruments
◦ Compare processing techniques
◦ Determine long term trends e.g. in solar activity
◦ Deal with errors and anomalies
9. What are the external risks to a complex
digital object?
What are the proximity and impact of those
risks and what are the recovery options?
Implementation of the chosen recovery
option
10. Maintain inventory of artworks and
components
◦ Video formats, players, operating systems etc.
Monitoring the external environment
◦ Aka preservation watch
◦ Monitors websites and external news sources
◦ Networks with fellow conservators
Technical analysis
◦ Records technical specifications of components
◦ Learns from practical experience of testing
11. External monitoring is time-consuming and
unreliable
◦ E.g. QuickTime formats
Hard to plan forward
◦ Sudden unavailability of a component hard to predict
rigorously
◦ May imply a large amount of work if a technology is used in
many artworks
Compatibility of components
◦ Based on human experience rather than a systematic model
Difficulty in determining recovery options
◦ Time-consuming analysis and testing of many options
12. Large variety of scripting languages and formats
used by scientists
◦ No control of the technologies used
Unable to warn scientists that their experiments
may need to be updated to maintain reusability
Can’t support scientists who want to rerun a
particular experiment
◦ E.g. provide information on website
Unfamiliar with older technologies
13. Normalisation
◦ Convert objects to one or more “long-lived” formats
◦ Performed systematically on all objects at acquisition
Problems
◦ Objects may be discarded before they require any adaptation
◦ Objects may already be sufficiently “future proof”
◦ May imply major re-engineering, whereas only minor changes
are sufficient
◦ Could increase risks if wrong choices are made
Freezing
◦ E.g. virtualisation
◦ Software licensing, security and compliance issues
◦ May be impossible to source suitable hardware
◦ May not be acceptable to users e.g. scientists
14. Automated tool to assist in appraisal
Main features
◦ Automated harvesting of environmental data and trend
analysis
◦ Pre-built domain models for digital video and space science
experiments
◦ Collection-level risk, proximity and impact analysis
◦ Component-level risk, proximity and impact analysis
◦ Object-level analysis and determination of recovery options
Storage
◦ Tool creates a registry of objects
◦ Objects themselves are not stored in the tool
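As a rough illustration of collection-level proximity and impact analysis over a registry, the sketch below ranks components by how soon they are expected to become unavailable and by how many registered objects rely on them. The registry contents, the proximity figures and the function `rank_components` are all hypothetical; the actual tool's data model is not described in these slides.

```python
from collections import Counter

# Hypothetical registry: object identifier -> components it relies on.
# Only descriptions are held here, not the objects themselves.
REGISTRY = {
    "artwork-1": {"codec-A", "player-X", "os-1"},
    "artwork-2": {"codec-A", "player-Y", "os-1"},
    "artwork-3": {"codec-B", "player-X", "os-2"},
}

# Hypothetical proximity: predicted years until each component is unavailable.
YEARS_TO_OBSOLESCENCE = {
    "codec-A": 2, "codec-B": 8,
    "player-X": 5, "player-Y": 1,
    "os-1": 3, "os-2": 6,
}


def rank_components(registry, proximity):
    """Return (component, years, objects_affected), most urgent first."""
    # Impact: number of registered objects that use each component.
    impact = Counter(c for components in registry.values() for c in components)
    # Sort by nearest risk first, then by largest impact.
    ordered = sorted(impact, key=lambda c: (proximity[c], -impact[c]))
    return [(c, proximity[c], impact[c]) for c in ordered]
```

A ranking of this kind would show, for example, that a component used in many artworks deserves attention earlier than its proximity alone suggests.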
15. Applied in industries such as aviation
Determine availability of hardware components
Standardised lifecycle model for a technology
◦ Units shipped against time
16. Compute lifecycle curve from harvested data
◦ Software repositories e.g. commits and downloads
◦ Search engines
◦ Wikipedia
◦ Usage tracking data
◦ Social networks
Confidence measure
◦ Correlate results across different data sources
Calibration
◦ Compare results with known dates e.g. operating systems
Validation
◦ Operating systems have known end of support dates
◦ Predict start date from incomplete time series
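One simple way to picture a lifecycle curve is as a bell shape over time, summarised by its centre and spread. The sketch below is an assumption-laden illustration, not the tool's method: it estimates the centre and spread of a yearly activity series by weighted moments, and treats the centre plus `k` standard deviations as a rough obsolescence estimate. The value `k=2.5` and the sample counts are illustrative choices.

```python
import math


def fit_lifecycle(years, counts):
    """Estimate the centre and spread of a bell-shaped activity curve."""
    total = sum(counts)
    mean = sum(y * c for y, c in zip(years, counts)) / total
    variance = sum(c * (y - mean) ** 2 for y, c in zip(years, counts)) / total
    return mean, math.sqrt(variance)


def estimate_obsolescence(years, counts, k=2.5):
    """Rough obsolescence point: k standard deviations past the peak.

    k is an illustrative assumption, not a value taken from the slides.
    """
    mean, sigma = fit_lifecycle(years, counts)
    return mean + k * sigma


# Hypothetical harvested series, e.g. yearly download counts in thousands.
YEARS = list(range(2008, 2017))
COUNTS = [1, 4, 10, 18, 22, 18, 10, 4, 1]
```

The moment estimate only makes sense when the series covers the whole lifecycle. For an incomplete series, as mentioned under validation, a proper curve fit to the observed portion would be needed instead.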
17. [Figure: timeline from 2012 to 2024 with rows for video codec, container, media player, operating system and computer, marking current obsolescence and recovery options 1, 2 and 3]
18. Representation of the entities and dependencies
◦ OWL ontology
◦ Scope - decision about what to leave in and what to leave out
Layered model
◦ Domain-independent ontology (Linked Resource Model) to
describe change
◦ Domain-dependent ontology – describes e.g. video
components
Inherits from existing domain ontologies (e.g. CIDOC-CRM)
Modular
◦ Supports reuse in different applications
◦ Ontology design patterns
19. Describes the compatibility between instances
◦ E.g. media player X and video codec Y
Does not guarantee compatibility
◦ Recoverability options require testing and validation
◦ Enables alternatives to be excluded
Features
◦ Supports full and partial compatibility
◦ Instances added by hand – currently command line tool
◦ Needs to be updated over time
◦ Two prebuilt ontologies provided
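A compatibility matrix of this kind can be sketched as a lookup from pairs of instances to a score. Everything below is hypothetical: the instance names, the convention that 1.0 means full and a value between 0 and 1 means partial compatibility, and the helper `candidate_players`. It shows how such a matrix lets alternatives be excluded before any testing is done.

```python
# Hypothetical scores: 1.0 = full, between 0 and 1 = partial compatibility.
# A pair that is absent is treated as not known to be compatible.
COMPATIBILITY = {
    ("player-X", "codec-Y"): 1.0,
    ("player-X", "codec-Z"): 0.5,
    ("player-W", "codec-Y"): 1.0,
}


def candidate_players(codec, minimum=1.0, matrix=COMPATIBILITY):
    """List players whose recorded compatibility with codec meets minimum."""
    return sorted(
        player
        for (player, known_codec), score in matrix.items()
        if known_codec == codec and score >= minimum
    )
```

Lowering `minimum` admits partially compatible candidates. Any candidate returned would still need testing and validation, since the matrix does not guarantee compatibility.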
20. Reflects the cost of transforming entities of
the same type
◦ E.g. change media player from Mplayer to Xine
Currently built by hand using command line
tool
Needs to be adapted to specific context and
updated over time
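The cost of transforming one entity into another of the same type can likewise be sketched as a lookup table. The costs, their units and the entry `player-Z` are made up for illustration, and `cheapest_replacement` is a hypothetical helper rather than a function of the command line tool.

```python
# Hypothetical transformation costs (arbitrary units) between media players.
# Costs are directional: replacing A with B need not cost the same as B with A.
TRANSFORMATION_COST = {
    ("Mplayer", "Xine"): 2.0,
    ("Mplayer", "player-Z"): 5.0,
    ("Xine", "Mplayer"): 2.5,
}


def cheapest_replacement(current, available, costs=TRANSFORMATION_COST):
    """Return (cost, target) for the cheapest known replacement, or None."""
    options = [
        (costs[(current, target)], target)
        for target in available
        if (current, target) in costs
    ]
    return min(options) if options else None
```

Returning `None` when no cost is recorded keeps unknown transformations from being mistaken for free ones.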
21. Use ontology to populate a probabilistic
graphical model
◦ States are components in complex digital object
Exhaustive analysis very costly
◦ Apply a variation of Pearl’s Belief Propagation Algorithm
◦ Based on efficient message passing
Generate recovery options
◦ Correspond to different temporal constraints
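To give a feel for why message passing is cheaper than exhaustive analysis, the sketch below runs the textbook sum-product form of belief propagation on a simple chain of components and checks it against brute-force enumeration. This is the standard algorithm, not the variation used in the tool, and the chain structure, the two states (0 = available, 1 = unavailable) and the potentials are assumptions made for the example.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Sum-product on a chain.

    unary[i][s] is the potential of node i in state s;
    pairwise[i][s][t] couples node i (state s) with node i + 1 (state t).
    """
    n, k = len(unary), len(unary[0])
    # forward[i][t]: message reaching node i from its left neighbour.
    forward = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for t in range(k):
            forward[i][t] = sum(
                forward[i - 1][s] * unary[i - 1][s] * pairwise[i - 1][s][t]
                for s in range(k)
            )
    # backward[i][s]: message reaching node i from its right neighbour.
    backward = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for s in range(k):
            backward[i][s] = sum(
                pairwise[i][s][t] * unary[i + 1][t] * backward[i + 1][t]
                for t in range(k)
            )
    marginals = []
    for i in range(n):
        belief = [forward[i][s] * unary[i][s] * backward[i][s] for s in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Exhaustive enumeration of all joint states, for comparison only."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i][s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            totals[i][s] += weight
    return [[v / sum(row) for v in row] for row in totals]


# Hypothetical potentials for a three-component chain.
UNARY = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
COUPLING = [[0.8, 0.2], [0.1, 0.9]]
PAIRWISE = [COUPLING, COUPLING]
```

The brute-force version visits every joint state, so its cost grows exponentially with the number of components, while the message-passing version grows linearly along the chain.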
22. Based on web services
Java – UI framework
Analysis components in Python and R
Triple store
◦ Fuseki or PERICLES ERMR
23. The technical appraisal tool is not a
repository or archive
Central point is the ERMR (Entity Registry
Model Repository)
Objects (composed of files, software,
hardware descriptions)
◦ Retained across multiple storage systems
◦ Those storage systems may or may not be repositories or
archives
24. Model Impact Change Explorer (MICE)
◦ Visualisation tool using D3 JavaScript library
◦ Enables users to evaluate how a potential change to a
resource will impact the overall ecosystem
◦ Changes described via “deltas”
◦ Uses PERSiST, an intermediate component for semantic
interpretation of the DVA ontology
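A change described as a delta can be pictured as a set of statements to remove and a set to add. The sketch below is only an assumption about what such a delta might look like: the triple layout, the `dependsOn` predicate, the `delete`/`insert` keys and both helper functions are hypothetical and do not reproduce the LRM or MICE formats.

```python
# Hypothetical triples of the form (subject, predicate, object).
TRIPLES = {
    ("artwork-1", "dependsOn", "player-X"),
    ("player-X", "dependsOn", "os-1"),
    ("artwork-2", "dependsOn", "player-Y"),
}

# Hypothetical delta: player-X is moved from os-1 to os-2.
DELTA = {
    "delete": {("player-X", "dependsOn", "os-1")},
    "insert": {("player-X", "dependsOn", "os-2")},
}


def apply_delta(triples, delta):
    """Return a new triple set with the delta applied."""
    return (set(triples) - set(delta["delete"])) | set(delta["insert"])


def directly_impacted(triples, delta):
    """Entities that depend on a resource whose statements the delta changes."""
    changed = {s for s, _, _ in set(delta["delete"]) | set(delta["insert"])}
    return {s for s, p, o in triples if p == "dependsOn" and o in changed}
```

Computing the impacted entities before applying the delta mirrors the idea of letting a user evaluate a potential change and then accept or reject it.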
25.
26. MICE-Appraisal Tool Integration
[Diagram: interactions between the Workflow Engine, the PERSIsT API, the Entity Registry Model Repository (ERMR) and the Technical Appraisal Tool. Labels: retrieves dependencies and impact; forwards change (LRM delta); visualises impact; accepts / rejects change; saves change; recovery options; inserts new media / selects recovery option; returns user's decision; sends change (RDF triples); retrieves dependencies and costs; writes recovery options]
27. PERICLES Appraisal Tool
◦ Due for release in March 2017
◦ Release on GitHub
PERICLES MICE tool
◦ Available on GitHub at
https://github.com/pericles-project/MICE
Licences
◦ Apache License Version 2.0, January 2004
◦ http://www.apache.org/licenses/
28. Demonstrates automated decision support for
technical appraisal
Data-driven approach to monitor environmental
trends
Ecosystem model to capture technical
information on dependencies
Integrated tools for presenting risk-impact
analysis, impact visualisation and recoverability
options