by D. Giaretta (APARSEN), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Hadoop was born out of the need to process Big Data. Today data is being generated like never before, and it is becoming difficult to store and process this enormous volume and large variety of data; this is where Big Data technology comes in. Today the Hadoop software stack is the go-to framework for large-scale, data-intensive storage and compute solutions for Big Data analytics applications. The beauty of Hadoop is that it is designed to process large volumes of data on clustered commodity computers working in parallel. Distributing data that is too large across the nodes of a cluster solves the problem of data sets too large to be processed on a single machine.
Persistent Identifiers (PIDs) for research – why we have them, why there are so many PID systems, how they work (looking at a few examples: Handles, DOIs, ORCIDs), how to choose one, whether PID systems can fail, and what’s happening in the international PID community
Controlled vocabularies and ontologies in Dataverse data repository, by vty
Support for external controlled vocabularies is one of the features most requested by research communities. Slides for the Dataverse Community Meeting 2021 at Harvard University
A North Carolina Connecting to Collections (C2C) workshop co-taught by Audra Eagle Yun (WFU), Nicholas Graham (UNC), and Lisa Gregory (State Archives of NC). This workshop took place on June 13, 2011 in Wilson, NC.
Presentation slides from a lecture given at the University of the West of England (UWE) as part of the Advanced Information Systems module of the MSc in Library and Library Management, University of the West of England Frenchay Campus, Bristol, October 24th, 2006
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse, by vty
This presentation is about external CV (controlled vocabulary) support in Dataverse, an open-source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as the basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
Bio-IT Trends From The Trenches (digital edition), by Chris Dagdigian
Note: Contact me directly dag@bioteam.net if you would like a PDF download of these slides
This is Chris Dagdigian’s 10th year delivering his no holds barred, candid state of the industry address at BioIT World, and we are not going to let a pandemic stop him.
Instead of his typical talk, five distinguished panelists will join Chris for a spirited discussion on Current Events and Scientific Computing and the impacts of the COVID-19 Pandemic:
Building collaborative Machine Learning platform for Dataverse network. Lecture by Slava Tykhonov (DANS-KNAW, the Netherlands), DANS seminar series, 29.03.2022
In this deck from the Swiss HPC Conference, Robert Triendly from DDN presents: Long Live Posix - HPC Storage and the HPC Datacenter.
"The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. Since it was developed over 30 years ago, storage has changed dramatically. To improve the IO performance of applications, many users have called for the relaxation in POSIX IO that could lead to the development of new storage mechanisms to improve not only application performance but management, reliability, portability, and scalability."
Watch the video: https://wp.me/p3RLHQ-kaR
Learn more: http://ddn.com
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Cloud Sobriety for Life Science IT Leadership (2018 Edition), by Chris Dagdigian
Candid/blunt AWS advice for research IT and life science IT leadership. Hard lessons learned from many years of AWS consulting. Contact dag@bioteam.net if you want a PDF copy of this presentation
Flexible metadata schemes for research data repositories - Clarin Conference..., by Vyacheslav Tykhonov
The development of the Common Framework in Dataverse and the CMDI use case: building an AI/ML-based workflow for predicting concepts from external controlled vocabularies and linking them to CMDI metadata values.
DIACHRON Preservation: Evolution Management for Preservation, by PRELIDA Project
by Giorgos Flouris (FORTH), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Organizational and Economic Issues in Linked Data Preservation, by PRELIDA Project
by Jose Maria Garcia (UIBK/STI Innsbruck), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Preserving linked data: sustainability and organizational infrastructure, by PRELIDA Project
by Mariella Guercio (Sapienza Università di Roma), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Brief Introduction to Digital Preservation, by Michael Day
Presentation slides from a lecture given at the University of the West of England (UWE) as part of the MSc in Library and Library Management, University of the West of England, Frenchay Campus, Bristol, March 10, 2010
Big Data brings big promise and also big challenges, the most important being the ability to deliver value to business stakeholders who are not data scientists!
Bitkom Cray presentation - on HPC affecting big data analytics in FS, by Philip Filleul
High-value analytics in FS are being enabled by graph, machine learning and Spark technologies. To make these real at production scale, HPC technologies are more appropriate than commodity clusters.
Watch full webinar here: https://bit.ly/2Y0vudM
What is Data Virtualization and why do I care? In this webinar we help you understand not only what Data Virtualization is, but why it's a critical component of any organization's data fabric and how it fits, and how data virtualization liberates and empowers your business users, from data discovery and data wrangling through to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Register to attend this session to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise?
Unlock Your Data for ML & AI using Data Virtualization, by Denodo
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not substitute data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
Innovation with Big Data – Chr. Hansen’s experiences, by Microsoft
In many places, Big Data is still the new and unknown, without top priority at IT, because “we do not have large volumes of data”. But Big Data is much more than large volumes of data. At Chr. Hansen A/S, the Research and Development (Innovation) department has worked with the value of data and, as a result, established a cross-disciplinary BioInformatics programme built on Big Data technologies from Microsoft.
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group, by Scott Mitchell
This presentation was presented at the July 8th 2014 user group meeting for BI Reporting for Bay Area Start Ups
Content - Creation Infocepts/DWApplications
Presented by: Scott Mitchell - DWApplications
Data Virtualization to Survive a Multi and Hybrid Cloud World, by Denodo
Watch full webinar here: https://buff.ly/2Edqlpo
Hybrid cloud computing is slowly becoming the standard for businesses. The transition to hybrid can be challenging depending on the environment and the needs of the business. A successful move will involve using the right technology and seeking the right help. At the same time, multi-cloud strategies are on the rise. More enterprise organizations than ever before are analyzing their current technology portfolio and defining a cloud strategy that encompasses multiple cloud platforms to suit specific app workloads, and move those workloads as they see fit.
In this session, you will learn:
*Key challenges of migration to the cloud in a complex data landscape
*How data virtualization can help build a data driven, multi-location cloud architecture for real time integration
*How customers are taking advantage of data virtualization to save time and costs with limited resources
Big Data made easy in the era of the Cloud, by Demi Ben-Ari
Talking about the ease of using and handling Big Data technologies in the Cloud, with Google Cloud Platform and Amazon Web Services and all of the tools around them.
Showing the problems and how we can solve them with simple tools.
Bridging the Last Mile: Getting Data to the People Who Need It, by Denodo
Watch full webinar here: https://bit.ly/3cUA0Qi
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Bridging the Last Mile: Getting Data to the People Who Need It (APAC), by Denodo
Watch full webinar here: https://bit.ly/34iCruM
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura..., by PRELIDA Project
by Ashkan Ashkpour, Albert Meroño-Peñuela, Christophe Gueret (http://cedar-project.nl/), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
by Sławek Staworko (joint work with Peter Buneman), University of Edinburgh, presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
by Mark Williams (Department of Film and Media Studies, Dartmouth College), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy, by PRELIDA Project
by Peter Burnhill (EDINA, University of Edinburgh), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
CEDAR & PRELIDA: Preservation of Linked Socio-Historical Data, by PRELIDA Project
by Albert Meroño, presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
by Yannis Stavrakas (“Athena” Research Center), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
by Sotiris Batsakis & Grigoris Antoniou, presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Introduction to PRELIDA Consolidation and Dissemination Workshop, by PRELIDA Project
by Carlo Meghini (ISTI CNR, Pisa), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
D3.1 State of the art assessment on Linked Data and Digital Preservation, by PRELIDA Project
The presentation was given by René van Horik from Data Archiving & Networked Services, The Netherlands, at the PRELIDA Midterm Workshop in Catania, April 2014.
Epistemic Interaction - tuning interfaces to provide information for AI support, by Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish Caching, by Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo..., by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview, by Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio’s cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Key Trends Shaping the Future of Infrastructure, by Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and then considering how organisations can position themselves to adapt and thrive.
Neuro-symbolic is not enough, we need neuro-*semantic*, by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti..., by Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -..., by DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphRAG is All You Need? LLM & Knowledge Graph, by Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
JMeter webinar - integration with InfluxDB and Grafana, by RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
UiPath Test Automation using UiPath Test Suite series, part 3, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024, by 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
3. EC policy – a brief history – a personal view
EC support for DP research: Data, Digitisation and e-Infrastructure for creating digital objects, through to the Digital Agenda
National funding: significantly more than EC funding
What is the EC role?
4. DP research: approx. 100M€ from EC
From “Research on Digital Preservation within projects co-funded by the European Union in the ICT programme”, 2011, Stephan Strodl et al.
http://cordis.europa.eu/fp7/ict/creativity/report-research-digital-preservation_en.pdf
5. Situation now
The digital preservation community has failed to persuade the EC that there is a need for more funding for DP research
◦ We do not have a consistent story about:
◦ Costs
◦ Rights
◦ Methods etc
◦ “Emulate or Migrate” is inadequate!
◦ Who is doing it right
The Luxembourg unit which previously funded DP research – its name changed to “Creativity” – now shows no funding for digital preservation research
The EC expects results from the previous 100M€ of research by deploying solutions
6. Digital Preservation – some quotes
The head of the unit funding the Digital Preservation projects asked repeatedly:
◦ “Who pays and why?”
An NSF colleague:
◦ “Digital preservation is like VAT – people don’t like it”
8. “The Digital Agenda for Europe outlines policies and actions to maximise the benefit of the digital revolution for all. Supporting research and innovation is a key priority of the Agenda, essential if we want to establish a flourishing digital economy.”
Neelie Kroes, Vice-President of the EC, responsible for the Digital Agenda
Data is the new gold. “We have a huge goldmine… Let’s start mining it.” – Neelie Kroes
That is the magic: to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.
9. ……but
Gold is precious because:
◦ it is rare
◦ it does not combine with other elements
◦ it does not perish
……..but……….
Data is valuable because:
◦ there is so much of it
◦ it is more valuable when it is combined together
◦ BUT it is far from imperishable
Role for Linked Data
13. Difficulties in digital preservation
Many different terminologies
Many different views of preservation
Many different kinds of digital objects
◦ Documents
◦ Data
◦ …… and new types of objects
Tools and Services
◦ Which ones work for which digital objects?
◦ Which tools/techniques fit together?
◦ How to integrate new tools
Consistent training needed
Risks vs Cost
Who can you trust?
All of the above call for a consistent, coherent approach to digital preservation – APARSEN.
Need an Audit and Certification system – ISO 16363
OAIS – ISO 14721
14. Preservation techniques
For each technique, look for evidence – what evidence?
Must at least make sure we consider different types of data:
◦ rendered vs non-rendered
◦ composite vs simple
◦ dynamic vs static
◦ active vs passive
Must look at all types of threats
15. Basic preservation activities
Libraries say: “Emulate or migrate”
◦ Works well with data only in special cases
◦ Can repeat what was done before instead of new things
◦ Does not help with building cross-disciplinary communities
Emulation:
• Can repeat what has been done before
• BUT cannot use new applications
Migration:
• Convert to a format which new software can use
• BUT what if there are many software systems?
19. OAIS Information model:
Representation Information
The Information Model is key.
Recursion ends at the KNOWLEDGE BASE of the DESIGNATED COMMUNITY
(this knowledge will change over time and region).
Does not demand that ALL Representation Information be collected at once.
A process which can be tested.
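The recursion described above can be made concrete. The sketch below is illustrative only (not an official OAIS API): each data object may need Representation Information (RepInfo) to be understood, that RepInfo is itself data that may need further RepInfo, and the recursion ends once everything remaining is already in the Designated Community's knowledge base. All names and the FITS/PDF example are assumptions for illustration.

```python
# Sketch of the OAIS RepInfo recursion; names and example data are illustrative.

def repinfo_closure(obj, repinfo_for, knowledge_base):
    """Collect the RepInfo needed for `obj`, stopping at the knowledge base.

    obj            -- identifier of a data object
    repinfo_for    -- mapping: object -> list of RepInfo objects it needs
    knowledge_base -- set of things the Designated Community already understands
    """
    needed = set()
    stack = [obj]
    while stack:
        current = stack.pop()
        for ri in repinfo_for.get(current, []):
            if ri in knowledge_base or ri in needed:
                continue          # recursion ends at the knowledge base
            needed.add(ri)
            stack.append(ri)      # RepInfo may itself need RepInfo
    return needed

# Example: a FITS file needs the FITS spec; that spec (a PDF) needs the PDF
# spec, which this particular community already knows.
repinfo = {
    "image.fits": ["FITS 4.0 spec (PDF)"],
    "FITS 4.0 spec (PDF)": ["PDF 1.7 spec"],
}
kb = {"PDF 1.7 spec", "English"}
print(repinfo_closure("image.fits", repinfo, kb))
# -> {'FITS 4.0 spec (PDF)'}
```

Note how the slide's point falls out of the code: as the community's knowledge base shrinks over time, the same call returns more RepInfo that must be collected.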
22. Migration
OAIS defines various types of Migration:
◦Do not change the bits:
◦ Refresh
◦ Replicate
◦Change the packaging but not the content:
◦ Repackage
◦Change the content:
◦ Transform (usually non-reversible)
◦Need to consider “Transformational Information Properties” – important for AUTHENTICITY
◦Related to “Significant properties”
◦Add appropriate Representation Information for the new format
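The migration types above can be summarised as a small sketch, grouped by what each type changes. The grouping follows the slide; the class and function names are illustrative assumptions.

```python
# Sketch: OAIS migration types grouped by what each one changes (illustrative).
from enum import Enum

class Migration(Enum):
    REFRESH = "refresh"        # bits unchanged: replace media like for like
    REPLICATE = "replicate"    # bits unchanged: maybe new media
    REPACKAGE = "repackage"    # packaging changes, content does not
    TRANSFORM = "transform"    # content changes (usually non-reversible)

def changes_content(m: Migration) -> bool:
    """Only Transform changes the content, so only it raises the authenticity
    questions ("Transformational Information Properties") from the slide."""
    return m is Migration.TRANSFORM

def preserves_bits(m: Migration) -> bool:
    """Refresh and Replicate leave the bit stream untouched."""
    return m in (Migration.REFRESH, Migration.REPLICATE)

print([m.value for m in Migration if changes_content(m)])  # -> ['transform']
```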
23. AND – be prepared to
Hand-over
Preservation requires funding
Funding for a dataset (or a repository) may stop
Need to be ready to hand over everything needed
for preservation
◦OAIS (ISO 14721) defines the “Archival Information Package” (AIP).
◦Issues:
◦ Storage naming conventions
◦ Representation Information
◦ Provenance
◦ ….
24. Preserving digitally
encoded information
Ensure that digitally encoded information
is understandable and usable over the long
term
Long term could start at just a few years
Chain of preservation
Need to do something because things
become “unfamiliar” over time
But the same techniques enable use of data
which is “unfamiliar” right now
25. When things change
We need to:
◦Know something has changed
◦Identify the implications of that change
◦Decide on the best course of action for preservation
◦What RepInfo we need to fill the gaps
◦ Created by someone else or creating a new one
◦If transformed: how to maintain data authenticity
◦Alternatively: hand it over to another repository
◦Make sure data continues to be usable
Orchestration Service
Gap Identification Service
Preservation Strategy Toolkit
RepInfo Registry Service
Authenticity Toolkit
Packaging Toolkit
Data Virtualisation Toolkit
Process Virtualisation Toolkit
RepInfo Toolkit
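The "Gap Identification" step in the list above has a simple core. The sketch below is an illustration of the idea, not APARSEN's actual service interface: as the Designated Community's knowledge base changes, the gap is whatever required RepInfo is no longer covered. All names and example data are assumptions.

```python
# Sketch of gap identification: required RepInfo minus current community
# knowledge. Names and the 2014/2034 example are illustrative assumptions.

def repinfo_gap(required, knowledge_base):
    """Return the RepInfo that is required but not (or no longer) known."""
    return set(required) - set(knowledge_base)

required = {"FITS format", "WCS conventions", "English"}
kb_now = {"FITS format", "WCS conventions", "English"}
kb_future = {"English"}   # assumed future community: FITS has become unfamiliar

print(sorted(repinfo_gap(required, kb_now)))     # nothing to do today
print(sorted(repinfo_gap(required, kb_future)))  # RepInfo to create or obtain
```

When the gap is non-empty, the slide's options apply: fill it with RepInfo created by someone else or newly created, or hand the data over to a repository whose community still has that knowledge.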
28. Preservation objectives
The same digital object may be
preserved with different aims in mind
by different repositories:
For a digital document
Re-print the pages?
To understand the numbers printed on the page to
do further research?
For a piece of performance art
Replay a recording of a particular performance?
Re-perform the work?
For a scientific data file
Understand the numbers?
Understand the numbers in the context of a
particular theory?
29. Preservation, Value and
Re-use
(Re-)usability is the essential test for success of preservation
◦ Usability is usually essential for justifying the cost of preservation
Impossible to insist on common formats, semantics or software
◦ How to avoid the N² problem?
Impossible to know what formats, semantics or software will be used in future
Needs appropriate Representation Information
◦ for preservation (use in the future when things have become unfamiliar)
◦ for use now (use of unfamiliar data, i.e. most of it!)
◦ automated (re-)use as far as possible
APARSEN is bringing together a coherent, consistent, evidence-based approach to
digital preservation involving tools, services, consultancy and training.
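The N² problem mentioned above is simple arithmetic: direct converters between every ordered pair of N formats number N·(N−1), whereas agreeing on one intermediate representation needs only two converters per format. A quick sketch (function names are illustrative):

```python
# Why pairwise conversion does not scale: N*(N-1) direct converters versus
# 2*N when every format converts via one agreed intermediate form.

def pairwise_converters(n: int) -> int:
    return n * (n - 1)

def via_intermediate(n: int) -> int:
    return 2 * n

for n in (5, 20, 100):
    print(n, pairwise_converters(n), via_intermediate(n))
# For 100 formats: 9900 direct converters versus 200 via an intermediate.
```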
30. Classification of objects
must at least make sure we
consider different types of data
◦rendered vs non-rendered
◦composite vs simple
◦dynamic vs static
◦active vs passive
RDF Triple: dynamic / composite / non-rendered / passive
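The four axes above can be captured as a small data type, which is one way a repository might record the classification alongside each holding. This is an illustrative sketch; the field names mirror the slide, and the PDF example is an assumption.

```python
# Sketch: the slide's four classification axes as a data type (illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class DigitalObjectClass:
    rendered: bool    # rendered (e.g. a document) vs non-rendered (e.g. raw data)
    composite: bool   # composite vs simple
    dynamic: bool     # dynamic vs static
    active: bool      # active vs passive

# The slide's classification of an RDF triple:
rdf_triple = DigitalObjectClass(rendered=False, composite=True,
                                dynamic=True, active=False)
# A plain PDF report, by contrast, might be:
pdf_report = DigitalObjectClass(rendered=True, composite=False,
                                dynamic=False, active=False)
print(rdf_triple != pdf_report)  # -> True: different preservation techniques apply
```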
31. Key questions about
what is to be preserved
What is the object to be preserved?
The specific piece of RDF?
The specific RDF plus the data pointed to?
The underlying database (if any)?
The whole linked “world”?
What are the preservation objectives?
The RDF and whole inference system?
Just the RDF?
Just the underlying database (if any)?
32. Key questions about
RDF
What Representation information is needed for the LD?
Schema?
Additional semantics?
Evolution of links (e.g. replace this host by a new one)?
Snapshots?
What Transformation?
One version of RDF to another?
Move to replacement for RDF?
Change of underlying database?
Authenticity??
Who to hand over to?
What to do with the URIs? – maintain or change?
What to do with the underlying database (if any)?
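One of the questions above, "replace this host by a new one", is mechanically simple even though the policy question is hard. The sketch below rewrites the host inside URI references in N-Triples text using only the standard library; a real repository would also record the rewrite as Provenance, and a real dataset needs a proper RDF parser. The hostnames are made up.

```python
# Sketch: rewriting the host in N-Triples URIs when a dataset moves
# (illustrative; hostnames are made up, and real data needs a real parser).
import re

def rewrite_host(ntriples: str, old_host: str, new_host: str) -> str:
    """Replace old_host with new_host inside <...> URI references only,
    leaving literals such as "A dataset" untouched."""
    pattern = re.compile(r"<([^>]*)>")
    def fix(match):
        return "<" + match.group(1).replace(old_host, new_host) + ">"
    return pattern.sub(fix, ntriples)

triple = ('<http://old.example.org/ds/1> '
          '<http://purl.org/dc/terms/title> "A dataset" .')
print(rewrite_host(triple, "old.example.org", "data.example.net"))
# -> <http://data.example.net/ds/1> <http://purl.org/dc/terms/title> "A dataset" .
```

This illustrates why the slide pairs the URI question with authenticity: the rewrite changes the content, so it is a Transform in OAIS terms and must be documented.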
33. Key questions about the
things the RDF points to
Will they be preserved?
How to find the Representation
Information?
Will the Persistent Identifiers change?
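Before these questions can even be answered, a repository needs to know which external URIs its triples actually depend on. A minimal sketch, assuming simple line-oriented N-Triples and made-up example URIs (the ORCID iD is fictitious):

```python
# Sketch: list the external URIs a set of triples points to, so each can be
# assessed for preservation and RepInfo. Line-oriented handling for
# illustration only; real data needs a proper N-Triples parser.
import re

def external_uris(ntriples_lines, our_prefix):
    """URIs referenced as subject/predicate/object that are not ours."""
    uris = set()
    for line in ntriples_lines:
        for uri in re.findall(r"<([^>]*)>", line):
            if not uri.startswith(our_prefix):
                uris.add(uri)
    return uris

lines = [
    '<http://ours.example.org/a> <http://purl.org/dc/terms/creator> '
    '<http://orcid.org/0000-0001-2345-6789> .',
    '<http://ours.example.org/a> <http://purl.org/dc/terms/title> "Dataset A" .',
]
print(sorted(external_uris(lines, "http://ours.example.org/")))
```

Each URI in the result is a dependency: something that may disappear, whose Representation Information must be findable, and whose identifier may change.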
34. Joint Key Questions
Who will pay, and why?
For which things?
Are some things more valuable – and therefore
more likely to be preserved?
What happens when some things disappear?
35. Options
Be clear about what is meant
Understand what is possible
Start with what is agreed as valuable
Don’t promise too much
36. Input to standards
See http://www.iso16363.org
Audit and Certification of Trustworthy
repositories
Forum: OAIS Futures
37. Conclusions
A great deal of funding (€100M) has been
invested in digital preservation research by the EU
The EC is not putting further funding into
digital preservation research
There are technical challenges
The biggest challenge is to be clear about what
the preservation aims are for Linked Data
Data is migrated – a big job, but it is done sometimes.
Emulation is sometimes used but mainly for repeating processing for some specific reason. More generally users do not want to simply repeat what has been done before.
Just to be clear – I am focussing on the OAIS Information Model
Divide Migration into 3 groups depending on what changes:
◦ Bits unchanged: Refresh – replace media like for like; Replicate – maybe new media
◦ Packaging changed: Repackage – e.g. copy from tape to disk
◦ Content changed: Transform – e.g. change from Word to PDF
The “migrate” in “emulate or migrate” is the third one – Transform.