State of the Art
SUMMARY OF D3.1 STATE OF THE ART
D GIARETTA
Outline
Preservation – State of the Art
Challenges for Linked Data
Options
Conclusions
EC policy – a brief
history – a personal view
EC support for
 DP research
 for creating digital objects
 Data
 Digitisation
 e-Infrastructure
 to
 Digital Agenda
National funding
 Significantly more than EC funding
 What is the EC role?
DP research: approx. €100M from EC
From “Research on Digital Preservation within projects co-funded by the European
Union in the ICT programme”, 2011, Stephan Strodl et al
http://cordis.europa.eu/fp7/ict/creativity/report-research-digital-preservation_en.pdf
Situation now
The digital preservation community has failed to persuade the EC
that there is a need for more funding for DP research
◦We do not have a consistent story about:
◦ Costs
◦ Rights
◦ Methods etc
◦ “Emulate or Migrate” inadequate!
◦ Who is doing it right
The Luxembourg unit which previously funded DP research, since
renamed “Creativity”, now shows no funding for digital
preservation research
The EC expects results from the previous €100M of research in the form
of deployed solutions
Digital Preservation –
some quotes:
Head of unit funding the Digital Preservation
projects asked repeatedly:
◦“Who pays and why?”
NSF colleague:
◦“Digital preservation is like VAT – people don’t
like it”
Value pyramid
From Riding the Wave
“The Digital Agenda for Europe outlines policies and actions
to maximise the benefit of the digital revolution for all.
Supporting research and innovation is a key priority of the
Agenda, essential if we want to establish a flourishing digital
economy.”
Neelie Kroes,
Vice-President of the EC, responsible for the Digital Agenda
Data is the new gold.
“We have a huge goldmine… Let’s start mining it.”
Neelie Kroes
That is the magic to find value amid the mass of data. The right infrastructure, the
right networks, the right computing capacity and, last but not least, the right
analysis methods and algorithms help us break through the mountains of rock to
find the gold within.
……but
Gold is precious because
◦it is rare
◦it does not combine with other elements
◦it does not perish
……..but……….
Data is valuable because
◦there is so much of it
◦it is more valuable when it is combined together
◦BUT it is far from imperishable
Role for
Linked Data
OR
Preservation – State of the Art
Problems when
preserving data
Preserve?
Preserve what?
For how long?
How to test?
Which people?
Which organisations?
How well?
• Metadata? – What kind? How much?
Difficulties in digital
preservation
Many different terminologies
Many different views of preservation
Many different kinds of digital objects
◦ Documents
◦ Data
◦ …… and new types of objects
Tools and Services
◦ Which ones work for which digital objects?
◦ Which tools/techniques fit together?
◦ How to integrate new tools
Consistent training needed
Risks vs Cost
Who can you trust?
Need a consistent, coherent approach to digital preservation: APARSEN.
Need an Audit and Certification
system – ISO 16363
OAIS – ISO 14721
Preservation techniques
For each technique
look for evidence – what
evidence?
must at least make sure we
consider different types of data
◦rendered vs non-rendered
◦composite vs simple
◦dynamic vs static
◦active vs passive
must look at all types of threats
Basic preservation
activities
Libraries say:
“Emulate or migrate”
◦ Works well with data only in special cases
◦ Can repeat what was done before instead of new things
◦ Does not help with building cross-disciplinary communities
• Emulate: can repeat what has been done before, BUT cannot use new applications
• Migrate: convert to a format which new software can use, BUT what if there are many software systems?
Contains numbers – need meaning
...to be combined and processed to get this
[Figure: data levels Level 0, Level 1, Level 2; steps labelled Processing and Processing/combining]
...or this
OAIS Information model: Representation Information
The Information Model is key
Recursion ends at the KNOWLEDGE BASE of the DESIGNATED COMMUNITY
(this knowledge will change over time and region)
Does not demand that ALL Representation Information be collected at once.
A process which can be tested
Rep Info Network (figure): FITS FILE, FITS DICTIONARY, FITS STANDARD, PDF SOFTWARE, JAVA VM, PDF STANDARD, FITS JAVA SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION
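The recursion over Representation Information can be sketched in Python. This is only an illustration under stated assumptions: the node names come from the Rep Info Network labels above, but the dependency edges between them are guesses made for this example, and `needs`, `knowledge_base` and `missing_rep_info` are hypothetical names that do not come from OAIS.

```python
# Hypothetical Representation Information network. Node names come from the
# slide; the edges between them are assumed for illustration only.
needs = {
    "FITS FILE": ["FITS STANDARD", "FITS DICTIONARY", "FITS JAVA SOFTWARE"],
    "FITS DICTIONARY": ["DICTIONARY SPECIFICATION"],
    "DICTIONARY SPECIFICATION": ["XML SPECIFICATION"],
    "XML SPECIFICATION": ["UNICODE SPECIFICATION"],
    "FITS STANDARD": ["PDF STANDARD"],
    "PDF STANDARD": ["PDF SOFTWARE"],
    "FITS JAVA SOFTWARE": ["JAVA VM"],
    "PDF SOFTWARE": [],
    "JAVA VM": [],
    "UNICODE SPECIFICATION": [],
}


def missing_rep_info(obj, knowledge_base, network):
    """Return the Rep Info the community still needs in order to use obj.

    The walk stops at anything already in the Designated Community's
    knowledge base, so not every node has to be collected.
    """
    required = []
    seen = set()

    def visit(node):
        for dep in network.get(node, []):
            if dep in knowledge_base or dep in seen:
                continue  # already understood, or already listed
            seen.add(dep)
            required.append(dep)
            visit(dep)

    visit(obj)
    return required


# A community that already understands XML and can run Java and PDF software.
knowledge_base = {"XML SPECIFICATION", "JAVA VM", "PDF SOFTWARE"}
gaps = missing_rep_info("FITS FILE", knowledge_base, needs)
print(gaps)
```

Because the knowledge base is an input, the same network gives a different answer for a different community or at a later time, which is what makes the process testable.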
Additional technique:
add Representation Information
Descriptions of the digitally encoded
object
Ideal description allows a machine to
extract information
Migration
OAIS defines various types of Migration:
◦Do not change the bits
  ◦Refresh
  ◦Replicate
◦Change the packaging but not the content
  ◦Repackage
◦Change the content
  ◦Transform (usually non-reversible)
  ◦Need to consider “Transformational Information Properties” – important for AUTHENTICITY
  ◦Related to “Significant properties”
  ◦Add appropriate Representation Information for the new format
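The grouping of migration types above can be written down as a small Python sketch. The names `Migration`, `candidate_migrations` and `follow_up_actions` are hypothetical and are not defined by OAIS; the sketch only encodes the three cases listed on the slide.

```python
from enum import Enum


class Migration(Enum):
    REFRESH = "refresh"      # bits unchanged
    REPLICATE = "replicate"  # bits unchanged
    REPACKAGE = "repackage"  # packaging changes, content does not
    TRANSFORM = "transform"  # content changes, usually non-reversible


def candidate_migrations(packaging_changed, content_changed):
    """Map what has changed to the migration types listed on the slide."""
    if content_changed:
        return [Migration.TRANSFORM]
    if packaging_changed:
        return [Migration.REPACKAGE]
    return [Migration.REFRESH, Migration.REPLICATE]


def follow_up_actions(migration):
    """Extra work the slide attaches to a change of content."""
    if migration is Migration.TRANSFORM:
        return [
            "check Transformational Information Properties (authenticity)",
            "add Representation Information for the new format",
        ]
    return []


for kind in candidate_migrations(packaging_changed=False, content_changed=True):
    print(kind.value, follow_up_actions(kind))
```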
AND – be prepared to hand over
Preservation requires funding
Funding for a dataset (or a repository) may stop
Need to be ready to hand over everything needed
for preservation
◦OAIS (ISO 14721) defines the “Archival Information Package” (AIP).
◦Issues:
◦ Storage naming conventions
◦ Representation Information
◦ Provenance
◦ ….
Preserving digitally
encoded information
Ensure that digitally encoded information is understandable and usable
over the long term
 Long term could start at just a few years
 Chain of preservation
Need to do something because things
become “unfamiliar” over time
But the same techniques enable use of data
which is “unfamiliar” right now
When things change
We need to:
◦Know something has changed
◦Identify the implications of that change
◦Decide on the best course of action for preservation
◦Identify what RepInfo we need to fill the gaps
◦ Created by someone else, or newly created
◦If transformed: how to maintain data authenticity
◦Alternatively: hand it over to another repository
◦Make sure data continues to be usable
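The first steps in the list above (notice a change, work out its implications, decide how to fill the RepInfo gaps) might look roughly like this in Python. It is a minimal sketch under assumptions: `holdings`, `registry` and `plan_actions` are hypothetical names, the example data is invented, and the transformation and hand-over options are not modelled.

```python
def plan_actions(holdings, old_kb, new_kb, registry):
    """Suggest how to fill RepInfo gaps after the knowledge base changes.

    holdings: dict mapping a dataset name to the set of things its users
              must understand in order to use it
    old_kb, new_kb: the Designated Community's knowledge base before/after
    registry: RepInfo that someone else has already created and shared
    """
    lost = old_kb - new_kb  # know that something has changed
    plan = {}
    for name, depends_on in holdings.items():
        gaps = sorted(depends_on & lost)  # implications of that change
        if not gaps:
            continue
        plan[name] = [
            ("reuse RepInfo" if gap in registry else "create RepInfo", gap)
            for gap in gaps
        ]
    return plan


# Invented example data.
holdings = {
    "survey.csv": {"CSV", "column codebook"},
    "image.fits": {"FITS STANDARD"},
}
old_kb = {"CSV", "column codebook", "FITS STANDARD"}
new_kb = {"CSV"}
registry = {"FITS STANDARD"}

plan = plan_actions(holdings, old_kb, new_kb, registry)
print(plan)
```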
SCIDIP-ES services and toolkits
Services (run on remote servers): Storage Service, Gap Identification Service, Orchestration Service, RepInfo Registry Service, Persistent ID i/f Service
Toolkits (run on local machines): Preservation Strategy Toolkit, Data Virtualisation Toolkit, Process Virtualisation Toolkit, Authenticity Toolkit, Packaging Toolkit, RepInfo Toolkit, Finding Aid Toolkit, Certification Toolkit
External: Cloud Storage, External Access/Use Services, External PI services, ISO Certification Organisation
• These SUPPLEMENT what repositories do (customised for repositories)
• Make it easier for repositories to do preservation – share the effort
Preservation objectives
The same digital object may be
preserved with different aims in mind
by different repositories:
For a digital document
Re-print the pages?
Understand the numbers printed on the page in order to
do further research?
For a piece of performance art
Replay a recording of a particular performance?
Re-perform the work?
For a scientific data file
Understand the numbers?
Understand the numbers in the context of a
particular theory?
Preservation, Value and
Re-use
(Re-)usability is the essential test for the success of preservation
◦ Usability usually essential for justifying cost of preservation
Impossible to insist on common formats, semantics or software
◦ How to avoid the N² problem?
Impossible to know what formats, semantics or software will be used in future
Needs appropriate Representation Information
◦ for preservation (use in the future when things have become unfamiliar)
◦ for use now (use of unfamiliar data i.e. most of it!)
◦ automated (re-)use as far as possible
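The slide does not spell out the N² problem. Assuming it refers to the number of format-to-format converters needed when every one of N formats must be usable with every other, the arithmetic can be sketched as follows; `pairwise_converters` and `shared_descriptions` are hypothetical names used only for this illustration.

```python
def pairwise_converters(n):
    """One converter for each ordered pair of distinct formats: n * (n - 1)."""
    return n * (n - 1)


def shared_descriptions(n):
    """One Representation Information description per format: n."""
    return n


for n in (5, 50, 500):
    print(n, pairwise_converters(n), shared_descriptions(n))
```

Under that assumption the converter count grows quadratically while the number of descriptions grows linearly, which is the argument for describing each format once with Representation Information.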
APARSEN is bringing together a coherent, consistent, evidence-based approach to
digital preservation involving tools, services, consultancy and training.
Classification of objects
must at least make sure we
consider different types of data
◦rendered vs non-rendered
◦composite vs simple
◦dynamic vs static
◦Active vs passive
RDF Triple: dynamic/complex/non-rendered/passive
Key questions about
what is to be preserved
What is the object to be preserved?
The specific piece of RDF?
The specific RDF plus the data pointed to?
The underlying database (if any)?
 The whole linked “world”?
What are the preservation objectives?
The RDF and whole inference system?
Just the RDF?
Just the underlying database (if any)?
Key questions about
RDF
What Representation information is needed for the LD?
Schema?
Additional semantics?
Evolution of links (e.g. replace this host by a new one)?
Snapshots?
What Transformation?
One version of RDF to another?
Move to replacement for RDF?
Change of underlying database?
Authenticity??
Who to hand over to?
What to do with the URIs? – maintain or change?
What to do with the underlying database (if any)?
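One of the options raised above, replacing the host in the URIs when the data moves, can be illustrated with a short Python sketch. It is not a recommendation to change URIs rather than maintain them. Triples are modelled as plain tuples of strings, the hosts and example triples are invented, and `rehost` and `migrate_links` are hypothetical helpers. The old-to-new mapping is kept so that the change can be recorded as evidence for authenticity.

```python
from urllib.parse import urlsplit, urlunsplit


def rehost(term, old_host, new_host):
    """Return term with old_host replaced by new_host if it is an HTTP(S) URI."""
    if not term.startswith(("http://", "https://")):
        return term  # literal or other kind of term: leave untouched
    parts = urlsplit(term)
    if parts.netloc != old_host:
        return term
    return urlunsplit(parts._replace(netloc=new_host))


def migrate_links(triples, old_host, new_host):
    """Rewrite every URI on old_host and record what was changed."""
    migrated = []
    mapping = {}
    for triple in triples:
        new_triple = tuple(rehost(t, old_host, new_host) for t in triple)
        for old, new in zip(triple, new_triple):
            if old != new:
                mapping[old] = new
        migrated.append(new_triple)
    return migrated, mapping


# Invented example triples (subject, predicate, object).
triples = [
    ("http://old.example.org/id/city/1",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Pisa"),
    ("http://old.example.org/id/city/1",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://other.example.net/place/42"),
]

migrated, mapping = migrate_links(triples, "old.example.org", "new.example.org")
print(mapping)
```

The unchanged `triples` list can be stored alongside `migrated` as a snapshot of the data before the transformation.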
Key questions about the
things the RDF points to
Will they be preserved?
How to find the Representation
Information?
Will the Persistent Identifiers change?
Joint Key Questions
Who will pay, and why?
For which things?
Are some things more valuable – and therefore
more likely to be preserved?
What happens when some things disappear?
Options
Be clear about what is meant
Understand what is possible
Start with what is agreed as valuable
Don’t promise too much
Input to standards
See http://www.iso16363.org
Audit and Certification of Trustworthy
repositories
Forum: OAIS Futures
Conclusions
A great deal of funding (€100M) has been
invested in digital preservation research by the EU
EC is not putting further funding into digital
preservation research
There are technical challenges
The biggest challenge is to be clear about what
the preservation aims are for Linked Data

More Related Content

What's hot

Digitization Basics for Libraries, Archives, and Museums
Digitization Basics for Libraries, Archives, and MuseumsDigitization Basics for Libraries, Archives, and Museums
Digitization Basics for Libraries, Archives, and Museums
Martin Kalfatovic
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataverse
vty
 
Doc Book Vs Dita Teresa
Doc Book Vs Dita TeresaDoc Book Vs Dita Teresa
Doc Book Vs Dita Teresaday
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
vty
 
Intro to Digitization Projects
Intro to Digitization ProjectsIntro to Digitization Projects
Intro to Digitization Projects
zsrlibrary
 
Motivation for big data
Motivation for big dataMotivation for big data
Motivation for big data
Arockiaraj Durairaj
 
Digital preservation: an introduction
Digital preservation: an introductionDigital preservation: an introduction
Digital preservation: an introduction
Michael Day
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
vty
 
NordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:RdaNordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:RdaNordForsk
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
vty
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
Chris Dagdigian
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
vty
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
vty
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
vty
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
Bart Vandewoestyne
 
Long Live Posix - HPC Storage and the HPC Datacenter
Long Live Posix - HPC Storage and the HPC DatacenterLong Live Posix - HPC Storage and the HPC Datacenter
Long Live Posix - HPC Storage and the HPC Datacenter
inside-BigData.com
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Chris Dagdigian
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
Vyacheslav Tykhonov
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
Venkata Reddy Konasani
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
Flytxt
 

What's hot (20)

Digitization Basics for Libraries, Archives, and Museums
Digitization Basics for Libraries, Archives, and MuseumsDigitization Basics for Libraries, Archives, and Museums
Digitization Basics for Libraries, Archives, and Museums
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataverse
 
Doc Book Vs Dita Teresa
Doc Book Vs Dita TeresaDoc Book Vs Dita Teresa
Doc Book Vs Dita Teresa
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
Intro to Digitization Projects
Intro to Digitization ProjectsIntro to Digitization Projects
Intro to Digitization Projects
 
Motivation for big data
Motivation for big dataMotivation for big data
Motivation for big data
 
Digital preservation: an introduction
Digital preservation: an introductionDigital preservation: an introduction
Digital preservation: an introduction
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
NordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:RdaNordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:Rda
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Long Live Posix - HPC Storage and the HPC Datacenter
Long Live Posix - HPC Storage and the HPC DatacenterLong Live Posix - HPC Storage and the HPC Datacenter
Long Live Posix - HPC Storage and the HPC Datacenter
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
 

Viewers also liked

Digital preservation
Digital preservationDigital preservation
Digital preservation
Sarika Sawant
 
DIACHRON Preservation: Evolution Management for Preservation
DIACHRON Preservation: Evolution Management for PreservationDIACHRON Preservation: Evolution Management for Preservation
DIACHRON Preservation: Evolution Management for Preservation
PRELIDA Project
 
Organizational and Economic Issues in Linked Data Preservation
Organizational and Economic Issues in Linked Data PreservationOrganizational and Economic Issues in Linked Data Preservation
Organizational and Economic Issues in Linked Data Preservation
PRELIDA Project
 
Towards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA projectTowards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA project
PRELIDA Project
 
2nd generation of design tools for ocean energy devices and arrays developmen...
2nd generation of design tools for ocean energy devices and arrays developmen...2nd generation of design tools for ocean energy devices and arrays developmen...
2nd generation of design tools for ocean energy devices and arrays developmen...
Joost Holleman
 
Preserving linked data: sustainability and organizational infrastructure
Preserving linked data: sustainability and organizational infrastructurePreserving linked data: sustainability and organizational infrastructure
Preserving linked data: sustainability and organizational infrastructure
PRELIDA Project
 
La conservazione dei documenti digitali
La conservazione dei documenti digitaliLa conservazione dei documenti digitali
La conservazione dei documenti digitali
Laboratorio di Cultura Digitale, Università di Pisa
 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservation
smtcd
 
Brief Introduction to Digital Preservation
Brief Introduction to Digital PreservationBrief Introduction to Digital Preservation
Brief Introduction to Digital Preservation
Michael Day
 

Viewers also liked (9)

Digital preservation
Digital preservationDigital preservation
Digital preservation
 
DIACHRON Preservation: Evolution Management for Preservation
DIACHRON Preservation: Evolution Management for PreservationDIACHRON Preservation: Evolution Management for Preservation
DIACHRON Preservation: Evolution Management for Preservation
 
Organizational and Economic Issues in Linked Data Preservation
Organizational and Economic Issues in Linked Data PreservationOrganizational and Economic Issues in Linked Data Preservation
Organizational and Economic Issues in Linked Data Preservation
 
Towards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA projectTowards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA project
 
2nd generation of design tools for ocean energy devices and arrays developmen...
2nd generation of design tools for ocean energy devices and arrays developmen...2nd generation of design tools for ocean energy devices and arrays developmen...
2nd generation of design tools for ocean energy devices and arrays developmen...
 
Preserving linked data: sustainability and organizational infrastructure
Preserving linked data: sustainability and organizational infrastructurePreserving linked data: sustainability and organizational infrastructure
Preserving linked data: sustainability and organizational infrastructure
 
La conservazione dei documenti digitali
La conservazione dei documenti digitaliLa conservazione dei documenti digitali
La conservazione dei documenti digitali
 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservation
 
Brief Introduction to Digital Preservation
Brief Introduction to Digital PreservationBrief Introduction to Digital Preservation
Brief Introduction to Digital Preservation
 

Similar to D.3.1: State of the Art - Linked Data and Digital Preservation

Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
Bob Hardaway
 
Population genomics is a data management problem
Population genomics is a data management problemPopulation genomics is a data management problem
Population genomics is a data management problem
Stavros Papadopoulos
 
Principles for proper data management and reuse--An RDA view
Principles for proper data management and reuse--An RDA viewPrinciples for proper data management and reuse--An RDA view
Principles for proper data management and reuse--An RDA view
Research Data Alliance
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Philip Filleul
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
Denodo
 
Big data in Private Banking
Big data in Private BankingBig data in Private Banking
Big data in Private Banking
Jérôme Kehrli
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
Denodo
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
Microsoft
 
SKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSISSKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSIS
Skillwise Consulting
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big Data
Frank Kienle
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigManish Chopra
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Scott Mitchell
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
Valarmathi V
 
Data Virtualization to Survive a Multi and Hybrid Cloud World
Data Virtualization to Survive a Multi and Hybrid Cloud WorldData Virtualization to Survive a Multi and Hybrid Cloud World
Data Virtualization to Survive a Multi and Hybrid Cloud World
Denodo
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Demi Ben-Ari
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
Denodo
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Denodo
 
Digitalisation: How can we mix the "new oil" and the "old oil? The role of IT...
Digitalisation: How can we mix the "new oil" and the "old oil? The role of IT...Digitalisation: How can we mix the "new oil" and the "old oil? The role of IT...
Digitalisation: How can we mix the "new oil" and the "old oil? The role of IT...
SIRIUS Centre, University of Oslo
 
Gettingstartedwithdigitalcollectionsweb[1]
Gettingstartedwithdigitalcollectionsweb[1]Gettingstartedwithdigitalcollectionsweb[1]
Gettingstartedwithdigitalcollectionsweb[1]guest410707c
 

Similar to D.3.1: State of the Art - Linked Data and Digital Preservation (20)

Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Population genomics is a data management problem
Population genomics is a data management problemPopulation genomics is a data management problem
Population genomics is a data management problem
 
Principles for proper data management and reuse--An RDA view
Principles for proper data management and reuse--An RDA viewPrinciples for proper data management and reuse--An RDA view
Principles for proper data management and reuse--An RDA view
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Big data in Private Banking
Big data in Private BankingBig data in Private Banking
Big data in Private Banking
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
 
SKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSISSKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSIS
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big Data
 
Big-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-KoenigBig-Data-Seminar-6-Aug-2014-Koenig
Big-Data-Seminar-6-Aug-2014-Koenig
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
Data Virtualization to Survive a Multi and Hybrid Cloud World
Data Virtualization to Survive a Multi and Hybrid Cloud WorldData Virtualization to Survive a Multi and Hybrid Cloud World
Data Virtualization to Survive a Multi and Hybrid Cloud World
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Digitalisation: How can we mix the "new oil" and the "old oil? The role of IT...
Digitalisation: How can we mix the "new oil" and the "old oil? The role of IT...Digitalisation: How can we mix the "new oil" and the "old oil? The role of IT...
Digitalisation: How can we mix the "new oil" and the "old oil? The role of IT...
 
Gettingstartedwithdigitalcollectionsweb[1]
Gettingstartedwithdigitalcollectionsweb[1]Gettingstartedwithdigitalcollectionsweb[1]
Gettingstartedwithdigitalcollectionsweb[1]
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 

More from PRELIDA Project

Steps towards a Data Value Chain
Steps towards a Data Value ChainSteps towards a Data Value Chain
Steps towards a Data Value Chain
PRELIDA Project
 
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura...

D.3.1: State of the Art - Linked Data and Digital Preservation

  • 1. State of the Art SUMMARY OF D3.1 STATE OF THE ART D GIARETTA
  • 2. Outline Preservation – State of the Art Challenges for Linked Data Options Conclusions
  • 3. EC policy – a brief history – a personal view EC support for ◦ DP research ◦ for creating digital objects ◦ Data ◦ Digitisation ◦ e-Infrastructure ◦ to ◦ Digital Agenda National funding ◦ Significantly more than EC funding ◦ What is the EC role?
  • 4. DP research: approx 100M€ from EC From Research on Digital Preservation within projects co-funded by the European Union in the ICT programme, 2011, Stephan Strodl et al http://cordis.europa.eu/fp7/ict/creativity/report-research-digital-preservation_en.pdf
  • 5. Situation now The digital preservation community has failed to persuade the EC that there is a need for more funding for DP research ◦We do not have a consistent story about: ◦ Costs ◦ Rights ◦ Methods etc ◦ “Emulate or Migrate” inadequate! ◦ Who is doing it right Luxembourg unit which previously funded DP research – name changed to “Creativity” - now shows no funding for digital preservation research EC expects results from the previous 100 M € research by deploying solutions
  • 6. Digital Preservation – some quotes: Head of unit funding the Digital Preservation projects asked repeatedly: ◦“Who pays and why?” NSF colleague: ◦“Digital preservation is like VAT – people don’t like it”
  • 8. “The Digital Agenda for Europe outlines policies and actions to maximise the benefit of the digital revolution for all. Supporting research and innovation is a key priority of the Agenda, essential if we want to establish a flourishing digital economy.” Neelie Kroes, Vice-President of the EC, responsible for the Digital Agenda Data is the new gold. “We have a huge goldmine… Let’s start mining it.” Neelie Kroes That is the magic to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.
  • 9. ……but Gold is precious because ◦it is rare ◦it does not combine with other elements ◦it does not perish ……..but………. Data is valuable because ◦there is so much of it ◦it is more valuable when it is combined together ◦BUT it is far from imperishable Role for Linked Data
  • 10. OR
  • 12. Problems when preserving data Preserve? Preserve what? For how long? How to test? Which people? Which organisations? How well? • Metadata? – What kind? How much?
  • 13. Difficulties in digital preservation Many different terminologies Many different views of preservation Many different kinds of digital objects ◦ Documents ◦ Data ◦ …… and new types of objects Tools and Services ◦ Which ones work for which digital objects? ◦ Which tools/techniques fit together? ◦ How to integrate new tools Consistent training needed Risks vs Cost Who can you trust? Need a consistent, coherent approach to digital preservation - APARSEN. Need an Audit and Certification system – ISO 16363 OAIS – ISO 14721
  • 14. Preservation techniques For each technique look for evidence – what evidence? must at least make sure we consider different types of data ◦rendered vs non-rendered ◦composite vs simple ◦dynamic vs static ◦active vs passive must look at all types of threats
  • 15. Basic preservation activities Libraries say: “Emulate or migrate” ◦ Works well with data only in special cases ◦ Can repeat what was done before instead of new things ◦ Does not help with building cross-disciplinary communities • Can repeat what has been done before BUT • Cannot use new applications • Convert to format which new software can use BUT • What if there are many software systems?
  • 16. Contains numbers – need meaning
  • 17. ...to be combined and processed to get this [Figure labels: Level 0, Level 1, Level 2; Processing; Processing/combining]
  • 19. OAIS Information model: Representation Information The Information Model is key. Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region) Does not demand that ALL Representation Information be collected at once. A process which can be tested
  • 20. Rep Info Network [Diagram labels: FITS FILE; FITS DICTIONARY; FITS STANDARD; PDF SOFTWARE; JAVA VM; PDF STANDARD; FITS JAVA SOFTWARE; DICTIONARY SPECIFICATION; XML SPECIFICATION; UNICODE SPECIFICATION]
  • 21. Additional technique: add Representation Information Descriptions of the digitally encoded object Ideal description allows a machine to extract information
  • 22. Migration OAIS defines various types of Migration: ◦Do not change the bits ◦Refresh ◦Replicate ◦Change the packaging but not the content ◦Repackage ◦Change the content ◦Transform (usually non-reversible) ◦Need to consider “Transformational Information Properties” – important for AUTHENTICITY ◦Related to “Significant properties” ◦Add appropriate Representation Information for the new format
  • 23. AND – be prepared to Hand-over Preservation requires funding Funding for a dataset (or a repository) may stop Need to be ready to hand over everything needed for preservation ◦OAIS (ISO 14721) defines “Archival Information Package” (AIP). ◦Issues: ◦ Storage naming conventions ◦ Representation Information ◦ Provenance ◦ ….
  • 24. Preserving digitally encoded information Ensure that digitally encoded information is understandable and usable over the long term ◦ Long term could start at just a few years ◦ Chain of preservation Need to do something because things become “unfamiliar” over time But the same techniques enable use of data which is “unfamiliar” right now
  • 25. When things change We need to: ◦Know something has changed ◦Identify the implications of that change ◦Decide on the best course of action for preservation ◦What RepInfo we need to fill the gaps ◦ Created by someone else or creating a new one ◦If transformed: how to maintain data authenticity ◦Alternatively: hand it over to another repository ◦Make sure data continues to be usable [Diagram labels: Orchestration Service; Gap Identification Service; Preservation Strategy Tk; RepInfo Registry Service; Authenticity Toolkit; Packaging Tk; Data Virtualisation Toolkit; Process Virtualisation Toolkit; RepInfo Toolkit]
  • 28. Preservation objectives The same digital object may be preserved with different aims in mind by different repositories: For a digital document Re-print the pages? To understand the numbers printed in the page to do further research For a piece of performance art Replay a recording of a particular performance? Re-perform the work? For a scientific data file Understand the numbers? Understand the numbers in the context of a particular theory?
  • 29. Preservation, Value and Re-use (re-)usability the essential test for success of preservation ◦ Usability usually essential for justifying cost of preservation Impossible to insist on common formats, semantics or software ◦ How to avoid N² problem? Impossible to know what formats, semantics or software will be used in future Needs appropriate Representation Information ◦ for preservation (use in the future when things have become unfamiliar) ◦ for use now (use of unfamiliar data i.e. most of it!) ◦ automated (re-)use as far as possible APARSEN is bringing together a coherent, consistent, evidence-based approach to digital preservation involving tools, services, consultancy and training.
  • 30. Classification of objects must at least make sure we consider different types of data ◦rendered vs non-rendered ◦composite vs simple ◦dynamic vs static ◦Active vs passive RDF Triple: dynamic/complex/non-rendered/passive
  • 31. Key questions about what is to be preserved What is the object to be preserved? The specific piece of RDF? The specific RDF plus data pointed to The underlying database (if any)?  The whole linked “world”? What are the preservation objectives? The RDF and whole inference system? Just the RDF? Just the underlying database (if any)?
  • 32. Key questions about RDF What Representation information is needed for the LD? Schema? Additional semantics? Evolution of links (e.g. replace this host by a new one)? Snapshots? What Transformation? One version of RDF to another? Move to replacement for RDF? Change of underlying database? Authenticity?? Who to hand over to What to do with the URIs? – maintain or change? What to do with the underlying database (if any)?
  • 33. Key questions about the things the RDF points to Will they be preserved? How to find the Representation Information? Will the Persistent Identifiers change?
  • 34. Joint Key Questions Who will pay, and why? For which things? Are some things more valuable – and therefore more likely to be preserved? What happens when some things disappear?
  • 35. Options Be clear about what is meant Understand what is possible Start with what is agreed as valuable Don’t promise too much
  • 36. Input to standards See http://www.iso16363.org Audit and Certification of Trustworthy repositories Forum: OAIS Futures
  • 37. Conclusions A great deal of funding (€100M) has been invested in digital preservation research by the EU EC is not putting further funding into digital preservation research There are technical challenges The biggest challenge is to be clear about what the preservation aims are for Linked Data
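
The recursion described on slides 19, 20 and 25 (collect Representation Information until you reach the knowledge base of the Designated Community, then fill whatever gaps remain) can be sketched as a walk over a dependency network. This is only an illustrative sketch, not the APARSEN Gap Identification Service: the edges in `REP_INFO_NETWORK`, the function name `rep_info_gaps` and the sample knowledge base are all assumptions made for the example; only the node labels come from the "Rep Info Network" slide.

```python
from typing import Dict, List, Set

# Hypothetical dependency network. Node labels follow the "Rep Info Network"
# slide; the edges between them are an assumption made for illustration.
REP_INFO_NETWORK: Dict[str, List[str]] = {
    "FITS FILE": ["FITS STANDARD", "FITS DICTIONARY", "FITS JAVA SOFTWARE"],
    "FITS STANDARD": ["PDF STANDARD"],
    "PDF STANDARD": ["PDF SOFTWARE"],
    "FITS DICTIONARY": ["DICTIONARY SPECIFICATION"],
    "DICTIONARY SPECIFICATION": ["XML SPECIFICATION"],
    "XML SPECIFICATION": ["UNICODE SPECIFICATION"],
    "FITS JAVA SOFTWARE": ["JAVA VM"],
}


def rep_info_gaps(
    obj: str,
    network: Dict[str, List[str]],
    knowledge_base: Set[str],
) -> Set[str]:
    """Return the Representation Information still to be collected for obj.

    The walk stops at anything the designated community already knows,
    so nothing below a known item is reported as a gap.
    """
    gaps: Set[str] = set()

    def visit(item: str) -> None:
        for dep in network.get(item, []):
            # Stop at known items, and skip items already recorded as gaps.
            if dep in knowledge_base or dep in gaps:
                continue
            gaps.add(dep)
            visit(dep)

    visit(obj)
    return gaps


# A community that already understands the FITS standard, XML and the JVM.
community = {"FITS STANDARD", "XML SPECIFICATION", "JAVA VM"}
print(sorted(rep_info_gaps("FITS FILE", REP_INFO_NETWORK, community)))
```

Because the knowledge base is passed in as a parameter, the same check can be re-run when the community's knowledge changes over time, which is the point of treating this as a process that can be tested.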

Editor's Notes

  1. Image, document: Rendered/ Static/ Simple; Dynamic database with stored procedures: Non-rendered/ Dynamic/ Complex; Scientific dataset: Non-rendered/ Static/ Complex
  2. Data is migrated – big job but is done sometimes. Emulation is sometimes used but mainly for repeating processing for some specific reason. More generally users do not want to simply repeat what has been done before.
  3. Just to be clear – I am focussing on the OAIS Information Model
  4. Divide Migration into 3 groups depending on what changes: Refresh – replace media like for like; Replicate – maybe new media; Repackage – e.g. copy from tape to disk; Transform – e.g. change from Word to PDF or - The “migrate” in “emulate or migrate” is the third one - Transform
  5. Image, document: Rendered/ Static/ Simple; Dynamic database with stored procedures: Non-rendered/ Dynamic/ Complex; Scientific dataset: Non-rendered/ Static/ Complex
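
The grouping of migration types in note 4 and on slide 22 (bits unchanged, packaging changed, content changed) can be written down as a small decision function. This is a simplified sketch and not the OAIS definition itself: the function name `classify_migration` and its three boolean parameters are assumptions chosen for illustration.

```python
def classify_migration(
    content_changed: bool,
    packaging_changed: bool,
    new_media_type: bool,
) -> str:
    """Name the migration type, checking the most invasive change first.

    Simplified reading of the slide: Transform changes the content,
    Repackage changes only the packaging, and Refresh and Replicate
    leave the bits unchanged (like-for-like media vs possibly new media).
    """
    if content_changed:
        return "Transform"
    if packaging_changed:
        return "Repackage"
    if new_media_type:
        return "Replicate"
    return "Refresh"


# Converting Word to PDF changes the content, so it counts as Transform,
# which is the "migrate" of "emulate or migrate".
print(classify_migration(content_changed=True, packaging_changed=True, new_media_type=False))
```

Checking the content first reflects the point on slide 22 that only Transform raises questions of authenticity and needs new Representation Information for the resulting format.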