Evolution Management for Preservation
PRELIDA Consolidation Workshop 17.10.2014
Giorgos Flouris (FORTH)
fgeo@ics.forth.gr
Evolution Management Problem
Preservation ↔ Evolution
Change Detection
• Change detection for evolution management
– Identifying changes between versions
• Challenges (in DIACHRON)
1. Diverse data models
2. Dynamic datasets
3. Recoverable versions
4. Changes as first-class citizens
5. Cross-snapshot queries
Evolution in DIACHRON
[Figure: the pilot dataset in DIACHRON, Version 1 and Version 2]
Change Types: Motivation
What a naïve diff will report
Add (Rec, diachron:subject, EFO_001927)
Add (Rec, diachron:hasRecordAttribute, rAtt1)
Add (rAtt1, diachron:predicate, rdfs:subClassOf)
Add (rAtt1, diachron:object, ObsoleteClass)
What the pilot expects
Add_SuperClass (EFO_001927, ObsoleteClass)
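To make the contrast concrete, the following Python sketch takes the four low-level additions listed above and groups them into the single Add_SuperClass change the pilot expects. It is a minimal illustration, not the DIACHRON detector: the tuple encoding of triples and the function name `detect_add_superclass` are hypothetical, and the sketch assumes the whole record appears among the additions.

```python
from collections import defaultdict

# A triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def detect_add_superclass(added: set[Triple]) -> list[tuple[str, str]]:
    """Hypothetical detector: group low-level additions describing a record
    attribute whose predicate is rdfs:subClassOf into
    Add_SuperClass(subclass, superclass) pairs.

    Assumes the whole record (its diachron:subject triple included) is in
    `added`, as in the naive diff above; a real detector would also consult
    the old version in case the record already existed there.
    """
    subject_of: dict[str, str] = {}                   # record -> described resource
    attrs_of: dict[str, set[str]] = defaultdict(set)  # record -> its attributes
    attr_pred: dict[str, str] = {}                    # attribute -> predicate
    attr_obj: dict[str, str] = {}                     # attribute -> object
    for s, p, o in added:
        if p == "diachron:subject":
            subject_of[s] = o
        elif p == "diachron:hasRecordAttribute":
            attrs_of[s].add(o)
        elif p == "diachron:predicate":
            attr_pred[s] = o
        elif p == "diachron:object":
            attr_obj[s] = o

    changes = []
    for rec, attrs in attrs_of.items():
        if rec not in subject_of:
            continue
        for att in attrs:
            if attr_pred.get(att) == "rdfs:subClassOf" and att in attr_obj:
                changes.append((subject_of[rec], attr_obj[att]))
    return sorted(changes)


# The four low-level additions reported by the naive diff above.
naive_diff = {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}
print(detect_add_superclass(naive_diff))  # [('EFO_001927', 'ObsoleteClass')]
```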
Change Hierarchy: Low-level (1/3)
• Low-level changes
– DIACHRON model, for internal use
– Fixed: Add, Delete
– Only additions and deletions of triples
– Computed as a simple set difference
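Because low-level changes are only triple additions and deletions, they can be computed with plain set operations. A minimal sketch, assuming each version is held in memory as a Python set of (subject, predicate, object) tuples; the function name `low_level_changes` and the toy triples (including `ParentClass`) are hypothetical.

```python
# A triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def low_level_changes(v1: set[Triple], v2: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (added, deleted): triples only in v2 and triples only in v1."""
    added = v2 - v1    # Add(t) for every triple new in v2
    deleted = v1 - v2  # Delete(t) for every triple missing from v2
    return added, deleted


# Toy versions; ParentClass and the owl:Class typing are hypothetical values.
v1 = {
    ("EFO_001927", "rdf:type", "owl:Class"),
    ("EFO_001927", "rdfs:subClassOf", "ParentClass"),
}
v2 = {
    ("EFO_001927", "rdf:type", "owl:Class"),
    ("EFO_001927", "rdfs:subClassOf", "ObsoleteClass"),
}
added, deleted = low_level_changes(v1, v2)
print("Add:", sorted(added))
print("Delete:", sorted(deleted))
```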
Change Hierarchy: Simple (2/3)
• Pilot terminology:
– Add_SuperClass, Add_Dimension
• Fixed, pre-defined
• Composed of low-level changes
• Partitioning is perfect
– Complete and unambiguous
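The "perfect partitioning" property can be phrased as a check on a detection result: complete means every low-level change is consumed by some simple change, and unambiguous means none is consumed by two. The sketch below is a hypothetical check under that reading; `is_perfect_partition` and `Hypothetical_Change` are illustrative names, not part of DIACHRON.

```python
from collections import Counter

# A low-level change: (operation, subject, predicate, object), operation in {"Add", "Delete"}.
LowLevel = tuple[str, str, str, str]


def is_perfect_partition(low_level: set[LowLevel], detected: dict[str, set[LowLevel]]) -> bool:
    """True if every low-level change is consumed by exactly one detected simple change."""
    counts = Counter(ch for consumed in detected.values() for ch in consumed)
    complete = set(counts) == low_level                 # nothing missed, nothing extra
    unambiguous = all(n == 1 for n in counts.values())  # no low-level change claimed twice
    return complete and unambiguous


ADD = ("Add", "EFO_001927", "rdfs:subClassOf", "ObsoleteClass")
low = {ADD}

ok = {"Add_SuperClass(EFO_001927, ObsoleteClass)": {ADD}}
# Hypothetical second simple change claiming the same low-level change.
overlap = {
    "Add_SuperClass(EFO_001927, ObsoleteClass)": {ADD},
    "Hypothetical_Change(EFO_001927)": {ADD},
}
print(is_perfect_partition(low, ok))       # True
print(is_perfect_partition(low, overlap))  # False
print(is_perfect_partition(low, {}))       # False
```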
Change Hierarchy: Complex (3/3)
• Pilot terminology:
– Add_Synonym, Mark_As_Obsolete
• Fully custom and pilot-specific (defined at run time)
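Since complex changes are defined at run time, one way to picture them is a registry of pilot-supplied rules evaluated over the detected simple changes. This is a sketch under assumptions: the registry, the rule signature, and the definition of Mark_As_Obsolete as "ObsoleteClass added as a superclass" are all hypothetical, whereas the slides state that DIACHRON generates SPARQL queries for complex changes.

```python
from typing import Callable

# A simple change: (change type, parameters),
# e.g. ("Add_SuperClass", ("EFO_001927", "ObsoleteClass")).
SimpleChange = tuple[str, tuple[str, ...]]
# Hypothetical rule signature: map the simple changes of one version pair
# to the parameter tuples of every match of the complex change.
Rule = Callable[[list[SimpleChange]], list[tuple[str, ...]]]

COMPLEX_RULES: dict[str, Rule] = {}


def define_complex_change(name: str, rule: Rule) -> None:
    """Register a pilot-specific complex change at run time."""
    COMPLEX_RULES[name] = rule


def detect_complex(simple: list[SimpleChange]) -> list[tuple[str, tuple[str, ...]]]:
    """Apply every registered rule to the simple changes of one version pair."""
    return [(name, params) for name, rule in COMPLEX_RULES.items() for params in rule(simple)]


def mark_as_obsolete(simple: list[SimpleChange]) -> list[tuple[str, ...]]:
    # Hypothetical definition: a class becomes obsolete when ObsoleteClass
    # is added as one of its superclasses.
    return [(p[0],) for kind, p in simple if kind == "Add_SuperClass" and p[1] == "ObsoleteClass"]


define_complex_change("Mark_As_Obsolete", mark_as_obsolete)

simple = [("Add_SuperClass", ("EFO_001927", "ObsoleteClass"))]
print(detect_complex(simple))  # [('Mark_As_Obsolete', ('EFO_001927',))]
```

Rules receive the whole list of simple changes, so a definition may also combine several simple changes rather than rename a single one.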
Using Changes for Evolution Management
• DIACHRON data model contains all versions
• Detection based on SPARQL queries
– Provided at deployment time (for simple changes)
– Generated at creation time (for complex changes)
• Recoverability
– Allows moving back and forth between versions
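Recoverability can be illustrated with the low-level delta: storing (added, deleted) for a version pair is enough to rebuild either version from the other. A minimal sketch, assuming in-memory sets of triples rather than the DIACHRON store and its SPARQL-based detection; `delta`, `roll_forward`, and `roll_back` are hypothetical names and the triples are toy data.

```python
# A triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def delta(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Low-level delta between two versions: (added, deleted)."""
    return new - old, old - new


def roll_forward(version: set[Triple], added: set[Triple], deleted: set[Triple]) -> set[Triple]:
    """Rebuild the newer version from the older one."""
    return (version - deleted) | added


def roll_back(version: set[Triple], added: set[Triple], deleted: set[Triple]) -> set[Triple]:
    """Rebuild the older version from the newer one by inverting the delta."""
    return (version - added) | deleted


# Toy versions; ParentClass is a hypothetical value.
v1 = {("EFO_001927", "rdfs:subClassOf", "ParentClass")}
v2 = {("EFO_001927", "rdfs:subClassOf", "ObsoleteClass")}
added, deleted = delta(v1, v2)
print(roll_forward(v1, added, deleted) == v2)  # True
print(roll_back(v2, added, deleted) == v1)     # True
```

The inversion is exact for deltas produced by set difference: `added` never contains a triple already present in the old version, and `deleted` only contains triples that were in it.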
Representation Requirements
• Interesting queries
– Return the simple changes that dataset X underwent between versions V1 and V2
– Return the changes that resource X underwent in the first semester of 2014
– Return all resources of type X that underwent change Y
– Return all countries for which the unemployment rate of their capital city increased at a rate higher than the average increase of the country as a whole, between versions V1 and V2
• Access to both the changes and the data is required
– Changes are first-class citizens
– Enabling preservation
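Treating changes as first-class citizens means they can be queried like data. The sketch below models a change log for a single dataset and answers the first two example queries. The `ChangeRecord` fields (in particular `new_version_date`), the log entries, and their dates are hypothetical illustrations, not the DIACHRON schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangeRecord:
    change_id: str
    change_type: str         # e.g. "Add_SuperClass"
    level: str               # "simple" or "complex"
    params: tuple[str, ...]  # resources the change refers to
    old_version: str
    new_version: str
    new_version_date: date   # hypothetical: publication date of new_version


def simple_changes_between(log: list[ChangeRecord], old: str, new: str) -> list[ChangeRecord]:
    """Simple changes the dataset underwent between versions old and new."""
    return [c for c in log if c.level == "simple" and c.old_version == old and c.new_version == new]


def changes_of_resource(log: list[ChangeRecord], resource: str,
                        start: date, end: date) -> list[ChangeRecord]:
    """Changes touching a resource whose new version appeared within [start, end]."""
    return [c for c in log if resource in c.params and start <= c.new_version_date <= end]


# Toy change log for one dataset; entries and dates are invented for illustration.
LOG = [
    ChangeRecord("C1", "Add_SuperClass", "simple", ("EFO_001927", "ObsoleteClass"),
                 "V1", "V2", date(2014, 3, 1)),
    ChangeRecord("C2", "Mark_As_Obsolete", "complex", ("EFO_001927",),
                 "V1", "V2", date(2014, 3, 1)),
]

print([c.change_id for c in simple_changes_between(LOG, "V1", "V2")])  # ['C1']
print([c.change_id for c in changes_of_resource(
    LOG, "EFO_001927", date(2014, 1, 1), date(2014, 6, 30))])         # ['C1', 'C2']
```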
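DIACHRON Data / Changes Ontology
[Figure: the changes ontology at the schema level (Change, related to prov:Activity; Simple_Change, e.g. Add_SuperClass; Complex_Change, e.g. Add_Synonym; diachron:Entity; properties asc_p1, asc_p2, old_version, new_version) and an example at the data level (change C1 of type Add_SuperClass between versions V1 and V2, involving EFO_001927 and ObsoleteClass)]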
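Representing changes and data at the same level lets a single query join them. The sketch stores change C1 as triples next to the data and answers a query of the form "resources of type X that underwent change Y". The triple encoding of C1 is an assumption read off the figure labels, the `rdf:type owl:Class` triple for EFO_001927 is toy data, the helper names are hypothetical, and plain Python sets stand in for a SPARQL endpoint.

```python
# A triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]

# Data and changes in one store. Assumed encoding: asc_p1/asc_p2 are the
# Add_SuperClass parameters, old_version/new_version link the two versions.
STORE: set[Triple] = {
    ("EFO_001927", "rdf:type", "owl:Class"),  # toy data triple
    ("C1", "rdf:type", "Add_SuperClass"),
    ("C1", "asc_p1", "EFO_001927"),
    ("C1", "asc_p2", "ObsoleteClass"),
    ("C1", "old_version", "V1"),
    ("C1", "new_version", "V2"),
}


def subjects(store: set[Triple], p: str, o: str) -> set[str]:
    """All s such that (s, p, o) is in the store."""
    return {s for s, p2, o2 in store if p2 == p and o2 == o}


def objects(store: set[Triple], s: str, p: str) -> set[str]:
    """All o such that (s, p, o) is in the store."""
    return {o for s2, p2, o in store if s2 == s and p2 == p}


def resources_of_type_with_change(store: set[Triple], rtype: str,
                                  change_type: str, param: str) -> set[str]:
    """Resources of type rtype that fill parameter `param` of a change_type change."""
    typed = subjects(store, "rdf:type", rtype)
    changes = subjects(store, "rdf:type", change_type)
    touched = {r for c in changes for r in objects(store, c, param)}
    return typed & touched


print(resources_of_type_with_change(STORE, "owl:Class", "Add_SuperClass", "asc_p1"))
# {'EFO_001927'}
```

In SPARQL the same question would be one basic graph pattern joining the data triple with the change triples.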
Conclusion
• Main DIACHRON message
– (Linked) data preservation is related to evolution management
• DIACHRON challenges
1. Diverse data models
2. Dynamic datasets
3. Recoverable versions
4. Changes as first-class citizens
5. Cross-snapshot queries
• Solutions
– DIACHRON data model (#1)
– Appropriate change definition and detection (#2, #3)
– Changes and data represented at the same level (#4, #5)
