DIACHRON Preservation: Evolution Management for Preservation



By Giorgos Flouris (FORTH), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop: prelida.eu

Evolution Management for Preservation
PRELIDA Consolidation Workshop 17.10.2014
Giorgos Flouris (FORTH)
fgeo@ics.forth.gr

Evolution Management Problem
Preservation ↔ Evolution

Change Detection
• Change detection for evolution management
– Identifying changes between versions
• Challenges (in DIACHRON)
1. Diverse data models
2. Dynamic datasets
3. Recoverable versions
4. Changes as first-class citizens
5. Cross-snapshot queries

Evolution in DIACHRON
[Figure: the pilot dataset and its DIACHRON representation in Version 1 and Version 2]

Change Types: Motivation
What a naïve diff will report
Add (Rec, diachron:subject, EFO_001927)
Add (Rec, diachron:hasRecordAttribute, rAtt1)
Add (rAtt1, diachron:predicate, rdfs:subClassOf)
Add (rAtt1, diachron:object, ObsoleteClass)
What the pilot expects
Add_SuperClass (EFO_001927, ObsoleteClass)

Change Hierarchy: Low-level (1/3)
• Low-level changes
– DIACHRON model, for internal use
– Fixed:
Add, Delete
– Just additions and deletions of triples
– Simple set difference
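Because low-level detection is a plain set difference, it fits in a few lines. The Python below is a minimal sketch, not DIACHRON's implementation: it models a version as a set of (subject, predicate, object) string tuples and uses the four record triples from the motivation slide as a hypothetical toy Version 2, restricted to that one record so that Version 1 is empty.

```python
from typing import FrozenSet, Set, Tuple

# A triple is modeled as a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def low_level_diff(old: FrozenSet[Triple],
                   new: FrozenSet[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, deleted) triples between two versions."""
    added = set(new - old)    # Add: triples only in the new version
    deleted = set(old - new)  # Delete: triples only in the old version
    return added, deleted


# Hypothetical toy versions, restricted to the single record from the slide.
v1: FrozenSet[Triple] = frozenset()
v2: FrozenSet[Triple] = frozenset({
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
})

added, deleted = low_level_diff(v1, v2)
for triple in sorted(added):
    print("Add", triple)
for triple in sorted(deleted):
    print("Delete", triple)
```

This reproduces the naïve diff: four Add changes and no Delete, none of which states the superclass relationship directly.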

Change Hierarchy: Simple (2/3)
• Pilot terminology:
– Add_SuperClass, Add_Dimension
• Fixed, pre-defined
• Composed of low-level changes
• Partitioning is perfect
– Complete and unambiguous: every low-level change belongs to exactly one simple change
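To illustrate how a simple change groups low-level changes, the sketch below recognizes Add_SuperClass among the added record triples. It is a hypothetical Python stand-in for a detection query (DIACHRON uses SPARQL), it assumes the record layout shown on the motivation slide, and it handles only this one change type. The final check shows completeness for this example only: no added triple is left uncovered.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
SimpleChange = Tuple[str, str, str]  # (change type, subclass, superclass)


def detect_add_superclass(added: Set[Triple]) -> Tuple[List[SimpleChange], Set[Triple]]:
    """Group added record triples into Add_SuperClass(sub, sup) changes.

    Returns the detected changes and the low-level triples they consumed.
    """
    # Index the added triples by (subject, predicate) for lookup.
    index: Dict[Tuple[str, str], List[str]] = {}
    for s, p, o in added:
        index.setdefault((s, p), []).append(o)

    changes: List[SimpleChange] = []
    consumed: Set[Triple] = set()
    for (rec, pred), attrs in index.items():
        if pred != "diachron:hasRecordAttribute":
            continue
        for att in attrs:
            # Only record attributes whose predicate is rdfs:subClassOf qualify.
            if "rdfs:subClassOf" not in index.get((att, "diachron:predicate"), []):
                continue
            for sub in index.get((rec, "diachron:subject"), []):
                for sup in index.get((att, "diachron:object"), []):
                    changes.append(("Add_SuperClass", sub, sup))
                    consumed |= {
                        (rec, "diachron:subject", sub),
                        (rec, "diachron:hasRecordAttribute", att),
                        (att, "diachron:predicate", "rdfs:subClassOf"),
                        (att, "diachron:object", sup),
                    }
    return changes, consumed


added = {
    ("Rec", "diachron:subject", "EFO_001927"),
    ("Rec", "diachron:hasRecordAttribute", "rAtt1"),
    ("rAtt1", "diachron:predicate", "rdfs:subClassOf"),
    ("rAtt1", "diachron:object", "ObsoleteClass"),
}
changes, consumed = detect_add_superclass(added)
print(changes)           # [('Add_SuperClass', 'EFO_001927', 'ObsoleteClass')]
print(added - consumed)  # set(): every added triple is covered
```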

Change Hierarchy: Complex (3/3)
• Pilot terminology:
– Add_Synonym, Mark_As_Obsolete
• Totally custom, pilot-specific (defined at run-time)
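Since complex changes are defined at run time, a pilot needs some way to register new definitions over the detected simple changes. The sketch below shows one possible shape for such a registry in Python; the registry, the function names and the Mark_As_Obsolete rule are hypothetical illustrations, not the pilot's actual definitions.

```python
from typing import Callable, Dict, List, Tuple

Change = Tuple[str, ...]  # e.g. ("Add_SuperClass", "EFO_001927", "ObsoleteClass")
Rule = Callable[[List[Change]], List[Change]]

# Hypothetical run-time registry: complex change name -> detection rule.
registry: Dict[str, Rule] = {}


def define_complex_change(name: str, rule: Rule) -> None:
    """Register a pilot-specific complex change while the system is running."""
    registry[name] = rule


def detect_complex(simple_changes: List[Change]) -> List[Change]:
    """Apply every registered rule to the detected simple changes."""
    found: List[Change] = []
    for rule in registry.values():
        found.extend(rule(simple_changes))
    return found


# Hypothetical rule: a class counts as obsolete once it gains ObsoleteClass
# as a superclass. The pilot's real Mark_As_Obsolete may be defined differently.
define_complex_change(
    "Mark_As_Obsolete",
    lambda changes: [
        ("Mark_As_Obsolete", c[1])
        for c in changes
        if c[0] == "Add_SuperClass" and c[2] == "ObsoleteClass"
    ],
)

detected = detect_complex([("Add_SuperClass", "EFO_001927", "ObsoleteClass")])
print(detected)  # [('Mark_As_Obsolete', 'EFO_001927')]
```

Keeping rules as data rather than code paths is what lets new complex changes be added without redeploying the detector.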

Using Changes for Evolution Management
• DIACHRON data model contains all versions
• Detection based on SPARQL queries
– Provided at deployment time (for simple changes)
– Generated at creation time (for complex changes)
• Recoverability
– Allows moving back and forth between versions
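One way to see why complete change sets support recoverability: if the added and deleted triples between two versions are recorded, either version can be rebuilt from the other. The sketch below assumes versions are sets of triples and uses hypothetical toy data (ParentClass is made up). It illustrates the principle only and does not describe how DIACHRON stores versions.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Version = FrozenSet[Triple]


def apply_delta(version: Version, added: Version, deleted: Version) -> Version:
    """Move forward: drop the deleted triples, then add the new ones."""
    return (version - deleted) | added


def revert_delta(version: Version, added: Version, deleted: Version) -> Version:
    """Move back: drop the added triples, then restore the deleted ones."""
    return (version - added) | deleted


# Hypothetical toy versions of one class description.
v1: Version = frozenset({
    ("EFO_001927", "rdf:type", "owl:Class"),
    ("EFO_001927", "rdfs:subClassOf", "ParentClass"),
})
v2: Version = frozenset({
    ("EFO_001927", "rdf:type", "owl:Class"),
    ("EFO_001927", "rdfs:subClassOf", "ObsoleteClass"),
})

added, deleted = v2 - v1, v1 - v2
print(apply_delta(v1, added, deleted) == v2)   # True
print(revert_delta(v2, added, deleted) == v1)  # True
```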

Representation Requirements
• Interesting queries
– Return the simple changes that dataset X underwent between versions V1 and V2
– Return the changes that resource X underwent in the first half of 2014
– Return all resources of type X that underwent change Y
– Return all countries for which the unemployment rate of their capital city increased faster than the average for the country as a whole, between versions V1 and V2
• Access to both the changes and the data is required
– Changes are first-class citizens
– This enables preservation
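DIACHRON Data and Changes Ontology
[Figure: DIACHRON changes ontology. Labels: change instance C1 of type Add_SuperClass (a Simple_Change), with parameters asc_p1 and asc_p2 (EFO_001927, ObsoleteClass) and old_version/new_version links to V1 and V2; Add_Synonym (a Complex_Change); Change, prov:Activity, diachron:Entity; data level and schema level.]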
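The class labels in the ontology figure suggest a small hierarchy in which a concrete change such as C1 is also a Simple_Change, a Change and a prov:Activity. The sketch below encodes that reading as a child-to-parent map. The exact subclass links and the binding of asc_p1 and asc_p2 to EFO_001927 and ObsoleteClass are assumptions drawn from the figure labels, not the published ontology.

```python
from typing import Dict, List, Optional

# Assumed reading of the figure: class -> direct superclass.
SUPERCLASS: Dict[str, Optional[str]] = {
    "prov:Activity": None,
    "Change": "prov:Activity",
    "Simple_Change": "Change",
    "Complex_Change": "Change",
    "Add_SuperClass": "Simple_Change",
    "Add_Synonym": "Complex_Change",
}


def ancestors(cls: str) -> List[str]:
    """All superclasses of `cls`, nearest first."""
    chain: List[str] = []
    parent = SUPERCLASS.get(cls)
    while parent is not None:
        chain.append(parent)
        parent = SUPERCLASS.get(parent)
    return chain


# Change instance C1 as a plain record (parameter binding assumed).
c1: Dict[str, str] = {
    "rdf:type": "Add_SuperClass",
    "asc_p1": "EFO_001927",
    "asc_p2": "ObsoleteClass",
    "old_version": "V1",
    "new_version": "V2",
}


def is_instance_of(instance: Dict[str, str], cls: str) -> bool:
    """True if the instance's type is `cls` or one of its subclasses."""
    t = instance["rdf:type"]
    return t == cls or cls in ancestors(t)


print(ancestors("Add_SuperClass"))           # ['Simple_Change', 'Change', 'prov:Activity']
print(is_instance_of(c1, "Change"))          # True
print(is_instance_of(c1, "Complex_Change"))  # False
```

Treating each change as a typed entity with its own parameters and version links is what lets changes be queried at the same level as the data.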

Conclusion
• Main DIACHRON message
– (Linked) data preservation is related to evolution management
• DIACHRON challenges
1. Diverse data models
2. Dynamic datasets
3. Recoverable versions
4. Changes as first-class citizens
5. Cross-snapshot queries
• Solutions
– DIACHRON data model (#1)
– Appropriate change definition and detection (#2, #3)
– Changes and data represented at the same level (#4, #5)
